Charlie Harris
@harrisbio.bsky.social
PhD @ Cambridge in AI for Bio | Interested in generative modelling for drug discovery and science policy 🇬🇧

Website: cch1999.github.io
Blog: harrisbio.substack.com
Database: harrisbio.notion.site
2/ There’s a lot to like in this plan:
- Expanding UK AI compute capacity by 20x
- Establishing AI Growth Zones
- Building up AI talent pipelines

But as a scientist, what excites me most is the report’s focus on **data**, an area we really need to get right.

3/ AI breakthroughs like AlphaFold wouldn’t be possible without decades of work on datasets.

For example, AlphaFold was trained on protein structures from the Protein Data Bank (PDB), which took 50+ years and ~$20 *billion* to create.

This is the kind of foundational effort AI needs.

4/ If we want breakthroughs beyond protein folding, we need to address data gaps across science.

AlphaFold was made possible by sustained investment in protein structure data.

Similar long-term commitments are essential for other fields like materials and climate science.

5/ That’s why I’m thrilled to see the plan emphasise strategic data initiatives:
- Identifying high-impact datasets
- Improving data quality
- Incentivising researchers and companies to unlock and curate datasets

These efforts will make sparse, unstructured datasets more useful for AI.

6/ People often say, "Big Pharma has lots of data!" But much of it is unstructured and sparse, making it unsuitable for deep learning.

The plan acknowledges this challenge and recommends creating better infrastructure and incentives to make datasets AI-ready.

7/ However, there’s a risk of duplicating efforts where existing world-class institutions, like the EBI managing the PDBe, are already doing excellent work.

Not every problem needs to fit into a National Data Library-sized™ hole. Let’s build on what we already have!

8/ Another standout is the creation of AIRR programme directors: mission-focused individuals with autonomy to strategically allocate compute to high-potential projects.

Effectively a "Compute Czar" role, this could significantly accelerate progress on big bets in AI for science.

9/ One question: will these AIRR programme directors also decide how funding is allocated for data generation?

For scientific initiatives, compute and data strategies are deeply interconnected. Ideally, the same person would oversee both to ensure alignment.

9/ Another question: who will the AIRR programme directors work for? UKRI? ARIA?

Will they be empowered to deploy large amounts of compute into highly productive groups at the cutting edge?

There is no point in this if it means everyone only gets a few GPU hours each.

(Usual reminder that AlphaFold3 was trained for 120k+ GPU hours. That is several times my lab’s entire compute budget for this year.)

10/ The plan’s proposal to create a UK Sovereign AI Team is great. This unit will partner with the private and academic sectors to back national champions and remove roadblocks in AI, with a strong focus on AI for science and robotics.

11/ The UK Sovereign AI Team could be a great connector of:
- Public institutions creating scientific datasets
- Industrial labs capable of training models on those datasets

This sort of collaboration could really unlock breakthroughs in science.

12/ Another standout: the plan proposes an internal headhunting team within the UK Government to attract top global talent to AISI, the UK Sovereign AI Team, and UK-based companies.

Will they also have the power to fast-track visas? From experience, I hope so...

13/ The plan has solid ideas on AI skills, but it's not *just* about creating more "AI graduates." We need to train domain experts in the natural sciences to understand and use AI effectively.

Almost all scientists should know neural networks as well as they know Excel and stats.

14/ US universities do this well with CS minors, which foster computational literacy across disciplines.

The UK could adopt similar models to produce scientists who are not only domain experts but also skilled at applying AI tools to their fields.

15/ Side note: UK universities could supercharge their AI teaching by embracing industry expertise (where the real knowledge is).

I teach a course at Cambridge led by a DeepMind researcher, and it’s the most popular in the department.

16/ What’s clear is that getting data and compute right is essential, not just for breakthroughs in science but also for keeping the UK competitive globally.

Here’s hoping this plan gets the funding, leadership, and focus it needs to succeed!

Happy to chat about any of this.

end
January 13, 2025 at 7:17 PM