The protein structure problem: solving life’s origami
October 11, 2025
### Imma Perfetto
Cosmos science journalist
### The artificial intelligence program AlphaFold is proving to be a gamechanger for biological research, Imma Perfetto reports. This article was originally published in the Cosmos Print Magazine, September 2024.
This artwork of an origami bird holds AlphaFold 3 predictions of a complex of two proteins (ScpA and ScpB) in its beak. The protein complex is important during cell division in bacteria. Top: ScpA is cyan and ScpB is green. Bottom: Confidence measures, where dark blue is very high confidence, light blue is confident, yellow is low confidence, and orange is very low confidence in the structural prediction. Credit: AlphaFold 3, Katie Michie.
A protein is made of a chain of amino acids strung together like beads on a necklace. This chain spontaneously folds, like origami, into intricate pleats, folds, and loops through interactions between its amino acids. The resulting unique 3D structure largely determines its vital function within the lifeform. Solving the structure allows biologists to better understand how the protein works and design experiments to affect and modify it.
The smallest known protein, TAL, influences development of the fruit fly Drosophila melanogaster and has just 11 amino acids. The largest, Titin, is found in human muscle cells and is made up of roughly 35,000.
Proteins are far too tiny to inspect under a regular microscope. For decades researchers used complex experimental techniques, such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryogenic electron microscopy (cryo-EM), to solve their structures. It’s painstaking, time-consuming work that takes specialised skill and sometimes hundreds of thousands of dollars. And, as Kate Michie can attest, success is not always guaranteed.
“I spent four years trying to solve the crystal structure of a complex of two human proteins and got scooped. You know, I got nothing out of four years. I worked really hard at it, and it was a really difficult project. AlphaFold can calculate those in a few hours,” says Michie, who is chief scientist of the Structural Biology Facility at the Mark Wainwright Analytical Centre, of the University of New South Wales Sydney.
On 8 May 2024 _Nature_ dropped a paper introducing the third and latest iteration of the artificial intelligence (AI) system AlphaFold, which predicts the 3D structure of proteins from their amino acid sequences. Google DeepMind and Isomorphic Labs, both subsidiaries of Alphabet, co-developed the new model. They say AlphaFold 3 (AF3) is “a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy”. But, while AF3 has generated significant interest since its release, it has simultaneously sparked criticism among those in the scientific community.
Let’s take a closer look at how AI is changing the world of structural biology.
## A revolution in protein structure
AF3’s predecessor, AlphaFold 2, was released as open source code in July 2021 and immediately changed the game in structural biology.
“I contacted the high-performance computation people and said, ‘we really need to get this piece of code running’. And then I asked my colleague, ‘Do you have any structures that you never submitted to the Protein Data Bank?’” says Michie.
The Protein Data Bank (PDB) is the global archive of all the experimentally solved structures for large biological molecules. As of June 2024, it’s estimated to include more than 220,000 proteins, which sounds like a lot until you consider that the number of proteins we know of exceeds 200 million.
“My colleague sent me a sequence of a small protein he never submitted to the PDB, I ran it, and I just sent him the result. His email response to me was: ‘My mind is blown!’ And he said, ‘I immediately thought someone else must have solved the structure.’”
But they hadn’t. AF2 had accurately predicted the 3D structure of the protein from its amino acid sequence alone. What had taken years to describe experimentally had been done in just a few hours.
AF2 is a deep learning algorithm. In the world of AI that means it uses artificial neural networks, loosely modelled on the networks of neurons found in human brains. First, it takes the protein sequence of interest and searches several databases for similar proteins. By comparing these sequences, it can identify areas of similarity and difference to understand how the protein has changed across evolution.
For instance, if two amino acids are in close contact in 3D space then a mutation in one will usually be accompanied by a mutation in the other (to conserve the structure of the protein). But if they are far apart then they tend to evolve independently from each other. AF2 uses these patterns to work out the relative positions of the amino acids, then draws on its training on PDB structural data to iteratively construct a 3D model of the protein’s structure with relatively high accuracy.
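The idea that paired mutations hint at physical contact can be illustrated with a toy calculation. The sketch below is not how AF2 works internally (AF2 learns such patterns with a neural network); it uses mutual information, a simple classic statistic, on a made-up alignment. The sequences, the `ALIGNMENT` name and the function names are all hypothetical.

```python
from collections import Counter
from math import log2

# Made-up alignment: each string stands for the same protein in a different species.
# Positions 1 and 4 were constructed to mutate together (K with E, R with D);
# position 2 varies on its own.
ALIGNMENT = [
    "MKAVEL",
    "MKAVEL",
    "MRAVDL",
    "MRGVDL",
    "MKGVEL",
    "MRAVDL",
]


def column(alignment, i):
    """Return the amino acids found at position i across all sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment positions i and j."""
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        joint = count / n
        mi += joint * log2(joint / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


if __name__ == "__main__":
    print("positions 1 and 4:", mutual_information(ALIGNMENT, 1, 4))
    print("positions 1 and 2:", mutual_information(ALIGNMENT, 1, 2))
```

In this toy alignment the coupled pair scores one full bit while the unrelated pair scores zero, which is the kind of signal that suggests two positions may sit close together in the folded protein.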
Scientists can take advantage of that predicted structure to accelerate their science by doing smarter, more strategic experiments in the laboratory right off the bat. “I’ve done work with some scientists working with immune complexes, and the models coming out of AlphaFold enable them to really trim down the number of animal experiments they do,” says Michie. “So instead of making say 20 CRISPR mice, they only might make two.”
As seen in AlphaFold 3, a structural prediction of Fos and Jun transcription factors with the DNA sequence they bind. The top panel shows the model and confidence data, and the green chart shows the high confidence of them binding to each other. Credit: AlphaFold 3, Katie Michie.
## Crystal clues
An accurate AlphaFold structure can also be the crucial missing piece of the puzzle that allows researchers to experimentally solve the structure using X-ray crystallography.
“One of my other colleagues is a virologist and he’d been working on a protein that had eluded structural elucidation for 20–30 years. It was from the world’s first known retrovirus,” says Michie.
“The trick of crystallography is you need to know two components of the maths to solve them,” she continues. The diffraction data provided by X-ray crystallography gives you one of those components, but you don’t have the other: the phase.
Traditional methods of obtaining phase information had proved unsuccessful, until Michie suggested using AlphaFold instead.
“Immediately the structure came out. AlphaFold helped him get the crystals but then actually enabled him to phase the structure. It told us that the AlphaFold model was very good, but it also fixed up this problem in structural biology.”
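The phase problem Michie describes can be shown with a toy calculation. The sketch below is an illustration under stated assumptions, not crystallographic software: the one-dimensional "density" values and the approximate "model" are made up, and real phasing also has to work out where the model sits in the crystal, which is skipped here. It shows that measured amplitudes alone do not give back the density, while amplitudes combined with phases borrowed from a reasonably close model do. Crystallographers know this general idea as molecular replacement.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform: density -> complex 'structure factors'."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse transform: structure factors -> density (real part only)."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)).real / n
        for x in range(n)
    ]


# Hypothetical 1D "electron density": a heavy atom and a light atom in one cell.
true_density = [0, 0, 5, 0, 0, 0, 2, 0]
# Hypothetical predicted model: atoms in the right places, weights slightly off.
model_density = [0, 0, 4, 0, 0, 0, 3, 0]

# The experiment records amplitudes only; the phases are lost.
measured_amplitudes = [abs(f) for f in dft(true_density)]

# Attempt 1: guess every phase as zero.
amplitude_only = inverse_dft([cmath.rect(a, 0.0) for a in measured_amplitudes])

# Attempt 2: borrow the phases calculated from the predicted model.
model_phases = [cmath.phase(f) for f in dft(model_density)]
borrowed = inverse_dft(
    [cmath.rect(a, p) for a, p in zip(measured_amplitudes, model_phases)]
)

if __name__ == "__main__":
    print("true density:   ", true_density)
    print("amplitudes only:", [round(v, 2) for v in amplitude_only])
    print("borrowed phases:", [round(v, 2) for v in borrowed])
```

In this toy case the model's phases happen to match the true ones exactly, so the density comes back perfectly. With real data the borrowed phases are only a starting point that is then refined.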
To Michie, AlphaFold represents a massive step forward: “it’s genuinely the biggest scientific advance in my career”.
> “The AlphaFold model was very good, but it also fixed up this problem in structural biology.”
## Predicting the structures of life’s molecules
Proteins don’t exist in a vacuum. They move around, bind to and modify each other, and even form large, complicated complexes.
Peter Czabotar, joint head of the Structural Biology Division at WEHI, the oldest medical research institute in Australia, says one of the early limitations of AF2 was that you could only ever get structural predictions of one protein, alone. “Often what you’re interested in is how different proteins will interact with each other. For example, we work on proteins that are involved with cell death and the interactions between those proteins will dictate whether a cell will live or die.”
The gap has since been bridged by other research groups adapting and building upon AF2’s open source code, and by the release of the AlphaFold-Multimer extension in October 2021.
The newest version, AF3, extends this capability by predicting interactions between multiple proteins and nucleic acids (DNA and RNA). It can predict the impact of ions and post-translational modifications – the addition of chemical groups to amino acids – on these molecular systems too. AF3 can also be used to predict how a selection of small molecules called ligands bind to proteins, though this is restricted to ligands that have high-quality experimental data available in the PDB.
“But where the real power is, something that we do a lot of, is in the drug discovery world,” says Czabotar. “And it is extremely powerful for that, potentially, but they haven’t enabled that in the way that it’s released. We’ve done drug discovery against cell death proteins, for example. I can’t take one of the drugs that we’ve worked with and see how it interacts with my target protein, I can only use the [ligands] that they’ve enabled us to use.”
That capability to predict the structure of novel drug molecules interacting with target proteins seems to be restricted to Isomorphic Labs, which was launched in 2021 to pursue commercial drug discovery.
AF3 uses a very different approach for this new suite of predictions: generative AI. After processing the sequence inputs, it assembles its predictions using a diffusion network, the likes of which power AI image generators. According to Isomorphic Labs’ website: “the diffusion process starts with a cloud of atoms, and over many steps converges on its final, most accurate molecular structure”. Diffusion has been applied to protein structure prediction before, for example, in the seminal RoseTTAFold diffusion (RFdiffusion) by the Baker Laboratory at the Institute for Protein Design, the University of Washington.
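The "cloud of atoms" description can be pictured with a toy loop. This is only a sketch of the general shape of iterative denoising and makes no claim about AF3's actual network: the `TARGET` coordinates are invented, and `stand_in_denoiser` is a hypothetical placeholder that cheats by returning the known answer, where a real system would use a trained neural network.

```python
import random

# Hypothetical target structure: four "atoms" at the corners of a unit square.
TARGET = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def stand_in_denoiser(cloud):
    """Placeholder for a trained network's guess at the clean structure.

    Assumption for illustration only: it simply returns the known target.
    """
    return TARGET


def rms_error(cloud):
    """Root-mean-square distance between the cloud and the target atoms."""
    squared = sum(
        (x - tx) ** 2 + (y - ty) ** 2 for (x, y), (tx, ty) in zip(cloud, TARGET)
    )
    return (squared / len(cloud)) ** 0.5


def denoise(steps=50, seed=0):
    """Start from a random cloud and refine it over many small noisy steps."""
    rng = random.Random(seed)
    cloud = [(rng.gauss(0, 5), rng.gauss(0, 5)) for _ in TARGET]
    errors = [rms_error(cloud)]
    for step in range(steps):
        guess = stand_in_denoiser(cloud)
        # Injected noise shrinks to zero by the final step.
        noise_scale = 0.5 * (1 - (step + 1) / steps)
        cloud = [
            (
                x + 0.3 * (gx - x) + rng.gauss(0, noise_scale),
                y + 0.3 * (gy - y) + rng.gauss(0, noise_scale),
            )
            for (x, y), (gx, gy) in zip(cloud, guess)
        ]
        errors.append(rms_error(cloud))
    return cloud, errors


if __name__ == "__main__":
    final_cloud, errors = denoise()
    print("error at start:", round(errors[0], 3))
    print("error at end:  ", round(errors[-1], 3))
```

Each pass nudges every atom part of the way towards the current guess while adding a little less random noise than the pass before, so the scattered cloud settles into the target shape.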
But generative AI is not without its limitations. AF3 will occasionally produce structures with overlapping atoms (this is physically impossible) or replace a detail of the structure with its mirror image (chemically impossible). As a generative model, it is also prone to hallucinations in which it invents plausible-looking structures – particularly in disordered regions of the protein that lack a stable 3D structure – much as a text-to-image AI struggles to create realistic-looking hands. In-built confidence measures help to identify when AF3 isn’t so sure about its structural prediction, but ultimately it takes a scientist with an understanding of the underlying structural biology to come along and identify what’s gone wrong, and why.
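Two of the checks described above are simple enough to sketch. The code below is illustrative only and rests on assumptions that are not from the article: the 1.0 angstrom clash cutoff is an arbitrary round number (real validation tools use per-element atomic radii), and the confidence thresholds are the bands commonly quoted for AlphaFold's per-residue score (pLDDT, on a 0 to 100 scale). The function names are hypothetical.

```python
from math import dist


def clashing_pairs(atoms, min_distance=1.0):
    """Return index pairs of atoms that sit closer than min_distance.

    atoms is a list of (x, y, z) coordinates in angstroms. The default cutoff
    is an illustrative assumption, not a published standard.
    """
    pairs = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if dist(atoms[i], atoms[j]) < min_distance:
                pairs.append((i, j))
    return pairs


def confidence_band(plddt):
    """Map a per-residue confidence score (0 to 100) to a named band.

    Thresholds are the commonly quoted ones and are assumed here.
    """
    if plddt >= 90:
        return "very high"
    if plddt >= 70:
        return "confident"
    if plddt >= 50:
        return "low"
    return "very low"


if __name__ == "__main__":
    # Made-up coordinates: the first two atoms nearly overlap.
    atoms = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (5.0, 5.0, 5.0)]
    print("clashes:", clashing_pairs(atoms))
    for score in (95, 80, 60, 30):
        print(score, "->", confidence_band(score))
```

Checks like these can flag where to look, but as the article notes, deciding what went wrong and why still falls to a scientist who knows the biology.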
“It’s very, very powerful. But it doesn’t exclude the need to necessarily confirm things experimentally. Whether that is by solving structures themselves or by, for example, testing the structures in some way in an experiment,” says Czabotar.
## Concerns about code
In a major departure from AF2, access to the newest iteration of AlphaFold is limited to a web server and for non-commercial research only. “We have various structure-based drug discovery projects and some of them are purely academic, as students, PhDs and honours projects. But we also have had commercial partnerships, because that’s a way to push your discoveries into a clinical setting,” says Czabotar. “So generally, anything that is going to make an impact is done by an academic lab in a commercial partnership. Now, I guess it puts us in a bit of an awkward situation. Even if we could look at our compounds bound to the target [protein], there’s some projects where we won’t be able to do it because, you know, we’ve ticked a box.”
AF3’s accompanying _Nature_ paper was also published without the source code, but with a ‘pseudocode’ instead – a detailed description of what the code can do and how it works. This prompted an open letter to the Editors of _Nature_, published 16 May and endorsed by more than 1,000 scientists as of June.
The letter raised concerns that “the absence of available code compromises peer review” and that the pseudocode released would “require months of effort to turn into workable code that approximates the performance, wasting valuable time and resources”. Access to the web server was also initially capped at 10 predictions per day, which the letter stated, “restricts the scientific community’s capacity to verify the broad claims of the findings or apply the predictions on a large scale”.
The sentiments appear to have hit home. Shortly after the letter’s release, DeepMind’s Vice President of research, Pushmeet Kohli, announced via X that they would double the daily job limit to 20 and are “working on releasing the AF3 model (incl weights) for academic use … within 6 months”.
On 22 May _Nature_ responded in an editorial, stating its reasoning for publishing the paper without code: “the private sector funds most global research and development, and many of the results of such work are not published in peer-reviewed journals. We at _Nature_ think it’s important that journals engage with the private sector and work with its scientists so they can submit their research for peer review and publication.”
In the meantime, other researchers won’t be sitting idly by until the code release at the end of 2024. Already, multiple teams are racing to develop their own open source versions of AlphaFold 3, without any strings attached.
Originally published by Cosmos as The protein structure problem: solving life’s origami