AlphaFold: Artificial Intelligence in Protein Structure Prediction
Written By: Mark Titleman
In recent years, particularly since the COVID pandemic, Nobel Prizes have been awarded for
experimental results or theoretical advances of immediate or eventual utility. Generally it is the
most impactful technologies and solutions, often developed over a generation, that attract notice
and funding and eventually come to fruition as significant scientific breakthroughs, invariably accompanied
and sometimes dragged to the finish line by a dogged researcher or two. They rarely sprout from a
simple theoretical insight or “light bulb” moment. The technologies associated with artificial intelligence
have been developing for over 30 years. Accelerating advances in computing, and eventually in AI
applications and their appreciation, have finally brought some of these researchers an outpouring of Nobel
recognition. The 2024 Nobel Prize in Chemistry was awarded in part to John Jumper and Demis Hassabis
for the AI deep learning system AlphaFold, which predicts protein structure. The AlphaFold 2 paper,
accompanied by its database and software, has been cited some 27 thousand times since its publication in
2021, and AlphaFold 3 can predict the structure of larger protein complexes; both versions do so with
remarkable accuracy. This is a groundbreaking achievement that will greatly expedite work in
biochemistry. It took many years to come to fruition, and both how it functions and its potential impact
deserve exploration.
Protein tertiary structure is the overall three-dimensional fold of a protein, determined in part by the protein's
primary structure, or amino acid sequence. It is composed of local folds in the protein, the secondary
structure, which form rapidly through intramolecular hydrogen bonding: alpha helices, alpha turns, beta sheets,
beta turns, and other small folds. The process of tertiary structure formation, which is driven largely by the
hydrophilicity and hydrophobicity of the amino acid side chains, is exquisite in its precision and functional
specificity. Digestive enzymes, for example, are globular proteins with a hydrophobic core; there are many
types, and the function of each is determined by its secondary and tertiary structure.
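The role of hydrophobicity can be made a little more concrete with a short sketch. It assumes the Kyte-Doolittle hydropathy scale, which is not part of the discussion above and is used here only as one common way of scoring residues; the two short sequences are hypothetical and chosen purely for illustration. A higher average suggests a stretch more likely to be buried in a hydrophobic core, a lower one a stretch more likely to face water.

```python
# Kyte-Doolittle hydropathy values (assumed scale): positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy of a one-letter amino acid sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return sum(values) / len(values)


# Hypothetical fragments: one rich in hydrophobic residues, one in charged residues.
core_like = mean_hydropathy("AILV")
surface_like = mean_hydropathy("KDER")
print(f"core-like: {core_like:.3f}, surface-like: {surface_like:.3f}")
```

An average like this says nothing about the actual fold; it only illustrates why sequence composition already carries information about which parts of a globular protein tend to end up inside.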
Databases of these structures have traditionally been built very slowly, as protein structures are determined
experimentally through X-ray crystallography, electron microscopy and nuclear magnetic resonance.
Computational methods have been used to predict protein tertiary structure for about as
long as artificial intelligence has been under development, but these predictions long fared poorly against
the outcome of experiment. The Critical Assessment of Structure Prediction (CASP) is a biennial
competition that compares predicted protein tertiary structures to experimentally obtained structures
via a global distance test (GDT) of relative alpha carbon positions. AlphaFold, which uses an AI deep
learning system, was the first such system to win CASP, doing so in 2018, and by 2022 its latest open-source
version or variants thereof were used by virtually all CASP competitors.
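The idea behind the global distance test can be sketched in a few lines. What follows is a simplified illustration, not the CASP scoring software: it assumes the predicted and experimental alpha carbon coordinates are already superimposed (the real test searches over many superpositions), and it uses the conventional GDT_TS cutoffs of 1, 2, 4 and 8 angstroms. The coordinates are made up for the example.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed alpha carbon traces.

    For each cutoff, take the fraction of residues whose predicted alpha
    carbon lies within that distance of the experimental one, then average
    the fractions and express the result as a percentage.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical coordinates in angstroms for a four-residue fragment.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.5), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]

print(gdt_ts(predicted, experimental))  # 56.25
```

In this toy case the four residues are off by 0.5, 1.5, 3.0 and 9.0 angstroms, so one, two, three and three of them fall within the successive cutoffs, and the score is the average of 25, 50, 75 and 75 percent. A perfect prediction scores 100.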
The form of deep learning used by AlphaFold is a system of learning whose components will be
familiar to many people. Studies have shown that the human brain switches between central executive
function and more peripheral functioning in daily tasks, improving memory and intellectual
performance. This switching can occur in the process of learning and memory formation itself, although that
process relies on structured sequences that are subject to change. In the context of AI this is known as the fixed-width
sequence, and it is composed of malleable or “soft” tokens for creating and affirming the sequence
going forward and “hard” tokens computed during the backward pass. Of course, no human learning
could be sequence-reliant to such an extent! Attention-based machine learning, however, is indeed all about
sequence building, zooming in and out of data to make sense of it.
As opposed to the multi-head attention mechanism of transformer architecture and its use of
vectors for achieving “self-attention,” the general deep learning technique of attention networks forms
networks of sequences to identify the changeable categories endemic to a large problem. This type of
computation proved well suited to predicting protein folding and tertiary structure. Transformer design
was ultimately implemented in two training modules of the revolutionary AlphaFold 2 for comparing
amino acid residues and positions within sequences. Run iteratively, these two modules feed into each other
to predict protein tertiary structure.
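For readers unfamiliar with self-attention, a minimal sketch may help. This is the generic scaled dot-product form used in transformers, not AlphaFold's own code: it assumes a single head and, for brevity, uses the input vectors directly as queries, keys and values rather than learned projections. The three two-dimensional "tokens" are hypothetical stand-ins for positions in a sequence.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with queries = keys = values = tokens.

    Assumption: no learned projection matrices and only one attention head.
    Returns the mixed output vectors and the attention weights per position.
    """
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for three sequence positions.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how strongly one position attends to every other position. Here the first position attends more to itself and to the third position, which both share its first component, than to the second.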
These efforts culminated in the AlphaFold Protein Structure Database, which launched on July
22, 2021, containing predicted structures for nearly all human proteins along with the proteomes of 20
other key organisms, some 365 000 structures in total. On July 28, 2022, the database was expanded to more
than 200 million predicted structures, covering just about every catalogued protein on the planet. This important
technology and database can be used for advanced drug discovery, and has already been put to
invaluable use in predicting the protein structures of SARS-CoV-2 during the COVID pandemic.
It is no surprise then that John Jumper and Demis Hassabis, whose research greatly expedited
the prediction of protein tertiary structure from primary structure, shared in the 2024 Nobel Prize
in Chemistry. The work was the result of decades of artificial intelligence research and of well-chosen AI
techniques appropriately modified for the large problem of protein structure prediction. Such a task
required a monumental understanding of proteomics and artificial intelligence, as well as the
organizational know-how that accompanies these endeavors. Such huge achievements and their
positive implications stand out even among already impressive Nobel Prize winners as AI-based prizes
become the rule rather than the exception.
Bibliography
Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger,
Kathryn Tunyasuvunakool et al. "Highly accurate protein structure prediction with AlphaFold."
Nature 596, no. 7873 (2021): 583-589.
Kovalevskiy, Oleg, Juan Mateos-Garcia, and Kathryn Tunyasuvunakool. "AlphaFold two years on:
Validation and impact." Proceedings of the National Academy of Sciences 121, no. 34 (2024):
e2315002121.
Marcu, Ştefan-Bogdan, Sabin Tăbîrcă, and Mark Tangney. "An overview of Alphafold's
breakthrough." Frontiers in Artificial Intelligence 5 (2022): 875587.
Skolnick, Jeffrey, Mu Gao, Hongyi Zhou, and Suresh Singh. "AlphaFold 2: why it works and its
implications for understanding the relationships of protein sequence, structure, and function." Journal
of Chemical Information and Modeling 61, no. 10 (2021): 4827-4831.
Toews, R. (2021). AlphaFold Is The Most Important Achievement In AI—Ever. [online] Forbes.
Available at: https://www.forbes.com/sites/robtoews/2021/10/03/alphafold-is-the-most-important-achievement-in-ai-ever/.