AlphaFold: Artificial Intelligence in Protein Structure Prediction

Written By: Mark Titleman

In recent years, particularly after the COVID pandemic, Nobel Prizes have been awarded for either
experimental results or theoretical advances of immediate or eventual utility. Generally it is the most
impactful technologies and solutions, often developed over a generation, that receive notice and
funding and eventually come to fruition as significant scientific breakthroughs, uniformly accompanied
and sometimes dragged to the finish line by a dogged researcher or two. They rarely sprout from a
simple theoretical insight or “light bulb” moment. The technologies associated with artificial intelligence
have been developing for over 30 years. Accelerating advances in computing, followed by the spread and
appreciation of AI applications, have finally brought some of these researchers an outpouring of Nobel
recognition. The 2024 Nobel Prize in Chemistry was awarded in part to John Jumper and Demis Hassabis
for AlphaFold, the AI deep learning system that predicts protein structure. The AlphaFold 2 paper,
accompanied by its database and software, has been cited 27 thousand times since its publication in
2021, and AlphaFold 3 can predict the structure of larger protein complexes; both suites do so with
remarkable accuracy. This is a groundbreaking achievement that will greatly expedite work in
biochemistry. It took many years to come to fruition, and its functioning and potential impact deserve
to be explored.

Protein tertiary structure is the overall folding of a protein, determined in part by the protein's
primary structure, or amino acid sequence. It is composed of local folds in the protein, the secondary
structure, which form rapidly due to intramolecular bonding: alpha helices, alpha turns, beta sheets,
beta turns, and other small folds. The process of tertiary structure formation, which arises from
hydrophilicity and hydrophobicity, is exquisite in its precision and functional specificity. Digestive
enzymes, for example, are globular proteins with a hydrophobic core; there are many types, and the
function of each is determined by its secondary and tertiary structure.
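As a rough illustration of how hydrophobicity can be read off a primary sequence, the sketch below averages per-residue hydropathy over a sliding window. It assumes the Kyte-Doolittle hydropathy scale, which is not mentioned above and is only one of several published scales; the function name `hydropathy_profile` and the example peptide are hypothetical. A profile like this only hints at which segments tend to be buried in a hydrophobic core. It does not predict tertiary structure.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch):
# positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=3):
    """Return the mean hydropathy of each sliding window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# Hypothetical five-residue peptide: hydrophobic start, charged end.
profile = hydropathy_profile("AILKD", window=3)
print([round(value, 2) for value in profile])
```

Windows with a high positive average are candidates for the protein interior, while strongly negative windows are more likely to be exposed to water.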

Databases of these structures have traditionally been built very slowly, as protein structures are
determined experimentally through X-ray crystallography, electron microscopy, and nuclear magnetic
resonance. Computational methods have been used to predict protein tertiary structure for about as
long as artificial intelligence has been under development, but these predictions long fared poorly
against experimental results. The Critical Assessment of Structure Prediction (CASP) is a biennial
competition that compares predicted protein tertiary structures to experimentally obtained structures
via a global distance test (GDT) of relative alpha carbon positions. AlphaFold, which uses an AI deep
learning system, was the first such system to win CASP, in 2018, and by 2022 its latest open-source
version or variants thereof were used by virtually all CASP competitors.
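The idea behind the global distance test can be sketched in a few lines. The version below assumes the commonly described GDT_TS form: the fraction of alpha carbons lying within 1, 2, 4, and 8 angstroms of their experimental positions, averaged and expressed as a percentage. It also assumes the two structures have already been superimposed, whereas the real CASP evaluation searches over many superpositions for each cutoff. The function name `gdt_ts` and the coordinates are hypothetical.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-aligned lists of alpha carbon coordinates.

    Each argument is a list of (x, y, z) tuples in angstroms, one per residue.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and of equal length")
    # Distance between each predicted alpha carbon and its experimental partner.
    distances = [math.dist(p, q) for p, q in zip(predicted, experimental)]
    # Fraction of residues falling within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: three residues are close, one is far off.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.5), (3.8, 0.0, 1.5), (7.6, 0.0, 3.0), (11.4, 0.0, 10.0)]
print(gdt_ts(predicted, experimental))
```

A score of 100 would mean every alpha carbon sits within 1 angstrom of its experimental position. Averaging over several cutoffs rewards a prediction that is roughly right overall even when a few residues are badly placed.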

The form of deep learning used by AlphaFold is built from components that are familiar to many
people. Studies have shown that the human brain switches between central executive function and more
peripheral functioning during daily tasks, improving memory and intellectual performance. The same
switching can occur in learning and memory formation itself, although this relies on structured
sequences that are subject to change. In the context of AI this is known as the fixed-width sequence,
and it is composed of malleable or “soft” tokens for creating and affirming the sequence going forward
and “hard” tokens computed during the backward pass. Of course, no human learning could be
sequence-reliant to such an extent! Attention-based machine learning, however, is indeed all about
sequence building, zooming in and out of data to make sense of it.

In contrast to the multi-head attention mechanism of the transformer architecture, which uses
vectors to achieve “self-attention,” the general deep learning technique of attention networks forms
networks of sequences to identify the changeable categories endemic to a large problem. This type of
computation is well suited to predicting protein folding and tertiary structure. Transformer design
was ultimately implemented in two modules of the revolutionary AlphaFold 2, which compare amino acid
residues and positions within sequences. Applied iteratively, these two modules feed into each other
to predict protein tertiary structure.
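To make “self-attention” over vectors concrete, here is a minimal single-head sketch of the generic scaled dot-product form used in transformers. It is not AlphaFold 2's actual module: real transformer layers learn separate query, key, and value projections and run several heads in parallel, whereas this sketch assumes identity projections so it stays short. The function names and the two-dimensional “residue” vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention with identity projections.

    Returns (outputs, weights): each output is a weighted mix of all input
    vectors, and weights[i][j] says how strongly position i attends to j.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Weighted average of all vectors, one coordinate at a time.
        mixed = [
            sum(weight * value[i] for weight, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for three sequence positions.
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(residues)
for row in weights:
    print([round(weight, 3) for weight in row])
```

Each row of weights shows where one position “looks” in the rest of the sequence, which is the sense in which attention zooms in and out of the data.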

These efforts culminated in the AlphaFold Protein Structure Database, which launched on July 22,
2021, containing predicted structures for nearly all human proteins along with the proteomes of 20
model organisms, some 365,000 proteins in total. On July 28, 2022, the database was expanded to more
than 200 million predicted structures, covering just about every catalogued protein on the planet.
This important technology and database can be used for advanced drug discovery, and has already been
put to invaluable use in predicting the protein structures of SARS-CoV-2 during the COVID pandemic.

It is no surprise, then, that John Jumper and Demis Hassabis, whose research greatly expedited the
prediction of protein tertiary structure from primary structure, were awarded the 2024 Nobel Prize in
Chemistry. The work was the result of decades of artificial intelligence research and of well-chosen AI
techniques appropriately modified for the large problem of protein structure prediction. Such a task
required a monumental understanding of proteomics and artificial intelligence, as well as the
organizational know-how that accompanies these endeavors. Achievements of this scale, and their
positive implications, stand out even among already impressive Nobel Prize winners as AI-based prizes
become the rule rather than the exception.

Bibliography

Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger,
Kathryn Tunyasuvunakool et al. "Highly accurate protein structure prediction with AlphaFold." Nature
596, no. 7873 (2021): 583-589.

Kovalevskiy, Oleg, Juan Mateos-Garcia, and Kathryn Tunyasuvunakool. "AlphaFold two years on:
Validation and impact." Proceedings of the National Academy of Sciences 121, no. 34 (2024):
e2315002121.

Marcu, Ştefan-Bogdan, Sabin Tăbîrcă, and Mark Tangney. "An overview of Alphafold's
breakthrough." Frontiers in Artificial Intelligence 5 (2022): 875587.

Skolnick, Jeffrey, Mu Gao, Hongyi Zhou, and Suresh Singh. "AlphaFold 2: why it works and its
implications for understanding the relationships of protein sequence, structure, and function."
Journal of Chemical Information and Modeling 61, no. 10 (2021): 4827-4831.

Toews, R. "AlphaFold Is The Most Important Achievement In AI—Ever." Forbes (2021).
https://www.forbes.com/sites/robtoews/2021/10/03/alphafold-is-the-most-important-achievement-in-ai-ever/.
