Nov 28, 2022
Machine-Learning Model Reveals Protein-Folding Physics
Posted by Saúl Morales Rodriguéz in categories: biological, information science, physics, robotics/AI
An algorithm that already predicts how proteins fold might also shed light on the physical principles that dictate this folding.
Proteins control every cell-level aspect of life, from immunity to brain activity. They are built from long chains of compounds called amino acids that fold into large, complex 3D structures. Computational algorithms can model the physical amino-acid interactions that drive this folding [1]. But determining the resulting protein structures has remained challenging. In a recent breakthrough, a machine-learning model called AlphaFold [2] predicted the 3D structure of proteins from their amino-acid sequences. Now James Roney and Sergey Ovchinnikov of Harvard University have shown that AlphaFold has learned how to predict protein folding in a way that reflects the underlying physical amino-acid interactions [3]. This finding suggests that machine learning could guide the understanding of physical processes too complex to be accurately modeled from first principles.
Predicting the 3D structure of a specific protein is difficult because of the sheer number of ways in which the amino-acid sequence could fold. AlphaFold can start its computational search for the likely structure from a template (a known structure for similar proteins). Alternatively, and more commonly, AlphaFold can use information about the biological evolution of amino-acid sequences in the same protein family (proteins with similar functions that likely have comparable folds). This information is helpful because consistently correlated evolutionary changes in pairs of amino acids can indicate that these amino acids directly interact, even though they may be far apart in the sequence [4, 5]. Such information can be extracted from the multiple sequence alignments (MSAs) of protein families, determined from, for example, evolutionary variations of sequences across different biological species.
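To make the idea of correlated changes in an MSA concrete, here is a minimal Python sketch that scores every pair of alignment columns by mutual information, a simple classical measure of how strongly two positions vary together. This is not how AlphaFold processes MSAs (it feeds them to a neural network), and it is far cruder than the methods used in practice. The toy alignment `TOY_MSA` and the function names `mutual_information` and `rank_pairs` are illustrative assumptions, not taken from the article.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy alignment: each string is one aligned sequence,
# each column is one position in the protein family.
TOY_MSA = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "AKVE",
    "ARVD",
]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_pairs(msa):
    """Return all column pairs sorted from most to least co-varying."""
    length = len(msa[0])
    scores = [
        ((i, j), mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (i, j), score in rank_pairs(TOY_MSA):
        print(f"columns {i} and {j}: {score:.3f} bits")
```

In the toy alignment, columns 1 and 3 (counting from 0) always change together (K with E, R with D) and score 1 bit, while every other pair scores approximately 0. On real data, a raw statistic like this also picks up indirect correlations and the shared ancestry of the sequences, which is one reason more sophisticated approaches are needed.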