Sometimes, a few subtle clues are enough to permit the assembly of a complete, solid picture. Alas, such times are rare—so rare, in fact, that if a Sherlock Holmes comes along and manages the trick just a few times, they win unending acclaim. But what if the trick could be pulled off on a regular basis? What if the trick weren’t limited to isolated, narrow “cases,” but applicable across a wide range of problems? These questions are being entertained by structural biologists, who typically have to make do with scant data when they try to work out how proteins and RNA molecules fold into 3D structures.
A year ago, structural biologists based at Stanford University published an account that could have been called The Case of the Protein Structures. (This account, which was actually titled, “Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes,” appeared in Proteins.) More recently, on August 27, they published an account that could be called The Case of the RNA Structures. (This account, which is actually titled, “Geometric deep learning of RNA structure,” appeared in Science.)
The first study was led by Ron O. Dror, PhD, associate professor of computer science, and the second study was co-led by Dror and Rhiju Das, PhD, associate professor of biochemistry. Assisting in both studies—and presumably doing so well above the Watson level—were Stanford University PhD students Stephan Eismann and Raphael Townshend. Both studies demonstrated that the 3D structures of biological molecules can be predicted by the ultimate Sherlock, artificial intelligence (AI).
Most notably, the researchers have shown that their AI approach succeeds even when it must learn from only a few known structures. The researchers hope that their approach will help scientists to explain how different molecules work, with applications ranging from fundamental biological research to informed drug design practices.
“Proteins are molecular machines that perform all sorts of functions,” Eismann said. “To execute their functions, proteins often bind to other proteins. If you know that a pair of proteins is implicated in a disease and you know how they interact in 3D, you can try to target this interaction very specifically with a drug.”
Instead of specifying what makes a structural prediction more or less accurate, the researchers let the algorithm discover these molecular features for itself. They did this because they found that the conventional technique of providing such knowledge can sway an algorithm in favor of certain features, thus preventing it from finding other informative features.
“The problem with these hand-crafted features in an algorithm is that the algorithm becomes biased toward what the person who picks these features thinks is important,” Eismann noted. “You might miss some information that you would need to do better.”
“The network learned to find fundamental concepts that are key to molecular structure formation, but without explicitly being told to,” Townshend added. “The exciting aspect is that the algorithm has clearly recovered things that we knew were important, but it has also recovered characteristics that we didn’t know about before.”
Having shown success with proteins, the researchers turned their attention to RNA molecules. The researchers tested their algorithm in a series of “RNA Puzzles” from a longstanding competition in their field, and in every case, the tool outperformed all the other puzzle participants and did so without being designed specifically for RNA structures.
“We introduce a machine learning approach that enables identification of accurate structural models without assumptions about their defining characteristics, despite being trained with only 18 known RNA structures,” the authors of the Science article wrote. “The resulting scoring function, the Atomic Rotationally Equivariant Scorer (ARES), substantially outperforms previous methods and consistently produces the best results in community-wide blind RNA structure prediction challenges.”
The researchers asserted that their approach overcomes a major limitation of standard deep neural networks because it can learn effectively even from a small amount of data. “[Our approach] uses only atomic coordinates as inputs and incorporates no RNA-specific information,” the researchers elaborated. “[It] is applicable to diverse problems in structural biology, chemistry, materials science, and beyond.”
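A rough way to see why rotational symmetry matters for a scorer that reads only atomic coordinates: a candidate structure’s quality should not depend on how the structure happens to be oriented in space. The toy sketch below is not the authors’ network; the “score” formula is invented purely for illustration. It builds a score from pairwise interatomic distances, which makes it rotation-invariant by construction, whereas ARES achieves symmetry-respecting behavior with learned equivariant layers rather than a hand-crafted formula.

```python
import math

def pairwise_distance_sum(coords):
    """Toy 'scoring function': the sum of all pairwise atomic distances.

    Distances between atoms do not change when the whole structure is
    rotated or translated, so any score computed from them is invariant
    to the structure's orientation in space. (Illustrative only; a real
    learned scorer would aggregate far richer geometric features.)
    """
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += math.dist(coords[i], coords[j])
    return total

def rotate_z(point, theta):
    """Rotate a 3D point about the z-axis by angle theta (radians)."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

# Three hypothetical atoms given only as xyz coordinates.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 1.0)]
# The same structure in an arbitrary new orientation.
rotated = [rotate_z(p, theta=1.234) for p in atoms]

# The score is unchanged by the rotation.
assert math.isclose(pairwise_distance_sum(atoms),
                    pairwise_distance_sum(rotated))
```

The design point is that orientation is an accident of how a structure is posed, not a property of the molecule, so a scorer should treat all orientations identically rather than having to learn that fact from scarce training data.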
“Most of the dramatic recent advances in machine learning have required a tremendous amount of data for training,” Dror pointed out. “The fact that this method succeeds given very little training data suggests that related methods could address unsolved problems in many fields where data is scarce.”
“ARES’s ability to outperform the previous state of the art despite using only a small number of structures for training suggests that similar neural networks could lead to substantial advances in other areas involving 3D molecular structure, where data are often limited and expensive to collect,” the authors of the Science article concluded. “In addition to structure prediction, examples might include molecular design (both for macromolecules such as proteins or nucleic acids and for small-molecule drugs), estimation of electromagnetic properties of nanoparticle semiconductors, and prediction of mechanical properties of alloys and other materials.”
An assessment of ARES was offered by Kevin Weeks, PhD, a chemistry professor at the University of North Carolina, Chapel Hill. In a Perspective article (“Piercing the fog of the RNA structure-ome”) in Science, Weeks wrote, “ARES is still short of the level consistent with atomic resolution or sufficient to guide identification of key functional sites or drug discovery efforts, but Townshend et al. have achieved notable progress in a field that has proven recalcitrant to transformative advances.”