Of the thousands of diseases that affect humans, treatments exist for only a handful. Artificial intelligence (AI) is poised to transform both this shortage of available therapeutics and the inefficiency of drug discovery and development. AlphaFold’s phenomenal success in predicting protein structures for nearly the entire human proteome was a watershed moment for structure-based drug design.
A new study published in the journal Chemical Science (“AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor”) has applied AlphaFold to an end-to-end AI-powered drug discovery platform (Pharma.AI) that includes a biocomputational engine (PandaOmics) and a generative chemistry platform (Chemistry42), to identify a new drug for a novel target for the treatment of the most common form of primary liver cancer, hepatocellular carcinoma (HCC).
Of note, the AI-driven drug discovery process identified the new drug for a target that had no known crystal structure, with unprecedented efficiency. This is the first report of the application of AlphaFold to identify a confirmed hit for a novel target in early drug discovery.
Senior authors of the collaborative study include Alán Aspuru-Guzik, PhD, professor of chemistry and computer science and director of the Acceleration Consortium at the University of Toronto, Michael Levitt, PhD, Nobel laureate in Chemistry and professor of structural biology at Stanford University, and Alex Zhavoronkov, PhD, founder and CEO of Insilico Medicine.
“We decided to go after a project where AI would be used to identify a target for a disease without an existing crystal, use AlphaFold to get the crystal, use another form of generative AI to generate the molecules for this crystal, and then synthesize and test the compounds,” said Zhavoronkov. “And it worked!”
“This paper is further evidence of the capacity for AI to transform the drug discovery process with enhanced speed, efficiency, and accuracy,” said Levitt. “Bringing together the predictive power of AlphaFold and the target and drug design power of Insilico Medicine’s Pharma.AI platform, it’s possible to imagine that we’re on the cusp of a new era of AI-powered drug discovery.”
“This paper demonstrates that for healthcare, AI developments are more than the sum of their parts,” said Aspuru-Guzik. “If one uses a generative model targeting an AI-derived protein, one can substantially expand the range of diseases that we can target. If one adds self-driving labs to the mix, we will be in uncharted territory.”
“In 1969, Cyrus Levinthal despaired that with the large number of degrees of freedom in an unfolded polypeptide chain, it will be intractable to sift through the molecule’s astronomical number of possible conformations as would be necessary in computational drug design. But imitating nature one can get around this paradox by focusing on ‘only selected easy instances of the hard problem’ as guided by evolution,” said Bud Mishra, PhD, professor of computer science, engineering and mathematics at New York University. “By using large molecular datasets and powerful computers, it has now become possible to engineer AIs like AlphaFold, AlphaFoldDB, AlphaDesign and RoseTTAFold, which have enabled Zhavoronkov et al. to recently design CDK20 inhibitors, purely in silico. Their work marks a milestone in computational biology, which will inspire others in taming human suffering, diseases and aging!” (Mishra was not involved in the current study.)
The researchers used PandaOmics to identify the HCC protein target and Chemistry42 to generate molecules based on the AlphaFold-predicted structure of the target. They synthesized seven selected molecules and tested these using biological assays.
This led them to identify a small molecule hit compound (Kd, 9.2 ± 0.5 μM) for cyclin-dependent kinase 20 (CDK20) within a month of target identification, a process that can take months to years of iteration using conventional trial-and-error drug discovery workflows. A second AI cycle led the researchers to a hit compound (ISM042-2-048) that bound (Kd, 566.7 ± 256.2 nM) and inhibited (IC50, 33.4 ± 22.6 nM) CDK20 with even greater potency.
Earlier studies showed CDK20 is overexpressed in many HCC tumor cell lines. It promotes cell cycle progression via a positive feedback circuit consisting of androgen receptor (AR), CDK20, and β-catenin.
“Therefore, higher binding affinity or enzymatic inhibitory activity in a cell-free system will translate to better anti-proliferation effect in those HCC cell lines with relatively high CDK20 expression,” said Zhavoronkov.
Indeed, in functional assays, the scientists showed that the newly identified hit molecule selectively blocked proliferation of an HCC cell line called Huh7 that expressed excessive quantities of CDK20, compared to a non-HCC cell line (HEK293).
Insilico is focused on using AI tools to accelerate every component in the drug discovery and development process. This includes target identification, novel molecule generation, developing companion biomarkers, personalizing treatments, and analyzing data from clinical trials and other real-world scenarios. “We are combining these steps into a comprehensive pipeline, which provides a feedback loop that continually strengthens our pipeline,” said Zhavoronkov.
Insilico’s Pharma.AI platform uses meta-learning, zero-shot generative reinforcement learning, and genetic algorithms to discover and design potent inhibitors for targets with no known structural data to learn from.
“It has experienced exponential increases in performance and quality over the past few years,” said Zhavoronkov. “Our platform is built on years of modeling large biological, chemical, and textual datasets in order to discover new targets and design new compounds with desired properties without the use of large molecular libraries.”
The two key engines of Pharma.AI that drive the early-stage drug discovery workflow are Chemistry42 and PandaOmics.
“Whether you are starting with a co-crystal structure or a completely dark target devoid of any known small molecule modulators, Chemistry42 can generate highly optimized small molecule hits, and optimize existing hits and lead compounds,” said Zhavoronkov.
Chemistry42 is a multi-agent reinforcement learning system that consists of 42 generative algorithms that use a variety of molecular representations, base algorithms, and strategies to explore the chemical space and generate potential drugs.
PandaOmics, for its part, applies deep learning models to identify therapeutic targets associated with a given disease by analyzing omics data together with publications, clinical trials, and grant applications. It prioritizes potential targets based on novelty, confidence, commercial tractability, druggability, and safety, among other factors.
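One simple way to picture weighing targets against several criteria at once is a weighted composite score. The sketch below is purely illustrative and is not how PandaOmics works internally, which the article does not describe: the weights, the `Target` class, the function names, and the placeholder targets with their scores are all hypothetical. Only the five criterion names come from the description above.

```python
from dataclasses import dataclass

# Hypothetical weights: the criteria are named in the article,
# but how they are combined is an assumption made for this sketch.
WEIGHTS = {
    "novelty": 0.25,
    "confidence": 0.25,
    "commercial_tractability": 0.15,
    "druggability": 0.20,
    "safety": 0.15,
}


@dataclass
class Target:
    name: str
    scores: dict  # criterion name -> score in [0, 1]


def composite_score(target, weights=WEIGHTS):
    """Return the weighted sum of a target's per-criterion scores."""
    missing = set(weights) - set(target.scores)
    if missing:
        raise ValueError(f"{target.name} lacks scores for: {sorted(missing)}")
    return sum(weight * target.scores[name] for name, weight in weights.items())


def rank_targets(targets, weights=WEIGHTS):
    """Sort targets from highest to lowest composite score."""
    return sorted(targets, key=lambda t: composite_score(t, weights), reverse=True)


# Placeholder targets with made-up scores, not real data.
candidates = [
    Target("TARGET_B", {"novelty": 0.3, "confidence": 0.9,
                        "commercial_tractability": 0.8,
                        "druggability": 0.9, "safety": 0.6}),
    Target("TARGET_C", {"novelty": 0.5, "confidence": 0.4,
                        "commercial_tractability": 0.4,
                        "druggability": 0.3, "safety": 0.9}),
    Target("TARGET_A", {"novelty": 0.9, "confidence": 0.6,
                        "commercial_tractability": 0.5,
                        "druggability": 0.7, "safety": 0.8}),
]

for target in rank_targets(candidates):
    print(f"{target.name}: {composite_score(target):.2f}")
```

Raising the weight on novelty in a scheme like this favors poorly explored targets such as CDK20, while raising the weight on confidence favors targets with more supporting evidence.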
“We’ve used PandaOmics to identify new targets for cancer, amyotrophic lateral sclerosis (ALS), and COVID-19. The novel target it discovered for idiopathic pulmonary fibrosis has been developed into a lead AI-designed novel drug candidate [INS018_055],” said Zhavoronkov.
INS018_055, a protein kinase inhibitor like ISM042-2-048, is the first drug discovered and designed using AI to reach the Phase I clinical trial milestone. “We have invested deeply into AI as a company and have accumulated a lot of data. We followed $2 trillion worth of research data and invested significant time and resources in making the data machine-readable so that it can be used in our AI platform.”
The company does not intend to move ISM042-2-048 into clinical trials. “The molecule is now publicly available for other researchers to pursue,” said Zhavoronkov. “The purpose of the study was to serve as a proof-of-concept of what is now possible with AI—demonstrating that it is possible to use a predicted structure for a novel target and usable chemical data in just 30 days.”
Insilico Medicine and the Acceleration Consortium are developing self-driving laboratories, an emerging technology that combines AI, automation, and advanced computing to accelerate the discovery of materials and molecules.