As the volume of genomic data grows, so does the challenge of organizing it into a usable database. Indeed, the lack of a searchable database of genomic information drawn from the literature has been an obstacle for the research community. Now Genomenon's AI-based approach, the Genomenon Genomic Graph (G3) knowledgebase, combines patient and biological data from nearly all published scientific and medical studies, including demographics, clinical characteristics, phenotypes, treatments, outcomes, and disease-associated genes and variants.
The large language model (LLM) underlying G3 is trained on Genomenon's proprietary, curated genomic datasets. The knowledgebase will power AI-driven predictive models for clinical diagnostics and drug development.
Ann Arbor, MI-based Genomenon, a provider of genomic intelligence solutions, says this is the first time that content from the entire corpus of clinically relevant literature will be captured in a single, searchable knowledgebase.
The knowledgebase will include genes, genetic variants, copy number variants, structural variants (including fusion events), gene-disease relationships, drugs, phenotypes, and patient demographics and symptoms, all derived from indexed published research.
Beyond enabling advanced biomedical literature search and alerts, which use natural language queries to prioritize, annotate, and summarize relevant articles, the technology provides actionable insights into patients and diseases. It can also be used to build an interactive database with structured generative AI insights and analytics for real-world evidence applications.
“The wealth of clinically relevant information in published research is immense, yet its sheer volume renders it largely inaccessible to researchers and clinicians,” said Mike Klein, Genomenon CEO. “With the G3 knowledgebase, we are organizing all clinically relevant data and information into one searchable structure. Using the knowledgebase and an entirely new way of mining data, we’ve completely changed the types of questions that can be asked. Novel connections that would otherwise remain hidden deep in the scientific literature can now be revealed, the potential of real-world data can be unlocked, and new insights about patient populations can be gained.”
According to Genomenon, training the genomic-specific LLM on its proprietary, expertly curated datasets improves the model's accuracy. These datasets include curated content on germline diseases and cancers from the company's Mastermind genomic intelligence platform and Cancer Knowledgebase.
“Developing an advanced genomic knowledgebase is among the hardest things to do in terms of the complexity of the complete body of literature, the idiosyncrasies of the nomenclature, and the acuity of the impact,” said Jonathan Eads, Genomenon vp of engineering. “Our AI platform, powered by our genomic-specific large language models, was specifically designed to handle the complexities of historical trends and diverse natural language descriptions, as well as evolving formal nomenclatures and ontologies. Accurately extracting entities and their relationships from decades of publications is a challenging, intricate task—but one that our technology is uniquely equipped to solve and one for which the result is uniquely valuable.”