In the seven decades since the discovery of DNA’s double-helical structure in 1953, many of the most significant advances in human health have been rooted in biotechnology and biomanufacturing. Across fields such as health monitoring, multiomics medicine, cell-based therapies, artificial intelligence (AI)-driven bioinformatics, and gene editing, growing knowledge in biotechnology and biomanufacturing promises further advances in human health while preserving biosafety and biosecurity.
Such advances were highlighted in a report issued in March 2023 by the White House Office of Science and Technology Policy (OSTP), titled “Bold Goals for U.S. Biotechnology and Biomanufacturing.”1 In a chapter focused on healthcare, the report stated, “The U.S. government, in collaboration with the private sector, can advance areas throughout the full health continuum—from prevention to diagnosis and monitoring, to more efficient therapeutic manufacturing, to therapy and ultimately healthy survivorship.” The report also announced specific goals with 5-year and 20-year horizons.
The OSTP report acknowledges that its goals are ambitious enough to require strengthening the entire U.S. bioeconomy. A stronger bioeconomy would improve the country’s global standing, but significant roadblocks stand in the way: the need to coordinate multiple stakeholders, legitimate privacy concerns, the possibility that the technologies could be misused to cause harm, and extremely high costs.
Using genomics to monitor and treat disease
Two key themes of the OSTP’s goals for improving population health are accessible health monitoring and precision multiomics medicine. The OSTP describes its health monitoring goals as bold. The initial focus is on identifying indicators of healthy aging across the lifespan and meeting the needs of diverse populations. From there, the OSTP advocates developing a simple-to-use home diagnostic assay kit to help build a vast data bank that would let researchers identify trends across variables such as gender, race, and geographic area.
Overseas, the U.K.’s National Health Service (NHS) has already integrated genomics into its day-to-day operations. It is home to the 100,000 Genomes Project, rolled out jointly with Genomics England, and is the first healthcare system to offer whole genome sequencing as standard to people with rare diseases or cancer.2 Besides saving hundreds of thousands of pounds, this diagnostic approach can identify other family members at risk and helps NHS researchers expand a data bank that can be studied to better understand how and why these diseases occur. The NHS experience could clearly serve as a blueprint for how the U.S. rolls out gene therapy administration, but success will hinge on reproducible data pipelines that can be deployed quickly and accurately to support diagnosis and treatment.
The need for reproducible data pipelines arises again in the report’s discussion of precision multiomics medicine. The U.S. goal is to collect multiomics measures in large, diverse cohorts and to develop molecular classifications for diagnosis and/or treatment. This builds on existing work in the U.S. public sector, where genetic sequencing is already used to monitor for disease.
Some of the scientists involved in this effort, such as Robert Petit, PhD, a senior bioinformatics scientist at the Wyoming Public Health Laboratory, are building on methods originally developed to identify bacteria. Petit says that genetic sequencing can indicate which antimicrobials the organism in a given sample might resist, a huge step forward in stopping the spread of infectious diseases. However, genomics is a complex discipline, and a number of challenges stand in the way of deploying these tools at scale.
Identifying the immediate genomics data challenges
Bioinformatics has long used data to improve how healthcare is delivered. Over the next decade, that mission may be complicated by a surge in data. Estimates suggest that between 2 and 40 exabytes of genomics data will be produced,3 driven in part by the wide deployment of technology that can sequence a genome in hours instead of weeks.4 With massive datasets, cloud storage, and AI analysis, scientific results are being generated at an unprecedented scale, prompting scrutiny not only of the results themselves but also of the processes that generate them.
When the U.S. rolls out genomics processes for health monitoring, it will have to contend with infrastructure and analysis pipelines that are often bespoke and cannot be shared between researchers or organizations. The most cost- and time-effective route to collaboration at scale is a platform on which scientists standardize how diagnostics are run by developing them together. Such platforms allow the processing to be moved to the data, rather than the data to the processing, which helps organizations comply with strict data-protection laws.
Collaboration at scale also brings complications in research areas such as genetics. Even highly cited studies, for example, may contain sequencing errors that are difficult to detect and validate without open data analysis frameworks. At the earliest stages of research, mitigating these errors requires careful quality control and preprocessing to filter out errors and artifacts introduced during sequencing, which slows the delivery of results. This step is not only inefficient in terms of the computing energy it consumes but also frequently costly and resource-intensive.
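To make the quality-control step more concrete, the sketch below shows one common kind of read-level filter: discarding sequencing reads whose mean Phred base quality falls below a threshold, or that contain too many undetermined (“N”) bases. It is a minimal illustration under stated assumptions, not the method of any pipeline mentioned here. It assumes four-line FASTQ records with Phred+33 quality encoding, and the function names and the thresholds (mean quality 20, at most 10% N) are hypothetical choices for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record as (header, sequence, quality string). Hypothetical type for illustration.
Record = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    # Assumes well-formed input: header, sequence, '+' separator, quality.
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record],
                 min_mean_q: float = 20.0,
                 max_n_frac: float = 0.1) -> List[Record]:
    """Keep reads with mean quality >= min_mean_q and an 'N' fraction <= max_n_frac."""
    kept = []
    for header, seq, qual in records:
        if not seq:
            continue  # skip empty reads (also avoids dividing by zero below)
        if mean_phred(qual) < min_mean_q:
            continue  # base calls are low-confidence across the read
        if seq.upper().count("N") / len(seq) > max_n_frac:
            continue  # too many undetermined bases
        kept.append((header, seq, qual))
    return kept


if __name__ == "__main__":
    fastq = [
        "@read1", "ACGTACGT", "+", "IIIIIIII",  # 'I' encodes Q40: kept
        "@read2", "ACGTNNNN", "+", "IIIIIIII",  # 50% N: dropped
        "@read3", "ACGTACGT", "+", "########",  # '#' encodes Q2: dropped
    ]
    kept = filter_reads(parse_fastq(fastq))
    print([h for h, _, _ in kept])  # ['@read1']
```

Production pipelines usually rely on dedicated, well-tested tools for this step (for example, adapter trimming and per-base quality trimming in addition to whole-read filtering), and it is exactly these choices of tool and threshold that shared, reproducible workflows help standardize across organizations.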
Any country looking to build a bioeconomy will also have to address significant security challenges, both in R&D and in the implementation and delivery stages. The OSTP report highlights concerns about “the potential risk of lab or manufacturing accidents or the misuse of medical technologies for harmful purposes, which in turn may cause harm to human health, public trust, or the environment.” The report suggests that the U.S. government, in collaboration with the biomedical community, should address these challenges while ensuring that innovation and discovery are not impeded.
Naturally, the U.S. government must start by assessing security risks throughout the entire biomedical process, from conception to delivery of product. The technology used to address these and other challenges is evolving rapidly, so what trends are we likely to see as U.S. ambitions become reality?
Tech at the forefront of discovery and implementation
The use of cloud computing is likely to grow further in research labs across the U.S., as it provides a cost-effective and scalable way to store and process large volumes of sequencing data. According to IBM, cloud computing is “a set of systems” that has the potential to “put us on the path toward being able to leverage the entire world’s compute power as if it were a single, infinitely powerful computer.”
As sequencing data keeps growing in volume and complexity, AI and machine learning are expected to become increasingly important for data analysis and interpretation. The global market for AI in drug discovery alone was forecast to grow by nearly 50% between 2019 and 2026.5 One implication of this trend is the need for more tools to manage data-intensive applications in the cloud.
This type of research and treatment is hugely expensive, and access to data shared by other scientists working in the same area should enable faster, cheaper innovation. U.S. ambitions in genomics and biotech are undeniably high, and deploying technology that makes the most of both resources and results will be crucial to their success. Investing early in the right talent and technology to build a strong foundation will be pivotal to developing the country’s bioeconomy.
References
1. Bold Goals for U.S. Biotechnology and Biomanufacturing: Harnessing Research and Development to Further Societal Goals. White House Office of Science and Technology Policy. Washington, DC. March 22, 2023.
2. Geddes L. Whole genome sequencing could save NHS millions of pounds, study suggests. The Guardian. November 10, 2021.
3. Genomic Data Science [Fact Sheet]. National Human Genome Research Institute. Updated: April 5, 2022. Accessed: August 9, 2023.
4. Navarro FCP, Mohsen H, Yan C, et al. Genomics and data science: an application within an umbrella. Genome Biol. 2019; 20: 109. DOI: 10.1186/s13059-019-1724-1.
5. Hickland M. Artificial intelligence in drug discovery market forecast to 2024 released. Front Line Genomics. June 4, 2020.
Evan Floden, PhD, is co-founder and CEO of Seqera Labs, a provider of open source workflow orchestration software. Website: seqera.io.