Researchers at the Broad Institute say they have developed a new computational technique known as “CATCH” that can be used to design molecular baits for every virus known to infect humans, along with all of its known strains, including those, such as Zika, that are present in low abundance in clinical samples. The approach can help small sequencing centers around the globe conduct disease surveillance more efficiently and cost-effectively, which can provide crucial information for controlling outbreaks, according to the scientists.
The new study (“Capturing sequence diversity in metagenomes with comprehensive and scalable probe design”) was led by MIT graduate student Hayden Metsky and postdoctoral researcher Katie Siddle, PhD, and it appears online in Nature Biotechnology.
“Metagenomic sequencing has the potential to transform microbial detection and characterization, but new tools are needed to improve its sensitivity. Here we present CATCH, a computational method to enhance nucleic acid capture for the enrichment of diverse microbial taxa. CATCH designs optimal probe sets, with a specified number of oligonucleotides, that achieve full coverage of, and scale well with, known sequence diversity. We focus on applying CATCH to capture viral genomes in complex metagenomic samples. We design, synthesize, and validate multiple probe sets, including one that targets the whole genomes of the 356 viral species known to infect humans,” wrote the investigators.
“Capture with these probe sets enriches unique viral content on average 18-fold, allowing us to assemble genomes that could not be recovered without enrichment, and accurately preserves within-sample diversity. We also use these probe sets to recover genomes from the 2018 Lassa fever outbreak in Nigeria and to improve detection of uncharacterized viral infections in human and mosquito samples. The results demonstrate that CATCH enables more sensitive and cost-effective metagenomic sequencing.”
“As genomic sequencing becomes a critical part of disease surveillance, tools like CATCH will help us and others detect outbreaks earlier and generate more data on pathogens that can be shared with the wider scientific and medical research communities,” said Christian Matranga, PhD, a co-senior author of the new study who has joined a local biotech startup.
Scientists have been able to detect some low-abundance viruses by analyzing all the genetic material in a clinical sample using metagenomic sequencing. However, the approach often misses viral material that gets lost in the abundance of other microbes and the patient’s own DNA, according to the Broad team.
Another approach is to enrich clinical samples for a particular virus. To do this, researchers use a kind of genetic bait to immobilize the target virus’s genetic material, so that other genetic material can be washed away. Scientists in the lab of Pardis Sabeti, MD, had successfully used baits, which are molecular probes made of short strands of RNA or DNA that pair with bits of viral DNA in the sample, to analyze the Ebola and Lassa virus genomes. However, the probes were always directed at a single microbe, meaning the researchers had to know exactly what they were looking for, and the probes were not designed in a rigorous, efficient way.
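The pairing between a bait and its target can be illustrated with a few lines of Python. This is only a sketch: it assumes DNA baits and perfect Watson-Crick complementarity, whereas real probes tolerate some mismatches, and the function names and sequences are made up for illustration rather than taken from the study.

```python
# Hypothetical illustration of bait/target pairing; not code from the study.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bait_binds(bait: str, sample_fragment: str) -> bool:
    """True if the bait is perfectly complementary to a stretch of the fragment."""
    return reverse_complement(bait) in sample_fragment.upper()


# Made-up sequences: a "viral" fragment and a bait designed against part of it.
viral_fragment = "GGATCCATTGACCGTTAGC"
bait = reverse_complement("ATTGACCGTT")

print(bait_binds(bait, viral_fragment))      # the viral fragment is retained
print(bait_binds(bait, "CCCCCCCCCCCCCCCC"))  # unrelated material is washed away
```

Because the bait is built from one specific target sequence, it retains only fragments containing that sequence, which is why a probe set aimed at a single microbe cannot reveal anything unexpected in the sample.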
What they needed was a computational method for designing probes that could provide a comprehensive view of the diverse microbial content in clinical samples, while enriching for low-abundance microbes like Zika.
“We wanted to rethink how we were actually designing the probes to do capture,” said Metsky. “We realized that we could capture viruses, including their known diversity, with fewer probes than we’d used before. To make this an effective tool for surveillance, we then decided to try targeting about 20 viruses at a time, and we eventually scaled up to the 356 viral species known to infect humans.”
Short for “Compact Aggregation of Targets for Comprehensive Hybridization,” CATCH allows users to design custom sets of probes to capture genetic material of any combination of microbial species, including viruses or even all forms of all viruses known to infect humans.
To run CATCH truly comprehensively, users can easily input genomes from all forms of all human viruses that have been uploaded to the National Center for Biotechnology Information’s GenBank sequence database, explained Metsky. The program determines the best set of probes based on what the user wants to recover, whether that’s all viruses or only a subset. The list of probe sequences can be sent to one of a few companies that synthesize probes for research. Scientists and clinical researchers looking to detect and study the microbes can then use the probes like fishing hooks to catch desired microbial DNA for sequencing, thereby enriching the samples for the microbe of interest.
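One way to picture the design step is as a covering problem: choose as few probes as possible so that every position of every input genome is matched by at least one probe. The sketch below uses a simple greedy heuristic on toy sequences. It is an illustrative assumption, not the CATCH implementation or its interface; the function names, the probe length, and the mismatch tolerance are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[str, int]  # (genome name, base index)


def hamming(a: str, b: str) -> int:
    """Number of mismatching bases between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def candidate_probes(genomes: Dict[str, str], probe_len: int) -> Set[str]:
    """Treat every probe_len-long window of every genome as a candidate probe."""
    return {g[i:i + probe_len]
            for g in genomes.values()
            for i in range(len(g) - probe_len + 1)}


def covered_positions(probe: str, genomes: Dict[str, str],
                      max_mismatches: int) -> Set[Position]:
    """Positions covered wherever the probe matches within the tolerance."""
    k = len(probe)
    hits: Set[Position] = set()
    for name, g in genomes.items():
        for i in range(len(g) - k + 1):
            if hamming(probe, g[i:i + k]) <= max_mismatches:
                hits.update((name, j) for j in range(i, i + k))
    return hits


def design_probes(genomes: Dict[str, str], probe_len: int = 8,
                  max_mismatches: int = 1) -> List[str]:
    """Greedily pick the probe covering the most still-uncovered positions."""
    to_cover = {(n, j) for n, g in genomes.items() for j in range(len(g))}
    coverage = {p: covered_positions(p, genomes, max_mismatches)
                for p in sorted(candidate_probes(genomes, probe_len))}
    chosen: List[str] = []
    while to_cover and coverage:
        best = max(coverage, key=lambda p: len(coverage[p] & to_cover))
        gained = coverage[best] & to_cover
        if not gained:  # remaining positions cannot be covered by any candidate
            break
        chosen.append(best)
        to_cover -= gained
    return chosen


# Two made-up "strains" that differ by a single base.
strains = {"strain_a": "ACGTACGTTGCAAGCT", "strain_b": "ACGTACGATGCAAGCT"}
probes = design_probes(strains, probe_len=8, max_mismatches=1)
print(len(probes), "probes chosen from",
      len(candidate_probes(strains, 8)), "candidates")
```

Allowing a mismatch lets a single probe stand in for closely related strains, which illustrates how a set can capture known diversity with fewer probes than one per variant. Rerunning the design on an updated set of genomes yields a refreshed probe list.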
Tests of probe sets designed with CATCH showed that after enrichment, viral content made up on average 18 times more of the sequencing data than before enrichment, allowing the team to assemble genomes that could not be generated from un-enriched samples. The team validated the method by examining 30 samples with known content spanning eight viruses. The researchers also showed that samples of Lassa virus from the 2018 Lassa outbreak in Nigeria that proved difficult to sequence without enrichment could be “rescued” by using a set of CATCH-designed probes against all human viruses. In addition, the team was able to improve viral detection in samples with unknown content from patients and mosquitoes.
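As a rough illustration of what an 18-fold figure means, the sketch below compares the viral share of sequencing reads before and after capture. The read counts are invented for the example, the function names are hypothetical, and the study's exact metric (it reports enrichment of unique viral content) may be computed differently.

```python
def viral_fraction(viral_reads: int, total_reads: int) -> float:
    """Share of sequencing reads that map to viral genomes."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads / total_reads


def fold_enrichment(pre_viral: int, pre_total: int,
                    post_viral: int, post_total: int) -> float:
    """Ratio of the viral read fraction after capture to the fraction before."""
    pre = viral_fraction(pre_viral, pre_total)
    post = viral_fraction(post_viral, post_total)
    if pre == 0:
        raise ValueError("fold enrichment is undefined with no viral reads before capture")
    return post / pre


# Invented counts: 500 viral reads per million before capture, 9,000 after.
print(round(fold_enrichment(500, 1_000_000, 9_000, 1_000_000), 2))
```

The function deliberately raises an error when no viral reads are seen before capture, since a ratio cannot describe a virus that becomes detectable only after enrichment.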
Using CATCH, Metsky and colleagues generated a subset of viral probes directed at Zika and chikungunya, another mosquito-borne virus found in the same geographic regions. Along with Zika genomes generated with other methods, the data they generated using CATCH-designed probes helped them discover that the Zika virus had been introduced in several regions months before scientists were able to detect it, a finding that can inform efforts to control future outbreaks.
To demonstrate other potential applications of CATCH, Siddle used samples from a range of different viruses. Siddle and others have been working with scientists in West Africa, where viral outbreaks and hard-to-diagnose fevers are common, to establish laboratories and workflows for analyzing pathogen genomes on-site. “We’d like our partners in Nigeria to be able to efficiently perform metagenomic sequencing from diverse samples, and CATCH helps them boost the sensitivity for these pathogens,” said Siddle.
The method is also a powerful way to investigate undiagnosed fevers with a suspected viral cause. “We’re excited about the potential to use metagenomic sequencing to shed light on those cases and, in particular, the possibility of doing so locally in affected countries,” added Siddle, who noted that one advantage of the CATCH method is its adaptability.
As new mutations are identified and new sequences are added to GenBank, users can redesign a set of probes with up-to-date information. In addition, while most probe designs are proprietary, Metsky and Siddle have made publicly available all of the ones they designed with CATCH. Users have access to the actual probe sequences in CATCH, allowing researchers to explore and customize the probe designs before they are synthesized.
Sabeti and fellow researchers say they are excited about the potential for CATCH to improve large-scale high-resolution studies of microbial communities. They are also hopeful that the method could one day have utility in diagnostic applications, in which results are returned to patients to make clinical decisions. For now, they’re encouraged by its potential to improve genomic surveillance of viral outbreaks like Zika and Lassa, and other applications requiring a comprehensive view of low-level microbial content.