The results of a study by researchers in Spain suggest that common approaches to analyzing DNA from a community of microbes (a microbiome) can yield erroneous results, largely because the databases used to identify microbial DNA sequences are incomplete. A team led by Aiese Cigliano, PhD, of Sequentia Biotech, and Clemente Fernandez Arias, PhD, and Federica Bertocchini, PhD, at the Centro de Investigaciones Biologicas Margarita Salas, evaluated current microbiome analysis techniques on computer simulations of microbial communities and reported the work in PLOS ONE. In their paper, titled “The virtual microbiome: A computational framework to evaluate microbiome analysis,” the team stated, “In this work, we formulate a computational framework to evaluate the performance of metagenomic analyses based on the generation of virtual microbiomes that simulate real bacterial communities. Using this approach, we identified critical limitations in the capability of currently available technologies to characterize microbiomes.”
Microbiomes have been the focus of intense research efforts in recent decades, the authors wrote. “The characterization of the microorganisms colonizing a particular ambient is becoming a gateway to the analysis of the physiological niche that the environment represents, revealing its potential functions or eventual pathological conditions.” Research is wide-ranging, from studies that aim to understand conditions such as obesity and autism by examining the human gut microbiome, to the study of environmental microbial communities to identify microbes that degrade toxic compounds or produce biofuels.
The most commonly used methods to study the microbiome of a chosen animal species are amplicon sequencing and whole-genome sequencing (WGS). These approaches rely on comparing DNA sequences obtained from a biological sample with sequences in genome databanks. “Therefore, they can only identify those sequences that are already present in databases,” the team continued. “… the amount of information available in databanks should be expected to constrain the accuracy of microbiome analyses,” the authors stated. “Albeit normally ignored in microbiome studies, this constraint could severely compromise the reliability of microbiome data.” Moreover, the team stated, comparative studies have already shown that the results of the characterization of bacterial populations by amplicon and WGS do not necessarily overlap, even if they use the same reference databanks.
So, the authors questioned, “To what extent and under what circumstances can we count on the current experimental approaches to understanding the complexity of symbiotic microorganisms living within the gut (or any other tissue) of an animal? Do the available techniques have intrinsic limitations that might influence the analysis of any microbiome, whatever the colonized environment?” To test the consistency of current methods of microbiome analysis, the investigators used computer simulations to create virtual microbiome communities that imitate real-world bacterial populations. “Virtual microbiomes are models of bacterial populations conceived to evaluate the performance of genomic analyses,” they explained. Each virtual microbiome consists of a list of species of bacteria and their respective abundances. “To get a deep insight into this puzzling landscape, we took an in silico approach, creating virtual microbiome communities that simulate the bacterial populations that can be found in humans, insects, etc., or soil, water, and any other medium.” The team then used standard techniques to analyze the virtual communities and compared the results with the original composition.
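To make the idea concrete, the comparison step can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the authors' framework: the species names, the abundances, and the `evaluate` function are hypothetical and were invented for this example. It treats a virtual microbiome as a mapping from species to relative abundance and checks a reported composition against that known truth.

```python
from typing import Dict, Set, Union

# Hypothetical virtual microbiome: species name -> relative abundance.
# Names and numbers are invented for illustration only.
virtual_microbiome: Dict[str, float] = {
    "Species A": 0.5,
    "Species B": 0.3,
    "Species C": 0.2,
}

# Hypothetical output of an analysis pipeline run on the virtual microbiome.
# "Species X" is not in the community; "Species C" was not detected.
reported_composition: Dict[str, float] = {
    "Species A": 0.6,
    "Species B": 0.25,
    "Species X": 0.15,
}


def evaluate(
    truth: Dict[str, float], reported: Dict[str, float]
) -> Dict[str, Union[Set[str], float]]:
    """Compare reported species against the known composition."""
    true_species = set(truth)
    reported_species = set(reported)

    correctly_found = true_species & reported_species
    false_positives = reported_species - true_species  # "detected" but absent
    missed = true_species - reported_species           # present but not detected

    # Fraction of reported species that are really in the community.
    precision = len(correctly_found) / len(reported_species) if reported_species else 0.0
    # Fraction of the community's species that were recovered.
    recall = len(correctly_found) / len(true_species) if true_species else 0.0

    return {
        "false_positives": false_positives,
        "missed": missed,
        "precision": precision,
        "recall": recall,
    }


result = evaluate(virtual_microbiome, reported_composition)
print(result)
```

Because the true composition of a virtual community is known in advance, species that are reported but absent, or present but missed, can be counted directly, which is not possible with a real sample.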
The team reasoned that the constraints identified within the virtual microbiome framework might also translate into poor overlap between WGS and amplicon analyses of real-world microbiomes that are underrepresented in genomic databases. They tested this by using amplicon and WGS techniques to analyze bacteria colonizing different tissues of the larvae of the lepidopteran Galleria mellonella, which can degrade polyethylene and polystyrene. “Despite their abundance, the microbiota of lepidopteran species has been little studied so the bacteria associated with this group have only a marginal representation in databanks to date,” the scientists explained. “This makes G. mellonella larvae an ideal subject to examine the limitations of genomic analyses pinpointed by the virtual microbiome model.” The results, they found, highlighted discrepancies between the two methods in the characterization of bacteria found in the G. mellonella larvae. “The results reflected the incongruences revealed by our model, with an astonishing lack of overlap between the outcomes of the two techniques at the genus level,” they wrote. “Both techniques also showed striking discrepancies in the detection of changes in the insect microbiome.”
Taken together, the team’s findings showed that results from DNA analyses can bear little resemblance to the actual composition of the community, and that a large number of the species “detected” by the analysis may not even be present in the community. “Altogether, our results point to a very limited ability of amplicon and WGS to accurately characterize virtual microbiomes,” they wrote. The findings thus point to significant flaws in the techniques currently used to identify microbial communities.
The researchers concluded that there is a need for increased efforts to collect genome information from microbes and to make that information available in public databases to improve the accuracy of microbiome analysis. “The ever-growing interest in microbiomes and their potential applications is very much dependent on the reliability, richness, and completeness of the databanks available for their accurate description,” the scientists stated. “If microbiome data are to be useful in an effective and reproducible manner, the effort in the field must be channeled towards significantly increasing the amount of available genomic information and finding efficient ways to use this information.”
In the meantime, the results of microbiome studies should be interpreted with caution, especially in cases where the available genomic information from those environments is still scarce, the scientists pointed out. “For those animal species and environments that harbor microbial populations, microbiome analysis stands as a fundamental tool in the comprehension of their physiology and ecology. However, the shortcomings inherent to the used tools and the paucity in the available database hinder and mislead the outcome, and, therefore, the interpretation of microbiome analyses … This work highlights the need for an increased effort in the collection of genomic information and its eventual availability in public databases.”