Researchers from Cambridge, U.K., and Germany have used a genetic network technique known as phylogenetic network analysis to reconstruct the early evolutionary paths of SARS-CoV-2 in humans as infection spread from Wuhan out to Europe and North America. By analyzing the first 160 complete viral genomes to be sequenced from human patients, the scientists say they have mapped some of the original spread of the new coronavirus through its mutations, which result in different viral lineages. The team’s results identified three central variants, A, B, and C, which spread differentially.
“There are too many rapid mutations to neatly trace a COVID-19 family tree,” explained University of Cambridge geneticist Peter Forster, PhD. “We used a mathematical network algorithm to visualize all the plausible trees simultaneously. These techniques are mostly known for mapping the movements of prehistoric human populations through DNA. We think this is the first time they have been used to trace the infection routes of a coronavirus like COVID-19.”
Forster is lead author of the team’s paper, which is published in the Proceedings of the National Academy of Sciences (PNAS), and titled, “Phylogenetic network analysis of SARS-CoV-2 genomes.” The software used in the study and classifications for over 1,000 coronavirus genomes are freely available at www.fluxus-technology.com.
During early March 2020, the GISAID (originally known as the global initiative on sharing all influenza data) database contained a compilation of 253 SARS-CoV-2 complete and partial genomes that had been contributed by clinicians and researchers from across the world since December 2019, the authors wrote. To try to understand the evolution of the SARS-CoV-2 coronavirus, and to help trace infection pathways and design preventive strategies, Forster and colleagues generated a phylogenetic network of 160 largely complete SARS-CoV-2 genomes.
Their results uncovered three variants of SARS-CoV-2, consisting of clusters of related lineages, which they designated A, B, and C. Variant A was most closely related to the virus found in both bats and pangolins, and represents the effective root of the outbreak. Type B is derived from A, separated by two mutations, then C is in turn a “daughter” of B. “Overall, the network, as expected in an ongoing outbreak, shows ancestral viral genomes existing alongside their newly mutated daughter genomes,” the team noted.
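To make the idea of lineages "separated by mutations" concrete, here is a minimal Python sketch that counts the positions at which aligned sequences differ. The three short sequences are invented for illustration and are not real SARS-CoV-2 data, and the helper name `mutation_sites` is hypothetical. Pairwise counts like these only show how lineages are compared; they are not the network algorithm used in the study.

```python
from itertools import combinations

# Toy aligned sequences, invented for illustration (not real SARS-CoV-2 data).
genomes = {
    "A": "ACGTTGCAAT",
    "B": "ACGCTGCAGT",  # differs from A at two positions
    "C": "ACGCTGTAGT",  # differs from B at one further position
}


def mutation_sites(seq1, seq2):
    """Return 1-based positions where two equal-length aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(seq1, seq2)) if x != y]


# Number of differing sites for every pair of toy lineages.
distances = {
    (a, b): len(mutation_sites(genomes[a], genomes[b]))
    for a, b in combinations(genomes, 2)
}

for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d} differing site(s)")
```

In this toy data A and B differ at two sites and C differs from B at one more, mirroring the A, then B, then C ordering described above.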
Forster and colleagues found that type A, which was the original human virus genome, was present in Wuhan, but surprisingly was not the city’s predominant virus type. Their analyses showed that mutated versions of A were seen in Americans reported to have lived in Wuhan, and a large number of A-type viruses were found in patients from the United States and Australia.
Wuhan’s major virus type was lineage B, and this was prevalent in patients from across East Asia. However, the variant didn’t travel much beyond the region without further mutations, implying a founder event in Wuhan, or resistance against this type of COVID-19 outside East Asia, the researchers noted. The C variant was identified as the major European type, and was found in early patients from France, Italy, Sweden, and England. It was absent from the study’s Chinese mainland sample, but was found in Singapore, Hong Kong, and South Korea.
Importantly, the researchers say that their genetic networking techniques accurately traced established infection routes: the mutations and viral lineages joined the dots between known cases. “One practical application of the phylogenetic network is to reconstruct infection paths where they are unknown and pose a public health risk,” they stated. The investigators describe a number of cases where the infection history is already known. Their results suggested that one of the earliest introductions of the virus into Italy came via the first documented German infection on January 27, and that another early Italian infection route was related to a “Singapore cluster.”
“Not only does the network confirm the Italian origin of the Mexican virus, but it also implies that this Italian virus derives from the first documented German infection on January 27 in an employee working for the Webasto company in Munich, who, in turn, had contracted the infection from a Chinese colleague in Shanghai who had received a visit by her parents from Wuhan,” the researchers wrote.
“This viral journey from Wuhan to Mexico, lasting a month, is documented by 10 mutations in the phylogenetic network.” Forster commented, “The viral network we have detailed is a snapshot of the early stages of an epidemic, before the evolutionary paths of COVID-19 become obscured by vast numbers of mutations. It’s like catching an incipient supernova in the act.”
The researchers reason that phylogenetic methods could be applied to the very latest coronavirus genome sequencing to help predict future global hot spots of disease transmission and surge. “Phylogenetic network analysis has the potential to help identify undocumented COVID-19 infection sources, which can then be quarantined to contain further spread of the disease worldwide,” said Forster, a fellow of the McDonald Institute for Archaeological Research at Cambridge, as well as the University’s Institute of Continuing Education.
The investigators suggest that localization of the B variant to East Asia could result from a founder effect: a genetic bottleneck that occurs when, in the case of a virus, a new type is established from a small, isolated group of infections. However, Forster argues that there is another explanation worth considering. “The Wuhan B-type virus could be immunologically or environmentally adapted to a large section of the East Asian population. It may need to mutate to overcome resistance outside East Asia. We seem to see a slower mutation rate in East Asia than elsewhere, in this initial phase.” As the authors stated, “A complex founder scenario is one possibility, and a different explanation worth considering is that the ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia.”
Since the study reported in PNAS was carried out, the research team has extended its analysis to 1,001 viral genomes. While the latest work has yet to be peer-reviewed, Forster says the results suggest that the first infection and spread among humans of COVID-19 occurred between mid-September and early December 2019. The phylogenetic network methods used by the researchers, which allow the visualization of hundreds of evolutionary trees simultaneously in one simple graph, were pioneered in the late 1970s, and further developed during the 1990s. The techniques came to the attention of archaeologist Colin Renfrew, a co-author of the PNAS study, in 1998. Renfrew went on to establish one of the world’s first archaeogenetics research groups, at the University of Cambridge.