In February 2021, seven Russian poultry workers contracted H5N8 avian flu. This avian influenza subtype had never before been known to infect humans, and the genetic sequence of the virus was quickly uploaded to the GISAID genetic data repository. For Colin Carlson, a biologist at Georgetown University in Washington DC, it was an opportunity. "I immediately thought, 'I want to run this through FluLeap,'" he says.
FluLeap is a machine learning algorithm that uses sequence data to classify influenza viruses as avian or human. The model was trained on a large number of influenza genomes, including samples from H5N8, to learn the differences between those infecting humans and those infecting birds. But the model had never seen an H5N8 virus classified as human, and Carlson was curious what it would do with this new subtype.
Surprisingly, the model classified the virus as human with 99.7 percent confidence. Rather than simply repeating patterns in its training data, such as the fact that H5N8 viruses do not typically infect humans, the model appears to have inferred a signature of human compatibility. "It's amazing that the model worked," says Carlson. "But it's one data point; it would be more impressive if it could do that a thousand more times."
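To make the idea of classifying a virus from its sequence alone concrete, here is a minimal sketch. It is not FluLeap's actual method, which the article does not describe in detail. It assumes a deliberately simple approach: count short nucleotide "words" (k-mers) in each sequence and assign a new sequence to the class whose average profile is closest. The sequences, the labels and the function names are all invented for illustration.

```python
from collections import Counter
from itertools import product
import math

K = 2  # k-mer length; an arbitrary choice for this toy example
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq):
    """Return the normalized frequency of every possible k-mer in seq."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[m] for m in KMERS) or 1  # avoid division by zero
    return [counts[m] / total for m in KMERS]


def centroid(profiles):
    """Average a list of equal-length profiles, position by position."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def train(labeled_sequences):
    """Build one average k-mer profile per label from (sequence, label) pairs."""
    by_label = {}
    for seq, label in labeled_sequences:
        by_label.setdefault(label, []).append(kmer_profile(seq))
    return {label: centroid(profiles) for label, profiles in by_label.items()}


def classify(model, seq):
    """Return the label whose average profile is nearest (Euclidean distance)."""
    profile = kmer_profile(seq)
    return min(model, key=lambda label: math.dist(profile, model[label]))


# Invented toy sequences: the composition difference between the two
# classes is artificial and carries no biological meaning.
TOY_TRAINING = [
    ("GCGCGCGGCC", "human"),
    ("GGCCGCGCGC", "human"),
    ("ATATTAATAT", "avian"),
    ("TTAATATATA", "avian"),
]

model = train(TOY_TRAINING)
print(classify(model, "GCCGGCGC"))
```

A classifier this simple can only echo the composition of its training data. The interesting question raised by the H5N8 result is whether a model can do more than that.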
Zoonosis, the process by which viruses jump from wild animals to humans, causes most pandemics. As climate change and human encroachment on animal habitats increase the frequency of these events, understanding zoonoses is critical to efforts to prevent pandemics, or at least to be better prepared for them.
Researchers estimate that only about 1% of the mammalian viruses on the planet have been identified¹, so some scientists have tried to expand our knowledge of this global virome by sampling wildlife. This is a daunting task, but over the past decade a new discipline has emerged in which researchers use statistical modeling and machine learning to predict aspects of disease emergence, such as global hotspots, potential animal hosts or the ability of a particular virus to infect humans. Proponents of this zoonotic risk prediction technology argue that it will allow us to better target surveillance to the right areas and situations, and guide the development of the vaccines and therapies most likely to be needed.
However, some researchers are skeptical about the ability of prediction technology to deal with the scale and ever-changing nature of the virome. Efforts are being made to improve the models and the data on which they are based, but these tools must be part of larger efforts to contain future pandemics.
Virus hunting
Some researchers have long argued that increasing our knowledge of viral diversity will help manage pandemic threats. PREDICT, a $200 million project funded by the United States Agency for International Development (USAID), spent about a decade searching for animal viruses. By the time it ended in 2020, it had identified 949 new viruses in samples from wild animals, livestock and humans in 34 countries.
In retrospect, some of the PREDICT results may appear prescient. A 2017 study² estimated that there are thousands of undiscovered coronaviruses in bats (believed to be the source of the SARS-CoV-2 virus) and predicted that Southeast Asia would harbor the largest number of viruses in the family to which SARS-CoV-2 belongs. It also found that activities involving a high level of human-wildlife contact, such as wildlife markets, were associated with a higher prevalence of coronaviruses.
Another 2017 study³ collected data on which viruses infect which mammals and created a database of virus-host associations. "The goal was to understand which viruses are capable of infecting humans, which animals we are more likely to get new viruses from, and what underlying factors drive these patterns," says ecologist and study leader Kevin Olival of the EcoHealth Alliance in New York, a non-profit organization focused on biomonitoring and conservation. The team's analysis showed that the proportion of viruses in a given host species that can infect humans is influenced by how closely related that species is to humans, as well as by factors affecting human-wildlife contact, such as population density and the degree of urbanization in the species' geographic range. The team used statistical modeling to predict the groups of animals and the regions likely to harbor large numbers of undetected viruses: bats ranked high, along with rodents and primates, in regions such as South America, Africa and Southeast Asia. The researchers also identified traits associated with a virus being zoonotic, such as the range of species it can infect.
The team says this information can help guide surveillance efforts. "It allows us to predict the areas of greatest risk," says Jonna Mazet, an epidemiologist at the University of California, Davis, who led PREDICT. Identifying specific threats also allows researchers and local health professionals to tailor mitigation and response capabilities. "It allows communities to say, 'We have this, this and that, and we can mitigate our risk this way,'" says Mazet.
PREDICT was only intended as a pilot project. "It generated a lot of data, but it was just a drop in the ocean," says Olival. "We need something bigger." Therefore, in 2016 researchers proposed the Global Virome Project (GVP), envisaged as a global partnership of government agencies, non-governmental organizations and researchers with the aim of discovering most of the viruses in mammals and birds (from which most zoonotic viruses originate). However, amid criticism from some researchers, it was never funded. It exists today as a non-profit organization with the goal of giving countries the expertise to conduct their own virus research, Mazet says. USAID launched a smaller, much cheaper project called Discovery and Exploration of Emerging Pathogens: Viral Zoonoses (DEEP VZ) in October 2021.
One criticism of the GVP is that the scope of the task is simply unmanageable. PREDICT researchers estimate⁴ that there are 1.67 million unknown viruses in mammals and birds, and although this number is disputed, there is no doubt that the virome is enormous. It is also constantly changing, so one-time discovery efforts would not be enough. "RNA viruses evolve at a remarkable rate," says Edward Holmes, a virologist at the University of Sydney in Australia. "So you would have to keep doing it."
There is also skepticism that the project would have identified possible pandemics. "I have no problem with it when it comes to understanding the evolution and ecology of viruses," says Holmes. "But as a predictive tool to understand what's coming next, it's a non-starter." One problem is that some host species and virus families have been studied extensively, whereas others have barely been touched. Existing data are also skewed towards viruses that are already prevalent⁵. As a result, most predictions so far have been based on "completely skewed data," says Jemma Geoghegan, a virologist at the University of Otago in New Zealand. Even when a virus is discovered and its genome sequenced, many factors that could affect its potential to cause a pandemic, such as its ability to infect humans and to be transmitted from human to human, remain unclear. "Then you have to do all these experiments, which take years and cost a fortune," says Holmes.
This is where machine learning can offer a shortcut. Rather than attempting to fully characterize each new virus, models can be used to identify high-priority targets for further investigation. "What we need is an additional classification system so that we know which viruses to characterize through in-depth virological studies," says Sara Sawyer, a virologist at the University of Colorado, Boulder.

Internal models
When a virus is discovered, very little is known about it other than its genetic sequence. Therefore, models that can classify viruses based only on their genome would be particularly useful. Nardus Mollentze, a computational virologist at the University of Glasgow in the UK, and his colleagues have developed such a model, which scores viruses partly on the basis of a measure of their genetic similarity to parts of the human genome⁶. Evolutionary pressures on viruses can result in genetic segments that closely resemble those in the host genome, either to evade the innate immune system or to aid replication. When tested against a library of 861 known viruses, the algorithm was able to classify them as zoonotic or not with an accuracy of 70%.
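An accuracy figure such as 70% is the share of viruses for which the predicted label matches the known one. The sketch below shows how that number is computed, alongside two companion measures (sensitivity and specificity) that separate the two kinds of error. The ten labels are invented so that the toy accuracy comes out at 70%; they do not reflect the actual breakdown of the 861-virus library, which the article does not give.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # zoonotic, predicted zoonotic
    tn = sum(1 for t, p in pairs if not t and not p)  # not zoonotic, predicted not
    fp = sum(1 for t, p in pairs if not t and p)      # false alarm
    fn = sum(1 for t, p in pairs if t and not p)      # missed zoonotic virus
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity and specificity as fractions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels: True means "zoonotic". Values are illustrative only.
known = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, True, False, False, False, False]

metrics = summarize(known, predicted)
print(metrics)
```

Reporting sensitivity and specificity next to accuracy matters when one class is much rarer than the other, because a model can then score a high accuracy while missing most of the rare cases.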
Mollentze has since joined the Viral Emergence Research Initiative (Verena), a consortium of researchers working to develop and improve zoonotic predictive models. Mollentze worked with the Verena researchers to combine the algorithm with techniques that leverage knowledge of which viruses infect which hosts, including methods to infer unknown host-virus associations. This combined approach increased performance by about ten percentage points⁷. In the future, knowledge of how viruses interact with hosts at the molecular level could also be incorporated. "It's all going to be about proteins and biochemistry," says Carlson, who directs Verena. "This is the future of it."
An important goal is to know which models work well and why. Some models simply classify on the basis of patterns in the data, whereas others infer the reasons for those patterns, but it can be difficult to tell the two apart. "The question remains: do we just teach the machines to repeat things they already know, or do they learn principles that carry over into a new space?" says Carlson.
Going forward, validating models will be crucial. For example, several studies have attempted to predict which species harbor zoonotic viruses, with mixed results, but there have been few systematic comparisons, making it difficult to know which approaches work. To address this, in early 2020, Verena researchers used predictions of which bat species could harbor betacoronaviruses as a case study⁸. They created eight statistical models and used them to generate a list of suspected hosts. Over the next 16 months, 47 new bat hosts were discovered. When the researchers compared these with their predictions, they found that four of the eight models performed significantly better than chance. These models included traits of the species, such as lifespan or size. The other four models did not take such traits into account and performed poorly.
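What "better than chance" means can be illustrated with a simple calculation. The sketch below is a generic check, not the analysis used in the Verena study: it asks how likely it is that a shortlist of suspected hosts would contain at least a given number of the newly confirmed hosts if the shortlist had been drawn at random. Only the figure of 47 new hosts comes from the article. The size of the candidate pool, the length of the shortlist and the number of hits are hypothetical.

```python
from math import comb


def hypergeom_tail(population, successes, draws, hits):
    """P(X >= hits) when `draws` items are picked at random, without
    replacement, from `population` items of which `successes` are true hosts."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(hits, upper + 1)
    )
    return favorable / total


CANDIDATE_SPECIES = 1000  # hypothetical pool of unsampled bat species
NEW_HOSTS = 47            # newly discovered hosts, as reported in the article
SHORTLIST = 100           # hypothetical number of species a model flags
HITS = 20                 # hypothetical number of flagged species confirmed

expected_by_chance = SHORTLIST * NEW_HOSTS / CANDIDATE_SPECIES
p_value = hypergeom_tail(CANDIDATE_SPECIES, NEW_HOSTS, SHORTLIST, HITS)
print(f"expected hits by chance: {expected_by_chance:.1f}")
print(f"probability of {HITS} or more hits by chance: {p_value:.3g}")
```

With these made-up numbers, a random shortlist would be expected to contain about five of the new hosts, so a model that finds 20 is very unlikely to be guessing.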
Data developments
Any artificial intelligence (AI) algorithm is fundamentally limited by the data it receives. "AI works when the algorithm is trained on large amounts of high-quality data," says Sawyer. "But there are only a small number of spillovers each year, and virus data tend to be messy because a lot of information is missing." Most researchers agree that the data are currently insufficient. "We don't have enough high-quality data to make good predictions," says Mazet.
Modeling depends to some extent on scientists collecting new data, but previous virus discovery efforts have been driven by considerations such as which locations and situations pose the highest risk. What modelers really need is sampling designed to improve geographic and taxonomic coverage, Carlson says. Providing models with more data of this type changes the horizon of what questions can be asked. "With a million data points, you can show how deforestation increases virus prevalence in bats," says Carlson. "With a trillion, you could predict spillover like the weather."
Getting close to that would require global collaboration, with open data sharing as the norm and data standards that everyone adheres to. The obstacles to this are political, cultural and ethical rather than scientific. Academic incentives around publishing, for example, are an obstacle to rapid data exchange. It is also crucial to ensure that countries sharing genetic data benefit from doing so. "This is the key issue, and to address it, trust needs to be built," says Olival. "Make sure you give back, not just with vaccines, but with training, capacity building and co-authorship on papers."
The Nagoya Protocol, an international agreement that came into force in 2014, enshrines countries' sovereignty over natural resources, including biological samples, and allows them to demand benefit-sharing agreements in exchange for access to such samples. However, some laboratories are already able to synthesize pathogens or begin developing vaccines using only genetic sequence data. "We haven't established anything in international law that deals with sequence data," says Carlson. "Nagoya was not made for this world." Similar problems may one day arise in zoonotic risk prediction. "We use data collected by researchers from the Global South," says Carlson. "There are legitimate questions about what it means to take that data and build a technology."
Predict and prepare
For modeling to have real impact, it must result in publicly available tools that provide locally relevant and actionable information. Modeling also needs to be better integrated with experimental work that probes the properties of pathogens. Just as a model can identify candidate viruses for further study, those studies can provide information that can be used to validate and refine the models. However, interdisciplinary communication is currently limited. "These are communities that don't talk to each other much or read each other's papers," says Sawyer.
Modelers should also clearly communicate the uncertainty inherent in their work, and what they mean by prediction, so as not to overstate the benefits. "Nobody is saying that we will have the exact time, place and species that will lead to the next pandemic," says Olival. Researchers deal in probabilities, and unexpected things can and do happen.
Even at best, prediction tools will not completely prevent outbreaks. "I absolutely do not believe that we should base global security on these models," says Carlson. But coupled with better global surveillance systems, targeted vaccine development, and efforts to increase healthcare capacity around the world, their value is clear. "They allow us to do two things: understand what's going on around us and prioritize," says Carlson. Ultimately, this could help reduce the frequency of pandemics. "We can improve the prevention of some of them," says Carlson. "But it requires that we get better at what we do."
This article is part of Nature Outlook: Pandemic preparedness, an editorially independent supplement produced with the financial support of third parties.
References
1. Carlson, C. J. et al. Phil. Trans. R. Soc. Lond. B 376, 20200358 (2021).
2. Anthony, S. J. et al. Virus Evol. 3, vex012 (2017).
3. Olival, K. J. et al. Nature 546, 646–650 (2017).
4. Carroll, D. et al. Science 359, 872–874 (2018).
5. Wille, M., Geoghegan, J. L. & Holmes, E. C. PLoS Biol. 19, e3001135 (2021).
6. Mollentze, N., Babayan, S. A. & Streicker, D. G. PLoS Biol. 19, e3001390 (2021).
7. Poisot, T. et al. Preprint at https://arxiv.org/abs/2105.14973 (2022).
8. Becker, D. J. et al. Lancet Microbe 3, e625–e637 (2022).
About the author
Simon Makin is a UK-based freelance science journalist. His work has appeared in New Scientist, The Economist, Scientific American and Nature, among others. He covers the life sciences and specializes in neuroscience, psychology and mental health.