A&T’s Dr. Suzanne O’Regan Part of Team Using Big Data & Artificial Intelligence to Advance Disease Prevention

Through a new $2 million National Science Foundation grant, scientists at the Cary Institute of Ecosystem Studies, the University of Georgia, and North Carolina A&T State University are harnessing the power of machine learning to forecast outbreaks of zoonotic disease.

Each year, more than a billion people become sick from Ebola, Zika, SARS, and other pathogens acquired from wildlife, livestock, and other animals. Prevention relies on an ability to predict when and where pathogens are likely to make the leap from animals to people.

Barbara Han, a disease ecologist at the Cary Institute, is leading the five-year study. She explains, “We want to help shift society from a reactive to a proactive approach to managing zoonotic disease. Instead of responding to outbreaks, let’s try to stop them from happening in the first place. Using big data as a potential surveillance tool is an exciting new step toward prevention.”

Funding will enable the team to bring together information on pathogens, potential animal hosts, and environmental factors known to facilitate disease transmission, with the goal of developing innovative methods of mapping when and where the next major zoonotic disease outbreak might occur.

John Drake of the University of Georgia explains, “We are creating models which draw ‘boundaries’ around which species can host which pathogens, which pathogens can pass from animals to humans, and what combination of environmental factors facilitate spillover and human-to-human transmission. On the basis of these biological properties, we can pinpoint where disease emergence is possible.”

Phase one of the study involves building predictive statistical models that will help the researchers identify traits common among animals that carry disease, and among the pathogens and parasites that cross the species barrier. “We are looking at data that describe hosts, pathogens, and their environments, to determine which combinations of these features presage disease being realized on a global landscape,” Han says.

Models are built using extensive data sets on the physical and life history traits of host species and known pathogens. Host-pathogen pairings are then linked to geographical locations with suitable environmental conditions. The researchers also examine the conditions surrounding documented outbreaks to determine which factors were at play when each disease emerged.
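As a rough illustration of the linking step, the sketch below matches one host-pathogen pairing against a handful of regions. It is not the team's method: the class names, the temperature and rainfall thresholds, and every value in it are hypothetical placeholders, and a real analysis would draw on far richer environmental data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pairing:
    """A host-pathogen pairing with hypothetical environmental limits."""
    host: str
    pathogen: str
    min_temp_c: float
    max_temp_c: float
    min_rain_mm: float


@dataclass(frozen=True)
class Region:
    """A location summarized by two illustrative climate variables."""
    name: str
    mean_temp_c: float
    annual_rain_mm: float
    hosts_present: frozenset


def suitable_regions(pairing, regions):
    """Return names of regions where the host occurs and conditions fit."""
    return [
        r.name
        for r in regions
        if pairing.host in r.hosts_present
        and pairing.min_temp_c <= r.mean_temp_c <= pairing.max_temp_c
        and r.annual_rain_mm >= pairing.min_rain_mm
    ]


# Placeholder data, invented purely for illustration.
regions = [
    Region("region_1", 26.0, 1800.0, frozenset({"host_a"})),
    Region("region_2", 12.0, 600.0, frozenset({"host_a"})),
    Region("region_3", 27.0, 2000.0, frozenset({"host_b"})),
]
pairing = Pairing("host_a", "pathogen_x", 20.0, 32.0, 1000.0)

matches = suitable_regions(pairing, regions)
print(matches)
```

Here only region_1 passes all three checks: region_2 is too cold and dry, and region_3 lacks the host.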

Suzanne O’Regan of North Carolina A&T State University explains, “By using data that is global in scale, we are seeking to reveal generalizable features of ‘good’ disease carriers. Over 50 life history features are being incorporated into models for most mammal groups.” These features include animals’ physical characteristics, metabolic and reproductive rates, range of diet, and timing of daily activity: whether the animal is primarily active during the day, at night, or at dawn and dusk.
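To give a sense of how such traits might be prepared for a statistical model, here is a minimal sketch that converts one species record into a numeric feature vector. The field names, units, and values are assumptions made for illustration; the article does not describe the team's actual data format, and the real models use more than 50 features.

```python
# Activity timing is categorical, so it is one-hot encoded.
ACTIVITY_LEVELS = ("diurnal", "nocturnal", "crepuscular")

# Hypothetical numeric trait fields, listed in a fixed order.
NUMERIC_FIELDS = (
    "body_mass_g",
    "metabolic_rate",
    "litters_per_year",
    "diet_breadth",
)


def encode_traits(record):
    """Turn one species' trait record into a flat list of floats."""
    if record["activity"] not in ACTIVITY_LEVELS:
        raise ValueError(f"unknown activity timing: {record['activity']!r}")
    numeric = [float(record[field]) for field in NUMERIC_FIELDS]
    one_hot = [
        1.0 if record["activity"] == level else 0.0
        for level in ACTIVITY_LEVELS
    ]
    return numeric + one_hot


# Placeholder record, invented purely for illustration.
example_species = {
    "body_mass_g": 250,
    "metabolic_rate": 1.5,
    "litters_per_year": 3,
    "diet_breadth": 4,
    "activity": "nocturnal",
}

features = encode_traits(example_species)
print(features)
```

Keeping the field order fixed means every species maps to a vector of the same length, which is what most statistical and machine learning models expect as input.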