Massive data analysis shows what drives the spread of flu in the US

IMAGE: Flu cases in the US tend to originate in the southeast and move north, away from the coasts.

Image credit: Andrey Rzhetsky, UChicago

Using several large datasets describing health care visits, geographic movements and demographics of more than 150 million people over nine years, researchers at the University of Chicago have created models that predict the spread of influenza throughout the United States each year.

They show that seasonal flu outbreaks originate in warm, humid areas of the southern and southeastern U.S. and move northward, away from the coasts. The approach differs from traditional flu tracking models that rely on transmission rates, or the expected number of people who can get sick and pass the virus along to others. Instead, the new models use several other factors that influence those transmission rates. The study was published February 27 in the journal eLife.

"It's a very high-resolution picture, perhaps even higher than what the Centers for Disease Control and Prevention can see, because it incorporates so many data sources," said Andrey Rzhetsky, PhD, the study's senior author and the Edna K. Papazian Professor of Medicine and Human Genetics at UChicago.

Rzhetsky and Ishanu Chattopadhyay, PhD, assistant professor of medicine at UChicago and the study's lead author, began with health care records from Truven MarketScan, a database of de-identified patient data from more than 40 million families in the United States. They analyzed nine flu seasons, from 2003 to 2011, flagging insurance claims for treatment for flu-like symptoms. This data shows when and where each flu outbreak begins and generates "streams" to track its spread from county to county. The source counties tended to be on the coasts near the Gulf of Mexico or the Atlantic Ocean.

The researchers also analyzed 1.7 billion geo-located messages from Twitter over a three-and-a-half-year period to capture people's week-to-week travel patterns between counties. For example, if someone routinely tweets from home, then tweets from work or while visiting family in the next county, this would establish a pattern of movement between the two counties.
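The idea of inferring movement from successive posts can be illustrated with a short sketch. This is not the researchers' actual method or code: the function name `county_flows`, the `(user_id, timestamp, county)` record layout and the sample county names are all assumptions made for illustration. The sketch simply counts how often a user's consecutive posts come from two different counties.

```python
from collections import Counter


def county_flows(posts):
    """Count county-to-county moves implied by consecutive posts per user.

    `posts` is an iterable of (user_id, timestamp, county) tuples.
    This record layout is hypothetical, chosen only for illustration.
    """
    flows = Counter()
    last_county = {}  # most recent county seen for each user
    # Order posts by user, then by time, so each user's posts are consecutive.
    for user, _, county in sorted(posts, key=lambda p: (p[0], p[1])):
        previous = last_county.get(user)
        if previous is not None and previous != county:
            flows[(previous, county)] += 1
        last_county[user] = county
    return flows


# Made-up sample: one user commutes between two counties, another visits a third.
sample = [
    ("u1", 1, "Cook"), ("u1", 2, "DuPage"), ("u1", 3, "Cook"),
    ("u2", 1, "Cook"), ("u2", 2, "Cook"), ("u2", 3, "Lake"),
]
flows = county_flows(sample)
for (origin, destination), count in sorted(flows.items()):
    print(origin, "->", destination, count)
```

Aggregated over many users and weeks, counts like these would give a rough picture of which county pairs are most strongly linked by routine travel.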

The analysis also incorporated data on "social connectivity," which included estimates of how often people visit close friends and neighbors, as well as data on air travel, weather, vaccination rates and changes in the flu virus itself.

The team combined all of these data points to draw a picture of which factors drive the northward spread of the flu each year. In the paper, they liken the typical outbreak to a forest fire. To spread, a fire needs flammable, dry tinder, an initiating spark and wind to hasten its movement. In the southern U.S., people have a high degree of social connectivity. The number of close friends, friends who are also neighbors, and communities of people who all know each other is much higher than in the country at large, meaning they have lots of opportunities to spread the flu.

This high social connectivity is the flammable material. The spark is the warm, humid weather of the southern coast, and the wind is the collective movement of all these people, over short distances by land, as they drive from county to county.

The researchers were able to use these models to recreate three years of historical flu data fairly accurately. Rzhetsky said that as the first reports of the flu begin to come in each fall, these tools could be used to help public health officials focus prevention efforts.

"For example, if flu-like symptoms are being reported in one county, you could tell people in neighboring counties to stay away from crowds, or you could focus vaccination efforts in certain places in advance," he said. "It could be used essentially as a weather forecast for the flu."

Credit: University of Chicago Medical Center