Delphi Group Uses Data To Forecast the Flu and Other Epidemics
Working to help officials manage future public health emergencies, Carnegie Mellon University researchers want to forecast infectious disease outbreaks the way meteorologists predict the weather.
Outbreaks of diseases like COVID-19, or a resurgence of one like mpox, can happen any time of year, said Rosenfeld, University Professor of machine learning, language technologies, computer science and computational biology at Carnegie Mellon.
He co-founded the Delphi Research Group in 2012 with a fellow professor, now in the University of California, Berkeley’s Department of Statistics, to use data to create epidemic forecasts during normal times as well as during public health emergencies. The forecasts can help people take preventive measures that keep them from catching and spreading illnesses, including influenza, RSV and COVID-19, Rosenfeld said.
“Delphi tries to provide early warning to public health authorities by scanning our indicators for unexplained upward trends,” he said. “Delphi's indicators can provide a real-time, geographically detailed view of the trend’s dynamics and spread, and Delphi's short-term forecasts can provide geographically detailed risk estimates for a few weeks' horizon.”
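To make “scanning indicators for upward trends” concrete, here is a minimal Python sketch. It is not Delphi’s method: the function name, the 20% week-over-week growth threshold and the two-week persistence rule are all hypothetical, and a real system would also need to account for normal seasonal patterns before calling a rise “unexplained.”

```python
from typing import Sequence


def flag_upward_trend(weekly_values: Sequence[float],
                      min_growth: float = 0.2,
                      consecutive_weeks: int = 2) -> bool:
    """Hypothetical early-warning rule.

    Returns True if the indicator grew by at least `min_growth`
    (0.2 = 20%) week over week, for `consecutive_weeks` weeks in a row,
    ending with the most recent week. Seasonality is ignored here.
    """
    if len(weekly_values) < consecutive_weeks + 1:
        return False  # not enough history to judge a trend
    recent = list(weekly_values[-(consecutive_weeks + 1):])
    for prev, curr in zip(recent, recent[1:]):
        # A zero or negative baseline makes relative growth meaningless.
        if prev <= 0 or (curr - prev) / prev < min_growth:
            return False
    return True


# Made-up weekly values for one indicator in one city.
print(flag_upward_trend([10, 10, 13, 17]))  # True: +30%, then +31%
print(flag_upward_trend([10, 12, 11, 14]))  # False: the rise is not sustained
```

Requiring the rise to persist for more than one week trades a little speed for fewer false alarms.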
Using data to track and predict outbreaks
Some people who catch a respiratory illness suffer only minor symptoms, but the risk is much greater for vulnerable groups, such as infants and people who are immunocompromised. Because of that greater risk, forecasts can better inform these groups’ personal decisions, as well as decisions by public health officials and healthcare organizations.
For example, during the peak week of flu season, which can vary by a few weeks from place to place and by a few months across seasons, the risk for people in those groups can be up to 40 times higher than the risk off-season, Rosenfeld said.
“It should be possible to make people aware of when the wave is coming to their city, at different times of the year in different seasons,” he said. “I believe we're not far from a time when people will be able to look on their phone and see what is the current level of circulation of any major pathogen in their city and what is the current prediction of when a wave will arrive.”
The Delphi Research Group has since expanded to include Will Townes, assistant professor in CMU’s Statistics & Data Science Department in the Dietrich College of Humanities and Social Sciences; another CMU assistant professor; a statistics professor at the University of British Columbia; and staff members, graduate students and undergraduate students. Its members realized that to make these predictions meaningful, they needed to aggregate and curate as much reliable, real-time data as possible.
Today, Epidata, the repository they built, lists more than 1,600 distinct indicators for a variety of pathogens, with a total of over 5 billion de-identified records. Millions of records are collected, cleaned, categorized and added daily. They include traditional government statistics; indicators derived from insurance claims, laboratory test results and electronic medical records; statistics on nighttime coughing; and search trends.
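As a rough illustration of the daily “collect, clean, categorize” step, the sketch below normalizes raw rows into a common record type and drops malformed ones. The `Record` fields and the `clean` function are hypothetical; the article does not describe Epidata’s actual schema or cleaning rules.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical normalized record; not Epidata's actual schema."""
    source: str       # data provider, e.g. "claims"
    signal: str       # indicator name within that source
    pathogen: str     # e.g. "flu", "rsv", "covid"
    geo: str          # geographic unit, e.g. a state code
    time_value: date  # day the measurement refers to
    value: float


def clean(raw_rows: Iterable[dict]) -> List[Record]:
    """Normalize raw rows into Records and drop malformed ones."""
    cleaned = []
    for row in raw_rows:
        try:
            record = Record(
                source=str(row["source"]).strip().lower(),
                signal=str(row["signal"]).strip().lower(),
                pathogen=str(row["pathogen"]).strip().lower(),
                geo=str(row["geo"]).strip().lower(),
                time_value=date.fromisoformat(str(row["time_value"])),
                value=float(row["value"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # missing field, unparseable date or non-numeric value
        if math.isnan(record.value) or record.value < 0:
            continue  # counts and rates cannot be negative or undefined
        cleaned.append(record)
    return cleaned
```

Normalizing case and whitespace up front means the same source or region is never filed under two different labels.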
More data means better prediction accuracy, with the diversity and volume of sources allowing researchers to confirm suspected trends and tell them apart from random fluctuations, Rosenfeld said.
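One simple way to tell a real trend from noise in a single source is to require agreement across independent sources. The sketch below, with a hypothetical function name and thresholds, flags a rise only when at least two indicators grew by 10% or more in their latest observation; it illustrates the reasoning, not Delphi’s actual procedure.

```python
from typing import Dict, Sequence


def corroborated_rise(indicators: Dict[str, Sequence[float]],
                      min_growth: float = 0.1,
                      min_agreeing: int = 2) -> bool:
    """Hypothetical rule: flag a rise only if at least `min_agreeing`
    independent indicators each grew by `min_growth` (0.1 = 10%) or more
    between their last two observations."""
    agreeing = 0
    for series in indicators.values():
        if len(series) < 2 or series[-2] <= 0:
            continue  # cannot compute a relative change for this source
        growth = (series[-1] - series[-2]) / series[-2]
        if growth >= min_growth:
            agreeing += 1
    return agreeing >= min_agreeing


# Made-up weekly values from independent sources.
print(corroborated_rise({"claims": [100, 120], "lab_tests": [50, 60], "searches": [10, 9]}))  # True
print(corroborated_rise({"claims": [100, 120], "lab_tests": [50, 50]}))  # False
```

A random blip in one source rarely coincides with blips in others, so agreement is stronger evidence than any single series.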
“We learned that perhaps the biggest obstacle to useful forecasts is the lack of data,” he said. “We initially focused on improving our projection of the future of epidemics, but soon realized that if we improve our situational awareness about the present, that will automatically translate into improved forecasts for the future.”
Sleep Cycle, a sleep-tracking technology company, recently partnered with Delphi to provide the research group with privacy-protected, aggregated sleep data, including information about coughing and breathing patterns from wearable and sleep-monitoring devices.
Since symptoms like coughing and congestion often appear days before someone seeks medical care, this data offers earlier warning of outbreaks than hospital records, Rosenfeld said.
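A standard way to check how many days one signal leads another is lagged correlation: pair each day of the early series with a later day of the second series, try each candidate lag, and keep the lag with the highest Pearson correlation. This hypothetical sketch applies it to made-up cough and hospital counts; it is not a description of how Delphi or Sleep Cycle measure lead time.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (sd_x * sd_y)


def best_lead(early: Sequence[float], late: Sequence[float], max_lag: int) -> int:
    """Return the lag (in days) at which early[t] best predicts late[t + lag]."""
    best_lag, best_r = 0, -math.inf
    for lag in range(max_lag + 1):
        x, y = early[:len(early) - lag], late[lag:]
        n = min(len(x), len(y))
        if n < 3:
            break  # too few overlapping days to compare
        r = pearson(x[:n], y[:n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


# Made-up daily counts: the hospital series is the cough series delayed by 3 days.
coughs = [0, 1, 3, 7, 4, 2, 1, 0, 0, 0, 0, 0]
hospital = [0, 0, 0, 0, 1, 3, 7, 4, 2, 1, 0, 0]
print(best_lead(coughs, hospital, max_lag=5))  # 3
```

With real data the peak correlation is noisier, so a lead estimate like this would usually be checked across several seasons and regions.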
“I envision a future where epidemic forecasting is everywhere, properly understood and useful,” he said.
Why partnerships and revisions matter
Roughly half of Delphi’s indicators now come directly from nongovernment partners, according to Rosenfeld, who emphasized the importance of building partnerships for data access.
These include healthcare companies’ electronic health record summaries and laboratory testing results that are not publicly released, as well as nonadjudicated insurance claims, which arrive faster than finalized billing data. These relationships are carefully negotiated to ensure all data is first de-identified.
“We are constantly reaching out to launch these collaborations with organizations who hold data that is of value,” Rosenfeld said.
The professor who directs the Machine Learning Department (MLD) said Delphi’s work represents one way the department applies research to create broader societal impact.
“Delphi reflects what MLD is about: combining strong statistical foundations with modern machine learning to tackle urgent, real-world problems,” he said. “Delphi will continue to play a leading role in shaping epidemic forecasting in the U.S. and stands as a powerful example of MLD's innovative ecosystem.”
Public health data evolves: early reports are often incomplete and get revised over time.
Instead of relying only on finalized numbers, Delphi preserves each version of the data as it was originally reported. This allows the researchers to test forecasts under real-world conditions, using the same provisional information that decision-makers must rely on in real time.
“If you train a forecasting model on finalized and cleaned-up data, you’re cheating,” Rosenfeld said. “In real life, forecasters only have access to messy, preliminary data.”
Using provisional data results in more accurate models and, in turn, more accurate forecasts, better decisions and improved public health.
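The versioning idea can be sketched as an “as-of” query: store every reported version of a value with the date it was issued, and when backtesting, reconstruct only what was known on a given day. The `Version` type and `as_of` function below are hypothetical illustrations, not Epidata’s interface.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple


class Version(NamedTuple):
    """One reported version of a value (hypothetical layout)."""
    time_value: date  # the day the measurement refers to
    issue: date       # the day this version was published
    value: float


def as_of(versions: Iterable[Version], as_of_date: date) -> Dict[date, float]:
    """Reconstruct the data as it looked on `as_of_date`: for each
    time_value, keep the latest version issued on or before that date."""
    latest: Dict[date, Version] = {}
    for v in versions:
        if v.issue > as_of_date:
            continue  # this revision had not been published yet
        current = latest.get(v.time_value)
        if current is None or v.issue > current.issue:
            latest[v.time_value] = v
    return {t: v.value for t, v in latest.items()}


# Made-up counts: the Jan. 1 value was first reported as 80, then revised to 120.
history = [
    Version(date(2024, 1, 1), date(2024, 1, 2), 80.0),
    Version(date(2024, 1, 1), date(2024, 1, 9), 120.0),
    Version(date(2024, 1, 8), date(2024, 1, 9), 90.0),
]
print(as_of(history, date(2024, 1, 5)))   # only the preliminary 80.0 is visible
print(as_of(history, date(2024, 1, 10)))  # the revised 120.0 plus the new week
```

A backtest would call `as_of` once per simulated forecast date, so the model never sees revisions that were published later.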
From pandemic patchwork to lasting infrastructure
Delphi began with a focus on influenza and shifted to COVID-19 during the pandemic. The group rapidly scaled up thanks to volunteers and temporary collaborators, including dozens of engineers from Google and elsewhere outside the university, who helped build data pipelines quickly.
Starting in April 2020, Delphi collected real-time data on self-reported COVID-19 symptoms and other disease indicators nationwide. County-level information about the coronavirus pandemic was updated continuously and shared with both the public and health researchers.
In September 2020, Google.org donated $1 million to Carnegie Mellon to support COVIDcast, the Delphi group’s effort to track and forecast localized COVID-19 activity nationwide.
During the height of the pandemic, Delphi’s Epidata database, which includes COVIDcast, received an average of 100,000 queries per day. At that time, Delphi also began producing COVID-19 forecasts and sharing them with the Centers for Disease Control and Prevention (CDC).
In 2023, Delphi became one of the CDC’s 13 national Centers for Outbreak Analytics and Disease Modeling. Since then, Delphi has re-engineered its entire data ingestion system using modern tools, creating a more uniform, scalable platform that can bring new data sources online every few weeks, Rosenfeld said. The overhaul has already allowed the group to expand dramatically, adding hundreds of indicators in recent years.
“The five-year funding horizon gave us the depth and the confidence to revamp our systems to make them faster and more responsive,” said Adam Johns, Delphi’s engineering manager. “This agreement with the CDC allowed us to think more about the future and to redesign our pipelines to be much more uniform, robust and easy to maintain, with the ability to scale at need to more and larger data sources.”
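The article does not describe the new pipeline’s design, but one common way to make ingestion “uniform” is to have every source implement the same small interface and register itself, so adding a source means writing one class. The sketch below is a hypothetical illustration of that pattern; the `Source` class, `register` decorator and `ExampleClaims` source are invented for this example. In practice each source’s `run()` could be scheduled as a task in an orchestrator such as Apache Airflow, on which Delphi’s data platform is built.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Source(ABC):
    """Hypothetical interface every data source implements."""
    name: str = ""

    @abstractmethod
    def fetch(self) -> List[dict]:
        """Pull raw rows from the provider (source-specific)."""

    def transform(self, raw: List[dict]) -> List[dict]:
        """Shared normalization: here, just lower-case the column names."""
        return [{key.lower(): value for key, value in row.items()} for row in raw]

    def run(self) -> List[dict]:
        return self.transform(self.fetch())


REGISTRY: Dict[str, Type[Source]] = {}


def register(cls: Type[Source]) -> Type[Source]:
    """Class decorator that makes a new source discoverable by the pipeline."""
    REGISTRY[cls.name] = cls
    return cls


@register
class ExampleClaims(Source):
    name = "example_claims"  # hypothetical source

    def fetch(self) -> List[dict]:
        return [{"Geo": "pa", "Value": 3}]  # stand-in for a real download


def run_all() -> Dict[str, List[dict]]:
    """Run every registered source through the same steps."""
    return {name: cls().run() for name, cls in REGISTRY.items()}


print(run_all())  # {'example_claims': [{'geo': 'pa', 'value': 3}]}
```

Keeping source-specific code confined to `fetch` is what lets shared steps such as cleaning, versioning and loading be written and tested once.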
Public health officials use Delphi’s data and forecasts to inform their actions and communications and to support their own forecasting activities. Healthcare systems can use them to inform purchasing and equipment positioning, scheduling of elective procedures, vacations and other short-term staffing decisions. Individuals can use the forecasts to assess current and near-term risk and to guide personal decisions.
Data sources that were available only during the pandemic are still accessible for retrospective analysis, and are also configured for rapid resumption during the next emerging event, Rosenfeld said.
“We believe that with the next public health emergency, some of them will be reopened, so we want to be ready for that,” he said.
Making public health data more useful to the public
Delphi’s public-facing Epidata platform allows registered and unregistered users to browse, visualize and download this data without needing advanced programming skills (registration is required for large-volume downloads). Users can filter by disease, geography, data source or time period, then plot trends or export the information for further analysis. Beyond public health, registered users access the data for research, education, forecasting and analysis, and to incorporate it into reports.
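The filter-then-export workflow can be sketched locally as follows. This does not call the Epidata platform or its API; the `Row` type, the `select` filter and the CSV layout are hypothetical and only illustrate filtering by disease, geography, source and time period before exporting for further analysis.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical row layout for illustration only."""
    pathogen: str
    geo: str
    source: str
    time_value: date
    value: float


def select(rows: Iterable[Row], pathogen: Optional[str] = None,
           geo: Optional[str] = None, source: Optional[str] = None,
           start: Optional[date] = None, end: Optional[date] = None) -> List[Row]:
    """Keep rows matching every filter that is set (None means 'any')."""
    return [
        r for r in rows
        if (pathogen is None or r.pathogen == pathogen)
        and (geo is None or r.geo == geo)
        and (source is None or r.source == source)
        and (start is None or r.time_value >= start)
        and (end is None or r.time_value <= end)
    ]


def to_csv(rows: Iterable[Row]) -> str:
    """Export rows as CSV text for use in a spreadsheet or other tool."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["pathogen", "geo", "source", "time_value", "value"])
    for r in rows:
        writer.writerow([r.pathogen, r.geo, r.source, r.time_value.isoformat(), r.value])
    return buf.getvalue()


# Made-up rows.
data = [
    Row("flu", "pa", "claims", date(2024, 1, 8), 3.5),
    Row("flu", "oh", "claims", date(2024, 1, 8), 2.9),
    Row("rsv", "pa", "lab_tests", date(2024, 1, 8), 1.2),
]
print(to_csv(select(data, pathogen="flu", geo="pa")))
```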
Delphi also helps users discover data that exists elsewhere, such as at local and state public health agencies, by documenting where the data lives and how it might be accessed, said Peter Jhon, Delphi’s executive director and strategic coordinator of public health research initiatives. He added that most of the available data can be repurposed for noncommercial use under a Creative Commons license.
Ultimately, Delphi wants to quantify infectious disease risks and make them local, timely, understandable and actionable, especially when a new epidemic is on the horizon.
“One of our core values is to make our data as maximally accessible as possible,” Jhon said. “We want to give the public better insight into the information that we have.”
Delphi's Recognitions
2019: Designated by CDC’s Influenza Division as a national center, one of only two nationwide.
2021: Recognized with an award from CMU’s School of Computer Science.
2021: Recognized by the American Statistical Association (ASA) with the 2021 Statistical Partnerships Among Academe, Industry, and Government (SPAIG) Award.
2022: Recognized by the American Association for Public Opinion Research with its Policy Impact Award and Warren J. Mitofsky Innovators Award for Delphi's COVID-19 Trends and Impact Surveys (CTIS).
2023: Designated by the CDC's Center for Forecasting and Outbreak Analytics (CFA) as a National Center of Innovation.
2026: Recognized by Astronomer, the company that hosts Delphi’s Airflow-based data platform, as part of its Data Excellence Awards.