COVID-19 Forecasts by County
COVID-19 Forecasts by County provides U.S. county-level forecasts of COVID-19 infection growth rates based on the econometric model developed in Wilson (2020). The forecast methodology uses county panel data from early 2020 through the latest available data to estimate out-of-sample predicted values from a panel fixed-effects regression specification. To measure the reliability of these forecasts, this page also shows a comparison of past predictions to estimates of actual growth over several forecast horizons. See Wilson (2021) for a discussion of the forecast accuracy.
The regression equation is derived from the canonical epidemiological model of infectious disease spread, known as the Susceptible-Infectious-Removed (SIR) model. It relates a county’s subsequent infection growth over a specific horizon, measured as the change in the log number of infected residents, to current and past values of observable drivers of disease transmission. These drivers include social distancing behavior (mobility), weather (temperature and precipitation), and confirmed COVID-19 cases per capita to date. The regression also includes county fixed effects and county-specific linear time trends. These terms allow counties to have different levels and trends in their infection growth rates, independent of differences in the observed drivers of transmission.
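As a rough guide to the structure described above, the specification can be sketched as follows. The notation is an assumption made here for illustration and is not taken from Wilson (2020): $I_{c,t}$ is the number of infected residents of county $c$ on day $t$, $h$ is the forecast horizon, $X_{c,t-k}$ collects the observed drivers (mobility, temperature, precipitation, and cumulative cases per capita) at lag $k$, $K$ is a hypothetical number of lags, $\alpha_c^h$ is the county fixed effect, and $\delta_c^h$ is the county-specific trend coefficient. The first line is the standard SIR result that motivates using log growth as the dependent variable, where $\beta$ is the transmission rate, $\gamma$ is the removal rate, and $S_{c,t}/N_c$ is the susceptible share of the population.

```latex
\begin{align}
\frac{d \ln I_{c,t}}{dt} &= \beta \, \frac{S_{c,t}}{N_c} - \gamma \\
\ln I_{c,t+h} - \ln I_{c,t} &= \alpha_c^h + \delta_c^h \, t
  + \sum_{k=0}^{K} \left(\beta_k^h\right)^{\top} X_{c,t-k}
  + \varepsilon_{c,t+h}
\end{align}
```

Under this sketch, a separate regression is estimated for each horizon $h$, and the forecast for a county is the fitted value of the second equation evaluated at the most recent values of its drivers.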
This forecasting model relies on daily, county-level, near real-time data on weather, mobility, and COVID-19 cases. The weather data are based on NOAA/NCDC weather station readings and aggregated to the county level following Wilson (2019). The mobility data used here are from the Dallas Fed’s Mobility and Engagement Index (MEI), which is based on cell-phone geolocation data provided by SafeGraph. Using other mobility data, such as Google Mobility Reports, yields similar results, as demonstrated in Wilson (2020). Data on COVID-19 cases come from usafacts.org, which compiles data from state public health agencies.
The forecasts also incorporate state-level vaccination data from the Centers for Disease Control and Prevention. Because these data are reported by state, they need to be adjusted for individual counties. Specifically, counties are assumed to have the same share of their population fully vaccinated as their state’s average, and vaccinations are assumed to be 95% effective. In accordance with the standard SIR model, this effective vaccination share of the population directly reduces forecasted growth in infections. In other words, vaccinations move people from the susceptible category (S) to the removed category (R).
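The vaccination adjustment can be illustrated with a minimal Python sketch. It is not the model's actual implementation: the function names, the transmission and removal rates (`beta`, `gamma`), and the simplification that all vaccinated residents are drawn from the susceptible pool are assumptions made here for illustration. Only the 95% effectiveness figure and the use of the state average as a proxy for each county come from the text.

```python
# Illustrative sketch only: function names and parameter values are hypothetical.

VACCINE_EFFECTIVENESS = 0.95  # effectiveness assumed in the text


def effective_vaccination_share(state_fully_vaccinated_share: float) -> float:
    """Effective vaccinated share of a county, proxied by its state's average."""
    return VACCINE_EFFECTIVENESS * state_fully_vaccinated_share


def sir_infection_growth_rate(beta: float, gamma: float, susceptible_share: float) -> float:
    """Instantaneous SIR growth rate of infections: d ln(I)/dt = beta * s - gamma."""
    return beta * susceptible_share - gamma


def growth_rate_with_vaccination(
    beta: float,
    gamma: float,
    susceptible_share: float,
    state_fully_vaccinated_share: float,
) -> float:
    """Move the effectively vaccinated share from S to R, then recompute growth.

    Assumption: vaccinated residents all come from the susceptible pool,
    so the susceptible share falls one-for-one (but never below zero).
    """
    v_eff = effective_vaccination_share(state_fully_vaccinated_share)
    adjusted_share = max(susceptible_share - v_eff, 0.0)
    return sir_infection_growth_rate(beta, gamma, adjusted_share)


if __name__ == "__main__":
    # Hypothetical county: 90% susceptible, state is 40% fully vaccinated.
    before = sir_infection_growth_rate(beta=0.2, gamma=0.1, susceptible_share=0.9)
    after = growth_rate_with_vaccination(
        beta=0.2, gamma=0.1, susceptible_share=0.9, state_fully_vaccinated_share=0.4
    )
    print(f"growth without vaccination: {before:.3f}")
    print(f"growth with vaccination:    {after:.3f}")
```

In this sketch a smaller susceptible share lowers the growth rate one-for-one with the transmission rate, which is the sense in which vaccinations directly reduce forecasted growth.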
These forecasts will be updated approximately every week, assuming there are no disruptions in data feeds for weather, mobility, and cases. The latest forecasts from this empirical model for horizons from 10 days ahead up to 70 days ahead (at 10-day intervals) are available in the downloadable file.
As an illustration, Map 1 shows the forecasts for 30 days ahead.
Map 1: Projected Growth of COVID-19 Infections by County, 30 Days Ahead
To put these latest forecasts in perspective, Map 2 shows the actual change in log active infections over the past 30 days, based on approximating active infections using data on cases; see Wilson (2020) for details.
Map 2: Actual Growth of COVID-19 Infections by County, 30 Days Prior
To assess the accuracy of the forecasting model, Map 3 shows the model’s out-of-sample forecasts of the change in log active infections over the same 30-day period as in Map 2, using data only up to the beginning of the period. A strong similarity between Maps 2 and 3 indicates relatively high accuracy of the forecasts.
Map 3: Prior Month Projections for Infection Growth by County
Table 1 provides formal evidence on forecast accuracy for various forecast horizons. The table shows two accuracy metrics for the county forecasts, comparing each horizon’s forecast, as of the beginning of that period, against actual growth in infections over the same period. For example, the first row evaluates the accuracy of a 10-day-ahead forecast of infection growth, based on data up to 10 days before the last day of available data, by comparing it to actual growth in infections over those 10 days. The first metric (column 1) is the cross-county correlation between the forecasted value and the actual value of the change in log active infections over the given horizon. The closer this correlation is to 1, the better the forecasting model is at predicting the cross-county distribution of COVID-19 growth. The second metric (column 2) is the root mean-squared forecast error (RMSFE), which is the square root of the mean across counties of their squared forecast errors. The smaller the RMSFE, the more accurate the average county’s forecast.
To put these forecast errors in perspective, it is useful to compare them to the cross-county means of the actual and forecasted values of the change in log active infections, shown in the third and fourth columns. The fifth column of the table shows how many counties had sufficient data to produce a forecast.
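The accuracy metrics described above can be computed along the following lines. This is a minimal sketch, assuming NumPy arrays of county-level values; the function name and inputs are hypothetical. Following the note under Table 1, the RMSFE and the two means are population-weighted, while the correlation is computed without weights, which is an assumption here because the text does not say how it is weighted.

```python
import numpy as np


def forecast_accuracy(predicted, actual, population):
    """Cross-county accuracy metrics for one forecast horizon.

    predicted, actual: change in log active infections for each county.
    population: county populations, used as weights (hypothetical input).
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    weights = np.asarray(population, dtype=float)

    # Keep only counties with enough data to produce a forecast.
    valid = np.isfinite(predicted) & np.isfinite(actual) & np.isfinite(weights)
    predicted, actual, weights = predicted[valid], actual[valid], weights[valid]

    errors = predicted - actual
    return {
        # Unweighted Pearson correlation across counties (assumption).
        "correlation": float(np.corrcoef(predicted, actual)[0, 1]),
        # Square root of the population-weighted mean squared forecast error.
        "rmsfe": float(np.sqrt(np.average(errors**2, weights=weights))),
        "mean_actual": float(np.average(actual, weights=weights)),
        "mean_predicted": float(np.average(predicted, weights=weights)),
        "n_counties": int(valid.sum()),
    }


if __name__ == "__main__":
    # Made-up values for four counties; the last has no forecast.
    metrics = forecast_accuracy(
        predicted=[0.1, 0.2, 0.3, np.nan],
        actual=[0.2, 0.2, 0.5, 0.1],
        population=[1, 1, 2, 5],
    )
    for name, value in metrics.items():
        print(name, value)
```

Comparing the RMSFE with the mean actual value gives a sense of scale: an error of 0.15 log points is large if average growth is 0.1 and small if it is 1.0.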
|Horizon (Days)|(1) Correlation|(2) RMSFE|(3) Mean Actual Value|(4) Mean Predicted Value|(5) No. of Counties|
Note: The values shown in columns (2), (3), and (4) are based on population-weighted means.
Wilson, Daniel J. 2021. “SF Fed Launches Tool to Forecast COVID-19 Infections.” SF Fed Blog, February 3, 2021. https://www.frbsf.org/our-district/about/sf-fed-blog/sf-fed-launches-tool-forecast-covid-19-infections/
Wilson, Daniel J. 2020. “Weather, Mobility, and COVID-19: A Panel Local Projections Estimator for Understanding and Forecasting Infectious Disease Spread.” FRBSF Working Paper 2020-23 (December). https://doi.org/10.24148/wp2020-23
Wilson, Daniel J. 2019. “Clearing the Fog: The Predictive Power of Weather for Employment Reports and their Asset Price Responses.” American Economic Review: Insights 1(3), December, pp. 373–388. https://doi.org/10.1257/aeri.20180432
Projections data (Excel document, 962 kb)