ChatMacro: Evaluating Inflation Forecasts of Generative AI*

2026-04 | February 5, 2026

Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions.

Suggested citation:

Alam, M. Jahangir, Shane Boyle, Huiyu Li, and Tatevik Sekhposyan. 2026. “ChatMacro: Evaluating Inflation Forecasts of Generative AI*.” Federal Reserve Bank of San Francisco Working Paper 2026-04. https://doi.org/10.24148/wp2026-04

Topics: Artificial Intelligence, Inflation, Inflation Expectations, Technology

About the Authors

Huiyu Li is a research advisor in the Economic Research Department of the Federal Reserve Bank of San Francisco.
