Physics-based models still beat AI for predicting extreme weather events

05 Jun 2026 Isabelle Dumé
A colour plot (red for hot, blue for cold) of temperature anomalies during the 2020 Siberian heat wave, superimposed over a map of west and central Asia. There's a gigantic deep-red blob over Siberia.
Extreme heat: Temperature anomalies during the 2020 Siberian heatwave, an event that shattered historical records and triggered severe wildfires. (Courtesy: S Engelke)

Artificial intelligence (AI)-based weather models are not as good as physics-based forecasting systems at predicting extreme weather events, say researchers at the University of Geneva, Switzerland and the Karlsruhe Institute of Technology, Germany. After comparing the outputs of different models on the same dataset of extreme events, the team found that AI models systematically erred on the side of normality, underestimating temperatures for extremely hot events while overestimating them for cold ones. This could be because the models learn from what has already happened and struggle to forecast events outside their training data.

With extreme weather events growing more common and intense due to our rapidly warming climate, being able to predict them is becoming ever more important. In recent years, meteorologists have sought to address this by developing a new generation of AI models that vie with physics-based numerical weather prediction (NWP) systems in the accuracy and extent of their forecasts – at least for everyday weather events.

Black or grey swans

In the new work, researchers led by Zhongwei Zhang and Sebastian Engelke sought to understand whether AI could also be competitive when forecasting extreme weather episodes. These episodes are usually defined by variables such as wind, atmospheric pressure or temperature falling well outside norms for location and season, and are sometimes termed “black swans” or “grey swans” depending on how extreme they are.

“In the past, only relatively moderate extremes were documented,” Zhang and Engelke observe. “But given the current high rate of global warming, record-breaking events now sometimes exceed previous record levels by large margins.”

That’s potentially a problem for AI, they add, because several recent studies have shown that such models come up short when asked to extrapolate beyond their training data. For example, in 2025, researchers at the US National Oceanic and Atmospheric Administration (NOAA) and the Allen Institute for Artificial Intelligence in Washington, US, found that a seasonal AI forecasting model could not predict values for the North Atlantic Oscillation, which plays a crucial role in Europe’s weather and climate. Another study found that models tend to underpredict the intensity of the most extreme storms (as measured by mean sea-level pressure) or other high-impact events such as heat waves.

A large sample of record-breaking events

To test this hypothesis, Zhang and Engelke constructed a large dataset of record-breaking events for heat, cold, and wind extremes during 2018 and 2020. This dataset included several well-known events, such as the Siberian heatwave in early 2020 and the US heatwave of August 2020, but also tens of thousands of less-heralded ones. The 2020 dataset, for example, included 162,751 heat, 32,991 cold, and 53,345 wind records spread across different seasons and climatic zones.
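The article does not spell out how the researchers flagged a record, but the basic idea of a record-breaking event can be illustrated with a minimal sketch. It assumes a record is any value that exceeds (for heat) or falls below (for cold) everything seen at the same location and time of year in earlier years. The function name, array layout and numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def find_records(history, test_year, kind="heat"):
    """Flag record-breaking values in `test_year` relative to `history`.

    history:   array of shape (n_years, n_times, n_cells) with past values
    test_year: array of shape (n_times, n_cells) with the year under test
    kind:      "heat" for new maxima, "cold" for new minima

    Assumption (not from the study): a record is a strict exceedance of
    the previous extreme at the same time step and grid cell.
    """
    if kind == "heat":
        return test_year > history.max(axis=0)
    if kind == "cold":
        return test_year < history.min(axis=0)
    raise ValueError("kind must be 'heat' or 'cold'")

# Toy data: two past years, one time step, three grid cells (degrees C).
history = np.array([[[30.0, 25.0, 20.0]],
                    [[31.0, 24.0, 22.0]]])
test_year = np.array([[32.0, 23.0, 22.0]])

heat_mask = find_records(history, test_year, kind="heat")
cold_mask = find_records(history, test_year, kind="cold")
```

In this toy example only the first cell sets a heat record and only the second sets a cold record; the third merely ties its previous maximum, which the strict comparison does not count.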

The researchers assessed how well the three leading deterministic AI weather models – GraphCast, Pangu-Weather (and operational variants) and Fuxi – performed when extrapolating from this dataset of record-breaking events. They then compared these models’ performance to that of the physics-based High RESolution forecast (HRES) model developed by the European Centre for Medium-Range Weather Forecasts (ECMWF), which is widely acknowledged as today’s best physics-based NWP model.

In line with previous studies, the researchers found that both GraphCast and Fuxi were better than HRES at forecasting normal weather events. However, for record-breaking temperature and wind events in 2020, the situation was reversed, with the physics-based HRES model consistently outperforming all AI models for hot and cold temperature records as well as wind speed records.
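To make the idea of “erring on the side of normality” concrete, here is a minimal sketch of how a forecast might be scored only on record-breaking events. The mean error (bias) and root-mean-square error used here are standard verification measures chosen for illustration. The article does not say which metrics the study used, and the function name and numbers below are hypothetical.

```python
import numpy as np

def score_on_records(forecast, observed, record_mask):
    """Score a forecast only where `record_mask` is True.

    Returns the mean error (bias) and the root-mean-square error (RMSE).
    A negative bias on heat records means the forecast was too cool,
    in other words it underestimated the extreme.
    """
    errors = (forecast - observed)[record_mask]
    return {
        "bias": float(errors.mean()),
        "rmse": float(np.sqrt((errors ** 2).mean())),
    }

# Toy data (degrees C): the first two points are heat records.
observed = np.array([33.0, 32.0, 20.0])
forecast = np.array([30.0, 28.0, 20.0])
is_record = np.array([True, True, False])

scores = score_on_records(forecast, observed, is_record)
```

Here the bias is negative, meaning the toy forecast is too cool on the record-hot points, which is the kind of underestimation the researchers describe for AI models. Running the same scoring for each model on the same set of records would allow a like-for-like comparison.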

A difficult study

The researchers report that the most difficult aspect of the study was the sheer length of computation time required to analyse huge AI and numerical forecast datasets. To complicate things further, Zhang notes that some of the most recent AI weather models are being developed by big tech companies and are not publicly available.

As well as showing that purely data-driven AI models struggle to forecast record-breaking extreme weather events, Engelke says the team’s work also provides a protocol for systematically evaluating forecasts of extreme events. “We hope this will motivate the research community to thoroughly evaluate the next generations of AI forecasts to advance our understanding of their advantages and their limitations compared to conventional physics-based models,” he says.

In this study, which appears in Science Advances, the Geneva/Karlsruhe researchers focused on evaluating the forecasts of deterministic AI weather models. They are now doing the same for forecasts made by recent probabilistic AI weather models, which they suspect will face similar extrapolation limitations.

“We are also working on building AI models ourselves that have improved forecast performance on extreme events,” Zhang and Engelke reveal. “This might be achieved by creating hybrid models, which are a smart combination of physical and AI-based weather models.”

Copyright © 2026 by IOP Publishing Ltd and individual contributors