TianXing: A Linear Complexity Transformer Model with Explicit Attention Decay for Global Weather Forecasting
Graphical Abstract
Abstract
In this paper, we introduce TianXing, a transformer-based data-driven model with physical augmentation for skillful and efficient global weather forecasting. Data-driven transformer models such as Pangu-Weather, FengWu, and FuXi have emerged as promising alternatives to numerical weather prediction. However, these models consume substantial computational resources during training and incorporate little explicit physical guidance into their modeling frameworks. In contrast, TianXing applies a linear-complexity attention mechanism that scales proportionally with input size and significantly reduces GPU resource demands, with only a marginal compromise in accuracy. Furthermore, TianXing introduces an explicit attention decay mechanism into the linear attention, derived from physical insights, to enhance its forecasting skill. This mechanism re-weights attention according to spherical distances on the Earth and learned sparse multivariate coupling relationships, prompting TianXing to prioritize dynamically relevant neighboring features. Finally, to strengthen its medium-range forecasting, TianXing employs a stacked autoregressive forecast algorithm. The model architecture is validated on ERA5 reanalysis data at 5.625° latitude-longitude resolution, while a high-resolution dataset at 0.25° is used to train the operational forecasting model. Notably, TianXing exhibits excellent performance, particularly on the Z500 (geopotential at 500 hPa) and T850 (temperature at 850 hPa) fields, surpassing previous data-driven models and operational full-resolution models such as NCEP GFS and ECMWF IFS, as measured by latitude-weighted RMSE and ACC. Moreover, TianXing demonstrates remarkable capability in predicting extreme weather events, such as typhoons.