GraphCast to GenCast: Input File Changes?
Hi,
I'm exploring a transition from GraphCast to GenCast. Besides the model definition and parameters, do the input files require changes? Specifically:
- Are GraphCast input files directly compatible with GenCast?
- If not, what specific input file adjustments are needed (e.g., format, variables, dimensions)?
Any pointers to GenCast input file documentation or examples would be appreciated.
Thanks!
We are running forecasts now, and here is our input:
```
<xarray.Dataset> Size: 698MB
Dimensions:                  (lat: 721, lon: 1440, time: 2, batch: 1, level: 13)
Coordinates:
  * lat                      (lat) float32 3kB -90.0 -89.75 -89.5 ... 89.75 90.0
  * lon                      (lon) float32 6kB 0.0 0.25 0.5 ... 359.5 359.8
  * time                     (time) timedelta64[ns] 16B 00:00:00 06:00:00
  * level                    (level) int32 52B 50 100 150 200 ... 850 925 1000
    datetime                 (batch, time) datetime64[ns] 16B ...
Dimensions without coordinates: batch
Data variables: (12/13)
    geopotential_at_surface  (lat, lon) float32 4MB ...
    2m_temperature           (batch, time, lat, lon) float32 8MB ...
    mean_sea_level_pressure  (batch, time, lat, lon) float32 8MB ...
    10m_u_component_of_wind  (batch, time, lat, lon) float32 8MB ...
    10m_v_component_of_wind  (batch, time, lat, lon) float32 8MB ...
    geopotential             (batch, time, level, lat, lon) float32 108MB ...
    ...                       ...
    specific_humidity        (batch, time, level, lat, lon) float32 108MB ...
    vertical_velocity        (batch, time, level, lat, lon) float32 108MB ...
    u_component_of_wind      (batch, time, level, lat, lon) float32 108MB ...
    v_component_of_wind      (batch, time, level, lat, lon) float32 108MB ...
    land_sea_mask            (lat, lon) float32 4MB ...
    total_precipitation_6hr  (batch, time, lat, lon) float32 8MB ...
```
The input files should be identical to what GraphCast requires, except that GenCast also requires sea surface temperature (SST). HRES-fc0 SST data has a placeholder value over land; before you feed it to the model, you should set those land values to NaN by looking at which pixels are NaN for SST in the ERA5 data and setting the same pixels to NaN. See the "Load the example data" section here:
For HRES-fc0 sea surface temperature, we assigned NaNs to grid cells in which sea surface temperature was NaN in the ERA5 dataset (this remains fixed at all times).
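The fact that ERA5's SST NaN positions stay fixed over time is what lets a single time step serve as the land mask. A minimal sketch for checking this on your own ERA5 extract, assuming it is an `xarray.Dataset` with a `time` dimension and a `sea_surface_temperature` variable (the function name is hypothetical):

```python
import xarray as xr


def era5_sst_nan_mask_is_constant(era5: xr.Dataset) -> bool:
    """Return True if ERA5 SST is NaN at the same grid cells for every time step."""
    nan = era5["sea_surface_temperature"].isnull()
    # Drop the scalar time coordinate so the comparison broadcasts cleanly.
    first = nan.isel(time=0, drop=True)
    # Compare every time step's NaN mask against the first one.
    return bool((nan == first).all())
```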
To be 100% sure, I would recommend building the input data yourself for the same date as the provided example data and verifying that you get identical inputs.
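A minimal sketch of that comparison, assuming both your self-built inputs and the example inputs are loaded as `xarray.Dataset`s with the same variable names. The helper name and tolerance are hypothetical choices; it reports missing variables, mismatched coordinates (for example latitude stored in the opposite order), differing NaN positions, and value differences above the tolerance.

```python
import numpy as np
import xarray as xr


def compare_inputs(mine: xr.Dataset, reference: xr.Dataset, atol: float = 1e-6) -> dict:
    """Return {variable: problem} for every variable where `mine` differs from `reference`."""
    problems = {}
    for name, ref in reference.data_vars.items():
        if name not in mine.data_vars:
            problems[name] = "missing"
            continue
        ours = mine[name]
        if set(ours.dims) != set(ref.dims):
            problems[name] = f"dims {ours.dims} != {ref.dims}"
            continue
        try:
            # join="exact" raises if lat/lon/level/time labels or sizes differ.
            ours, ref = xr.align(ours, ref, join="exact")
        except ValueError:
            problems[name] = "coordinates differ"
            continue
        # Same dimension order, so raw arrays can be compared element-wise.
        ours = ours.transpose(*ref.dims)
        nan_ours, nan_ref = ours.isnull().values, ref.isnull().values
        if not np.array_equal(nan_ours, nan_ref):
            problems[name] = "NaN positions differ"
            continue
        diff = np.abs(ours.values[~nan_ref] - ref.values[~nan_ref])
        if diff.size and float(diff.max()) > atol:
            problems[name] = f"max abs diff {float(diff.max()):.3g}"
    return problems
```

An empty dict means every variable in the reference matched; any entry points at the variable to investigate first.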
@alvarosg Presumably the NaNs over land are set through this function? Or do users still have to set the corresponding coordinates to NaN by themselves?
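```python
from graphcast import nan_cleaning

# `predictor` (the GenCast predictor) and `min_by_level` (the fill values)
# are assumed to be defined earlier, e.g. as in the GenCast demo notebook.
predictor = nan_cleaning.NaNCleaner(
    predictor=predictor,
    reintroduce_nans=True,
    fill_value=min_by_level,
    var_to_clean='sea_surface_temperature',
)
```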
Ah, no. The NaNCleaner replaces NaNs with the specified fill value. So indeed, users still have to set the corresponding coordinates to NaN by themselves.
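You'll want to do something like:

```python
import xarray as xr

def mask_sst_over_land(ds: xr.Dataset, era5: xr.Dataset) -> xr.Dataset:
    # We found SST NaN positions in ERA5 are constant in time, so one step gives the land mask.
    land = era5["sea_surface_temperature"].isel(time=0, drop=True).isnull().compute()
    ds["sea_surface_temperature"] = ds["sea_surface_temperature"].where(~land)
    return ds
```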
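A quick sanity check can confirm that the masking step took effect before the data reaches the model: every (batch, time) slice of the masked SST should be NaN exactly where ERA5 SST is NaN. This is a sketch that assumes `ds` and `era5` share the same lat/lon grid; the function name is hypothetical.

```python
import xarray as xr


def sst_nan_pattern_matches(ds: xr.Dataset, era5: xr.Dataset) -> bool:
    """Return True if ds SST is NaN exactly where ERA5 SST (first time step) is NaN."""
    # ERA5 land pixels: NaN SST at the first time step (fixed over time per this thread).
    land = era5["sea_surface_temperature"].isel(time=0, drop=True).isnull()
    ours = ds["sea_surface_temperature"].isnull()
    # Broadcasting compares the 2-D land mask against every (batch, time) slice.
    # Assumes both datasets use the same lat/lon coordinates.
    return bool((ours == land).all())
```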
So to be clear, the idea is:
- Get HRES-fc0 with SST (`ds`).
- Replace pixels with NaNs wherever ERA5 has NaNs for SST (code above).
- Still run this through NaNCleaner.

Now those pixels have the value the model expects.
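Why the masking step has to come before NaNCleaner can be seen in a toy example. This mimics only the fill step with xarray's `fillna`; it is not the graphcast `NaNCleaner` implementation, and both the placeholder and fill values are made up for illustration.

```python
import numpy as np
import xarray as xr

PLACEHOLDER = -999.0  # hypothetical land placeholder; not the real HRES-fc0 value
FILL_VALUE = 250.0    # hypothetical stand-in for the per-level minimum used as fill

# Toy 2x2 SST fields: HRES-fc0 has a placeholder over land, ERA5 has NaN there.
hres_sst = xr.DataArray([[PLACEHOLDER, 290.0], [PLACEHOLDER, 285.0]], dims=("lat", "lon"))
era5_sst = xr.DataArray([[np.nan, 291.0], [np.nan, 286.0]], dims=("lat", "lon"))

# Fill only (no masking): HRES has no NaNs, so the placeholder survives.
fill_only = hres_sst.fillna(FILL_VALUE)

# Mask with ERA5 first, then fill: land pixels now carry the fill value.
masked = hres_sst.where(era5_sst.notnull())
mask_then_fill = masked.fillna(FILL_VALUE)
```

Without the mask, the land pixels keep the HRES placeholder; with it, they end up with the same fill value that ERA5-based inputs would get.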
Hope this helps,
Andrew
that is clear, thanks! @andrewlkd