GraphCast to GenCast: Input File Changes?

Open bondijoe27 opened this issue 1 year ago • 5 comments

Hi,

I'm exploring transitioning from GraphCast to GenCast. Besides model definition and parameters, do the input files require changes? Specifically:

Are GraphCast input files directly compatible with GenCast?

If not, what specific input file adjustments are needed (e.g., format, variables, dimensions)?

Any pointers to GenCast input file documentation or examples would be appreciated.

Thanks!

bondijoe27 · Jan 08 '25 02:01

We are running forecasts now. Here is our input:

<xarray.Dataset> Size: 698MB
Dimensions:                  (lat: 721, lon: 1440, time: 2, batch: 1, level: 13)
Coordinates:
  * lat                      (lat) float32 3kB -90.0 -89.75 -89.5 ... 89.75 90.0
  * lon                      (lon) float32 6kB 0.0 0.25 0.5 ... 359.5 359.8
  * time                     (time) timedelta64[ns] 16B 00:00:00 06:00:00
  * level                    (level) int32 52B 50 100 150 200 ... 850 925 1000
    datetime                 (batch, time) datetime64[ns] 16B ...
Dimensions without coordinates: batch
Data variables: (12/13)
    geopotential_at_surface  (lat, lon) float32 4MB ...
    2m_temperature           (batch, time, lat, lon) float32 8MB ...
    mean_sea_level_pressure  (batch, time, lat, lon) float32 8MB ...
    10m_u_component_of_wind  (batch, time, lat, lon) float32 8MB ...
    10m_v_component_of_wind  (batch, time, lat, lon) float32 8MB ...
    geopotential             (batch, time, level, lat, lon) float32 108MB ...
    ...                       ...
    specific_humidity        (batch, time, level, lat, lon) float32 108MB ...
    vertical_velocity        (batch, time, level, lat, lon) float32 108MB ...
    u_component_of_wind      (batch, time, level, lat, lon) float32 108MB ...
    v_component_of_wind      (batch, time, level, lat, lon) float32 108MB ...
    land_sea_mask            (lat, lon) float32 4MB ...
    total_precipitation_6hr  (batch, time, lat, lon) float32 8MB ...

bondijoe27 · Jan 08 '25 02:01

The input files should be identical to what GraphCast requires, except that GenCast also requires sea surface temperature (SST). HRES-fc0 SST data has a placeholder value over land. Before you feed it to the model, set those land values to NaN: find which pixels are NaN for SST in the ERA5 data and set the same pixels to NaN in the HRES-fc0 data. See the "Load the example data" section here:

For HRES-fc0 sea surface temperature, we assigned NaNs to grid cells in which sea surface temperature was NaN in the ERA5 dataset (this remains fixed at all times).
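Because the ERA5 SST NaN pattern is described as fixed in time, a single time slice can serve as the land mask. If you want to confirm that on your own ERA5 data, a minimal sketch follows; it assumes SST is an xarray DataArray with a `time` dimension, and the function name is hypothetical.

```python
import numpy as np
import xarray as xr


def nan_mask_is_constant_in_time(sst: xr.DataArray) -> bool:
    """Return True if the NaN pattern of `sst` is identical at every time step."""
    mask = sst.isnull()
    # NaN pattern of the first time step, with the scalar time coordinate dropped.
    first = mask.isel(time=0, drop=True)
    # Broadcast-compare every time step against the first one.
    return bool((mask == first).all())


# Toy data: 3 time steps on a 2x2 grid; the top-left "land" pixel is always NaN.
values = np.array([[[np.nan, 290.0], [291.0, 292.0]]] * 3)
sst = xr.DataArray(values, dims=("time", "lat", "lon"))
print(nan_mask_is_constant_in_time(sst))
```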

To be 100% sure, I would recommend building the input data yourself for the same date as the provided example data and verifying that you get identical inputs.
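One way to do that comparison is sketched below. It assumes both the self-built inputs and the example inputs are loaded as xarray Datasets; `compare_inputs` and the tolerances are illustrative choices, not part of GraphCast. It relies on `xarray.testing.assert_allclose`, which checks dimensions, coordinates and values, and treats NaNs in matching positions as equal.

```python
import numpy as np
import xarray as xr


def compare_inputs(built: xr.Dataset, example: xr.Dataset,
                   rtol: float = 1e-5, atol: float = 1e-6) -> dict:
    """Report variables that are missing, extra, or numerically different."""
    built_vars, example_vars = set(built.data_vars), set(example.data_vars)
    report = {
        "missing": sorted(example_vars - built_vars),
        "extra": sorted(built_vars - example_vars),
        "mismatched": [],
    }
    for name in sorted(example_vars & built_vars):
        try:
            xr.testing.assert_allclose(built[name], example[name], rtol=rtol, atol=atol)
        except AssertionError:
            report["mismatched"].append(name)
    return report


# Toy example: the self-built SST still has a land value where the example has NaN.
example = xr.Dataset(
    {
        "2m_temperature": (("lat", "lon"), [[280.0, 281.0], [282.0, 283.0]]),
        "sea_surface_temperature": (("lat", "lon"), [[np.nan, 290.0], [291.0, 292.0]]),
    },
    coords={"lat": [0.0, 1.0], "lon": [0.0, 1.0]},
)
built = example.assign(
    sea_surface_temperature=example["sea_surface_temperature"].fillna(271.0)
)
print(compare_inputs(built, example))
```

A non-empty `mismatched` list for `sea_surface_temperature` is exactly the symptom you would expect if the land pixels were not set to NaN.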

alvarosg · Jan 08 '25 10:01

@alvarosg, presumably the NaNs over land are set through this function? Or do users still have to set the corresponding coordinates to NaN themselves?

from graphcast import nan_cleaning

# `predictor` and `min_by_level` are assumed to be defined earlier.
predictor = nan_cleaning.NaNCleaner(
    predictor=predictor,
    reintroduce_nans=True,
    fill_value=min_by_level,
    var_to_clean='sea_surface_temperature',
)

v-weh · May 01 '25 13:05

Ah, no. The NaNCleaner replaces NaNs with the specified fill value. So indeed:

users still have to set the corresponding coordinates to nan by themselves
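To illustrate what "fill, then reintroduce" means, here is a simplified sketch of the round trip. It is not the `graphcast.nan_cleaning` implementation: the function names are hypothetical, the "model" is a stand-in that adds 0.5, and a scalar fill value stands in for `min_by_level`.

```python
import numpy as np
import xarray as xr


def fill_nans(ds: xr.Dataset, var: str, fill_value: float) -> tuple[xr.Dataset, xr.DataArray]:
    """Replace NaNs in `var` with `fill_value`; also return where the NaNs were."""
    nan_mask = ds[var].isnull()
    return ds.assign({var: ds[var].fillna(fill_value)}), nan_mask


def reintroduce_nans(pred: xr.Dataset, var: str, nan_mask: xr.DataArray) -> xr.Dataset:
    """Put NaNs back into the prediction wherever the input had them."""
    return pred.assign({var: pred[var].where(~nan_mask)})


sst = xr.DataArray([[np.nan, 290.0], [291.0, 292.0]], dims=("lat", "lon"))
inputs = xr.Dataset({"sea_surface_temperature": sst})

filled, mask = fill_nans(inputs, "sea_surface_temperature", fill_value=271.0)
pred = filled + 0.5  # Stand-in for a model step; it never sees a NaN.
out = reintroduce_nans(pred, "sea_surface_temperature", mask)
print(out["sea_surface_temperature"].values)
```

The key point: only pixels that are already NaN get the fill value, which is why the land pixels must be NaN before this step.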

You'll want to do something like this:

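# Illustrative helper name. `ds` is the HRES-fc0 input, `era5` the ERA5 dataset.
def mask_sst_like_era5(ds, era5):
  # We found SST NaN positions in ERA5 are constant in time.
  mask = era5["sea_surface_temperature"].isel(time=0, drop=True).isnull().compute()
  ds["sea_surface_temperature"] = ds["sea_surface_temperature"].where(~mask)
  return ds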

So to be clear, the idea is:

  1. Get HRES-fc0 with SST (ds)
  2. Replace pixels with NaNs wherever ERA5 has NaNs for SST (code above)
  3. Still run the result through NaNCleaner, which replaces those NaNs with the fill value.

Now those pixels have the value the model expects.
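A toy demonstration of why step 2 has to come before step 3, under stated assumptions: `PLACEHOLDER` and `FILL_VALUE` below are hypothetical numbers, not the real HRES-fc0 placeholder or the real NaNCleaner fill value, and plain `fillna` stands in for NaNCleaner's filling.

```python
import numpy as np
import xarray as xr

PLACEHOLDER = -999.0  # Hypothetical land placeholder; not the real HRES-fc0 value.
FILL_VALUE = 271.0    # Hypothetical stand-in for the NaNCleaner fill value.

# Step 1: HRES-fc0-like SST; the left column is "land" holding a placeholder.
hres = xr.Dataset({"sea_surface_temperature": (
    ("lat", "lon"), [[PLACEHOLDER, 290.0], [PLACEHOLDER, 291.0]])})
# ERA5-like SST with NaN over the same land pixels.
era5 = xr.Dataset({"sea_surface_temperature": (
    ("lat", "lon"), [[np.nan, 289.5], [np.nan, 290.5]])})

# Skipping step 2: filling NaNs does nothing, so the placeholder survives.
skipped = hres["sea_surface_temperature"].fillna(FILL_VALUE)

# Step 2, then step 3: land becomes NaN first, then receives the fill value.
land = era5["sea_surface_temperature"].isnull()
masked = hres["sea_surface_temperature"].where(~land).fillna(FILL_VALUE)
print(skipped.values[:, 0], masked.values[:, 0])
```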

Hope this helps,

Andrew

andrewlkd · May 01 '25 14:05

That's clear, thanks, @andrewlkd!

v-weh · May 01 '25 14:05