MetView should work out of the box
Currently, to use the MetView dependency (needed for regridding) on Dataflow, users must build their own custom container image and pass specific Dataflow arguments to the weather-mv command. Ideally, users shouldn't have to worry about Docker containers or about passing the right arguments; regridding should work out of the box.
Implementation notes
- Short-term approach: Publish a public Docker image that has MetView installed on top of a Dataflow worker image.
- Medium/long-term approach: Publish a public Docker image that has Miniconda installed, and modify our setup.py build command script to run conda install commands during setup.
A quick note on the long-term approach: check out these docs on multi-stage custom Docker environments: https://cloud.google.com/dataflow/docs/guides/using-custom-containers#use_a_custom_base_image_or_multi-stage_builds It looks like it would be fairly easy and quick to build an image from a base Miniconda image that also includes the Python Beam SDK.
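A minimal sketch of that multi-stage idea, assuming a hypothetical helper write_dockerfile, the conda-forge package names metview-batch and metview-python (assumptions, not verified here), and example versions Python 3.8 / Beam 2.40.0. The COPY --from and ENTRYPOINT lines follow the pattern in the Dataflow custom-container docs linked above. The sketch also creates a separate conda environment instead of running conda install python=... in the base environment; one plausible cause of the conda.cli.main_info error reported below is that changing the Python version of conda's own environment leaves conda unable to import its modules.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a multi-stage Dockerfile into directory $1 that
# puts MetView (via Miniconda) and the Beam Python SDK boot files in one image.
#   $1: output directory
#   $2: Python version, e.g. 3.8 (must match the Beam SDK image tag)
#   $3: Apache Beam version, e.g. 2.40.0 (must match the launching SDK)
write_dockerfile() {
  local out_dir="$1" py_version="$2" beam_version="$3"
  mkdir -p "$out_dir"
  cat > "$out_dir/Dockerfile" <<EOF
FROM continuumio/miniconda3

# Use a separate env so conda's own (base) Python is never downgraded.
# Package names metview-batch / metview-python are assumptions (conda-forge).
RUN conda create -y -n mv -c conda-forge \\
      python=${py_version} pip metview-batch metview-python \\
 && conda clean -afy
ENV PATH=/opt/conda/envs/mv/bin:\$PATH

# Beam on the worker must match the SDK version used to launch the job.
RUN pip install --no-cache-dir "apache-beam[gcp]==${beam_version}"

# Copy the Beam worker boot files from the official SDK image.
COPY --from=apache/beam_python${py_version}_sdk:${beam_version} /opt/apache/beam /opt/apache/beam
ENTRYPOINT ["/opt/apache/beam/boot"]
EOF
}
```

For example, write_dockerfile build/ 3.8 2.40.0 followed by gcloud builds submit build/ --tag "$IMAGE_URI:dev" --machine-type=n1-highcpu-32 would build it on Cloud Build; the resulting image would then be passed to Dataflow through Beam's --sdk_container_image pipeline option.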
FTR I can confirm that the current main repo's build instructions do not work:
command
gcloud builds submit weather_mv/ --tag "$IMAGE_URI:dev"
error
[...]
ModuleNotFoundError: No module named 'conda.cli.main_info'
The command '/bin/sh -c conda install python=${py_version} -y' returned a non-zero code: 1
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status: 1
I also tested @bahmandar's image here, and it does get regridding working. However, the build is quite resource-intensive (about 1 hour on an N1_HIGHCPU_32 machine), and downloading and preparing the 2.5 GB image itself takes about 10 minutes in Dataflow.
Regarding building the image: do you have Anaconda installed on your local machine? (I'm surprised that this seems to be a requirement.)
Is @bahmandar's image already publicly distributed? If so, that would make fixing this much easier.
It is not, unfortunately.