xarray-beam icon indicating copy to clipboard operation
xarray-beam copied to clipboard

Running pipelines on AWS?

Open rsignell-usgs opened this issue 4 years ago • 1 comments

Hello! Newbie here. :) Any tips for someone who would like to try an xarray-beam rechunking pipeline on AWS?
(USGS is wedded to AWS at the moment)

rsignell-usgs avatar Jan 02 '22 14:01 rsignell-usgs

Sorry for the late reply. I'm not too familiar with how we can run this on AWS (I haven't done it myself), but here's how I would do it:

  • Set up a (py)Spark cluster in AWS EMR, see the AWS docs or this step-by-step Medium post.
  • Specify the SparkRunner in the --runners flag of your python Beam Pipeline.
    • See also: https://beam.apache.org/documentation/runners/spark/#running-on-a-pre-deployed-spark-cluster
    • Likely, you'll need to install apache_beam with AWS's extra requirements: pip install apache_beam[aws].

I hope that helps.

alxmrs avatar Jan 19 '22 23:01 alxmrs