xarray-beam
xarray-beam copied to clipboard
Running pipelines on AWS?
Hello! Newbie here. :)
Any tips for someone who would like to try an xarray-beam rechunking pipeline on AWS?
(USGS is wedded to AWS at the moment)
Sorry for the late reply. I'm not too familiar with how we can run this on AWS (I haven't done it myself), but here's how I would do it:
- Set up a (py)Spark cluster in AWS EMR, see the AWS docs or this step-by-step Medium post.
- Specify the
SparkRunnerin the--runnersflag of your python Beam Pipeline.- See also: https://beam.apache.org/documentation/runners/spark/#running-on-a-pre-deployed-spark-cluster
- Likely, you'll need to install apache_beam with AWS's extra requirements:
pip install apache_beam[aws].
I hope that helps.