
Configure s3 for checkpointing

mootezbessifi opened this issue 4 years ago · 1 comment

Dears

It is recommended to copy the proper S3 filesystem plugin jar (whether the hdfs or presto flavor) to the plugins path before starting the job manager. How can this be supported from the CR config?

My regards

mootezbessifi avatar Jan 16 '22 20:01 mootezbessifi

@mootezbessifi I don't know if this is still relevant, but this is what we did to store our checkpoints and savepoints in AWS S3. Under `flinkConfig` we set the following:

  • s3.access-key: <your access key>
  • s3.secret-key: <your secret key>
  • state.checkpoints.dir: s3://<bucket name>/checkpoints/
  • state.savepoints.dir: s3://<bucket name>/savepoints/
  • state.backend: filesystem

The `s3.access-key` and `s3.secret-key` are not required if you are running on EKS / EC2 with an IAM role that can access S3.
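Put together, a `flinkConfig` fragment of the CR might look like the sketch below. The bucket name, app name, and credential placeholders are hypothetical, and the `apiVersion`/`kind` shown are the flinkk8soperator defaults; check them against your deployment:

```yaml
# Sketch of a flinkk8soperator FlinkApplication CR fragment.
# Bucket name and keys are placeholders -- adapt to your cluster.
apiVersion: flink.k8s.io/v1beta1
kind: FlinkApplication
metadata:
  name: my-flink-app            # hypothetical name
spec:
  flinkConfig:
    state.backend: filesystem
    state.checkpoints.dir: s3://my-bucket/checkpoints/
    state.savepoints.dir: s3://my-bucket/savepoints/
    # Omit the two keys below when the pod / node IAM role already grants S3 access.
    s3.access-key: <your access key>
    s3.secret-key: <your secret key>
```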

Create the S3 bucket.

In your Dockerfile add (see https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/filesystems/plugins/ for more details):

```dockerfile
RUN mkdir /opt/flink/plugins/s3-fs-hadoop/
RUN cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/plugins/s3-fs-hadoop/ \
    && chown -R flink: /opt/flink/plugins/s3-fs-hadoop/
```
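The key detail is that Flink loads each filesystem plugin from its own subdirectory under the plugins folder, not from a flat directory of jars. A quick local sketch of the layout the Dockerfile steps produce (the temp directory stands in for `/opt/flink/plugins`, and the jar version is a placeholder):

```shell
# Each plugin gets its own subdirectory under the plugins dir;
# a temp dir stands in for /opt/flink/plugins here.
PLUGINS=$(mktemp -d)
mkdir -p "$PLUGINS/s3-fs-hadoop"
# Placeholder jar; in the image this comes from /opt/flink/opt/.
touch "$PLUGINS/s3-fs-hadoop/flink-s3-fs-hadoop-1.14.0.jar"
ls "$PLUGINS/s3-fs-hadoop"
```

If the jar lands directly in `/opt/flink/plugins/` instead of a subdirectory, Flink will not pick it up as a plugin.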

liad5h avatar Mar 23 '22 16:03 liad5h