
Document how to configure RocksDB as Flink state backend

Open yuchaoran2011 opened this issue 5 years ago • 4 comments

Is your feature request related to a problem? Please describe. From Oleg Myagkov @OBenner on Gitter: "Hi! I use the RocksDB state backend, which stores its state in HDFS. I get this error: Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies. Is it necessary to extend the Cloudflow Docker image with the necessary libraries?"
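For reference, the failure shows up with a Flink configuration along these lines (a sketch; the HDFS URI and paths are placeholders):

```
state.backend: rocksdb
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
state.backend.incremental: true
```

With the checkpoint directory on an hdfs:// path, Flink needs the Hadoop filesystem classes on its classpath, which is exactly what the exception above is complaining about.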

Is your feature request related to a specific runtime of cloudflow or applicable for all runtimes? Only applies to Flink runtime

Describe the solution you'd like In the legacy single-image setup, Hadoop libraries are excluded from the Flink classpath in the config.sh script to avoid conflicts with Spark. In the new multi-image setup, this exclusion should no longer be necessary.

yuchaoran2011 avatar May 14 '20 13:05 yuchaoran2011

This seems similar to allowing, for instance, Azure Blob Storage as a state backend, is that correct @blublinsky? Maybe it is now possible since we support Flink configuration in config files?
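Something along these lines, I think (a sketch only, assuming the runtime-level Flink config section of Cloudflow's HOCON config files; the exact key path may differ):

```
cloudflow.runtimes.flink.config {
  flink {
    state.backend = rocksdb
    state.checkpoints.dir = "hdfs://namenode:8020/flink/checkpoints"
  }
}
```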

RayRoestenburg avatar Oct 02 '20 09:10 RayRoestenburg

I think this is as easy as ensuring that your project includes the libraries. We tested this approach with Azure Blob Storage and it works. On the other hand, Chaoran's proposal of adding them to the image works as well. I would rather use the first approach, so that the base image does not carry additional libraries and they are added only to the applications that require them.
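To illustrate, "including the libraries" would look roughly like this in the project's build.sbt (versions below are illustrative; match them to your Flink distribution):

```scala
// build.sbt -- versions are illustrative, align them with your Flink version
libraryDependencies ++= Seq(
  // RocksDB state backend
  "org.apache.flink" %% "flink-statebackend-rocksdb" % "1.10.3",
  // Bundled Hadoop classes, so hdfs:// checkpoint paths resolve
  "org.apache.flink" % "flink-shaded-hadoop-2-uber" % "2.8.3-10.0"
)
```

For Azure Blob Storage, the Hadoop uber jar would be swapped for flink-azure-fs-hadoop (upstream Flink usually ships that one as a plugin, so adding it as a plain dependency may need verification).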

blublinsky avatar Oct 02 '20 13:10 blublinsky

I agree, it would be good to document how to do this.

RayRoestenburg avatar Oct 02 '20 13:10 RayRoestenburg

Changed issue name

RayRoestenburg avatar Oct 02 '20 13:10 RayRoestenburg