
Support logging configuration in `kubectl cloudflow deploy` and `kubectl cloudflow configure`

Open michaelpnash opened this issue 5 years ago • 12 comments

Currently it is hard to configure logging for cloudflow streamlets.

You have to specify a logback.xml per sub-project in src/main/resources to get the file on the classpath, which works for akka streamlets.
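For illustration, a minimal per-sub-project logback.xml might look like this (a sketch; the appender, pattern, and level choices here are assumptions, not Cloudflow defaults):

```xml
<!-- src/main/resources/logback.xml -->
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>
```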

For Flink streamlets, overriding the default logging does not work at the moment. Log4j and logback are configured in the task and job managers through:

-Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
-Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml

And these files are packaged by default in Flink. For Spark, which uses log4j, defaults from a jar file bundled with Spark are used if a log4j.properties is not found on the classpath. So for Spark you need to add a log4j.properties file in the sub-project's src/main/resources.

All of this is very cumbersome to configure correctly.

It would be better if logging configuration could be provided through kubectl cloudflow deploy and kubectl cloudflow configure, allowing users to provide log4j and logback files. See https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/753 for an idea on how to get this to work in Spark. The approach will probably work in general:

  • put log config files in config maps
  • mount the config maps on the pods under well-known names.
  • Add java options to pass through log config as system properties.
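The three steps above can be sketched in Kubernetes terms as follows (the configmap name, mount path, and JAVA_OPTS wiring are illustrative assumptions, not the actual implementation):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudflow-logging        # hypothetical name
data:
  logback.xml: |
    <configuration>...</configuration>
---
# Fragment of a streamlet pod spec: mount the config map under a
# well-known path and point the JVM at it via a system property.
spec:
  containers:
    - name: streamlet
      env:
        - name: JAVA_OPTS
          value: "-Dlogback.configurationFile=/etc/cloudflow/logging/logback.xml"
      volumeMounts:
        - name: logging-config
          mountPath: /etc/cloudflow/logging
  volumes:
    - name: logging-config
      configMap:
        name: cloudflow-logging
```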

michaelpnash avatar Apr 23 '20 18:04 michaelpnash

The kubectl cloudflow cli creates a configmap from a file provided on the command line, for instance --log4j-config <path-to-log4j.properties> and --logback-config <path-to-logback.xml>.
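What the CLI would do with such a flag can be sketched roughly as follows (a hypothetical helper in Python for brevity; the actual CLI is not implemented this way, and the `{app}-logging` naming scheme is an assumption):

```python
import os


def logging_configmap(app_name: str, config_path: str) -> dict:
    """Build a ConfigMap manifest embedding a local log config file.

    Sketch of what a `--logback-config <file>` / `--log4j-config <file>`
    flag could produce; all names here are illustrative.
    """
    key = os.path.basename(config_path)  # file name becomes the data key
    with open(config_path) as f:
        contents = f.read()
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{app_name}-logging"},
        "data": {key: contents},
    }
```

The CLI would then apply this manifest to the cluster alongside (or as part of) the application CR.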

This is specified for the entire cloudflow application (to keep things simple?). Streamlets that use logback or log4j will be started with those log configurations.

Decisions to make:

  • Should these logfile configmap names be added to the CR? (like secretName in StreamletDeployment, an optional map of log-framework-name -> configmap name, for instance). This implies the CLI modifies the CR before applying it to the cluster.
  • Should the operator watch another kind of resource (configmaps labelled as logging config for Cloudflow?)

RayRoestenburg avatar Sep 15 '20 12:09 RayRoestenburg

The operator must then add a volume mount for the configmaps to every pod. And it must somehow set the log4j / logback configuration with a -D setting, for Spark, Flink and Akka. This should work by adding these to JAVA_OPTS, but there are some comments (https://issues.apache.org/jira/browse/FLINK-12286) suggesting this will not work; not sure if we can modify the conf dir in Flink...

RayRoestenburg avatar Sep 15 '20 12:09 RayRoestenburg

For Spark, the approach of java opts -D and a configmap seems to work: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/395
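The approach from that issue can be sketched as a SparkApplication fragment in spark-on-k8s-operator style (the configmap name and mount path are assumptions):

```yaml
# Illustrative SparkApplication fragment: mount a ConfigMap holding
# log4j.properties and point the driver/executor JVMs at it.
spec:
  sparkConf:
    "spark.driver.extraJavaOptions": "-Dlog4j.configuration=file:/etc/spark-logging/log4j.properties"
    "spark.executor.extraJavaOptions": "-Dlog4j.configuration=file:/etc/spark-logging/log4j.properties"
  driver:
    configMaps:
      - name: spark-logging-config   # hypothetical name
        path: /etc/spark-logging
  executor:
    configMaps:
      - name: spark-logging-config
        path: /etc/spark-logging
```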

RayRoestenburg avatar Sep 15 '20 12:09 RayRoestenburg

To recap:

The problem: now that we know we can pass a log4j2.xml to Flink, this is mainly about the UX of configuring logging. We currently have multiple log configuration files, one per type of Streamlet. This is not intuitive and adds complexity to the usage. The other issue is that the logging configuration is included in the build, so it can't be changed without rebuilding the artifact.

Those are the two problems; here is what I think the desired UX should be.

In order of importance:

  1. As a user, I'd like to be able to change the logging configuration without rebuilding the project.
  1.1. As a user, I'd like to change the logging through kubectl cloudflow, with deploy and configure.
  2. As a user, I'd like to configure all Streamlets with the same implementation of slf4j.

The question I have now: can I tackle these two one by one?

franciscolopezsancho avatar Oct 09 '20 08:10 franciscolopezsancho

Question: if a specific streamlet already has a log.properties on its classpath, will changing the logging through kubectl cloudflow not affect it?

The first approach, as suggested above, will be adding the log.properties as a config map and passing it to the streamlet through JAVA_OPTS when running the jar in the streamlet's container.

franciscolopezsancho avatar Oct 09 '20 08:10 franciscolopezsancho

@franciscolopezsancho yes, you can first tackle being able to change logging through the kubectl cloudflow deploy and configure commands, even if it only supports a subset of streaming engines. It's ok to do that iteratively.

Point 2 might not work with Spark's dependency on log4j.

RayRoestenburg avatar Oct 09 '20 10:10 RayRoestenburg

@franciscolopezsancho Re: your question, see what happens if you specify -Dlog4j.configuration=log1.properties -Dlog4j.configuration=log2.properties to a JVM: does 1 or 2 get applied? If it is 2, we could add another -D in JAVA_OPTS, depending on how the entrypoint is set. Of course, this is exactly what has to be figured out: how can we modify the config within the constraints of the current runtimes and how they configure themselves?

Also, we should prefer logback. It's the default/native slf4j implementation, it's easy to use with Akka, and it seems Flink can be configured purely with logback as well. (For Spark it is likely not possible, since it depends on log4j, so we would have a special case for that.) This would allow something like kubectl cloudflow deploy foo.json --logback-config logback.xml so that the same configuration is used for all streamlets in the app that use logback, and for instance kubectl cloudflow deploy foo.json --logback-config logback.xml --log4j-config log4j.properties to configure logging for all streamlets, including Spark streamlets.

RayRoestenburg avatar Oct 09 '20 10:10 RayRoestenburg

:) Nice. Good idea. Easy to test that (regarding the first point).

About the second, agreed.

franciscolopezsancho avatar Oct 09 '20 10:10 franciscolopezsancho

above:

1 - as a user I'd like to be able to change the logging conf without rebuilding the project.
1.1. as a user I'd like to change the logging through the kubectl cloudflow, with deploy and configure 

will be followed in #776

franciscolopezsancho avatar Oct 09 '20 10:10 franciscolopezsancho

this has been implemented, can we close?

andreaTP avatar Dec 18 '20 15:12 andreaTP

It would be nice if the --logback-config CLI option was mentioned somewhere in the docs. It was really hard for me to find out how to configure logging for streamlets. Reading the docs didn't help; I found the info in this ticket by mere luck.

vkorenev avatar Jan 27 '21 03:01 vkorenev

Thanks for the feedback @vkorenev ! The problem has been fixed here: https://github.com/lightbend/cloudflow/commit/42a96fb39d5dbc374ade14693db5ca900ae94063 but is not yet available in the current documentation.

andreaTP avatar Jan 27 '21 10:01 andreaTP