[ZEPPELIN-4409] Set spark.app.name to know which user is running which notebook
What is this PR for?
This PR sets spark.app.name to "Zeppelin Notebook xxxxx for user test", which makes it easy to see which notebooks are currently in use and how many resources are being consumed by which user and notebook.
What type of PR is it?
Feature
What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-4409
How should this be tested?
- Create a notebook with the Spark interpreter
- Run `println("hello")`
- Verify that the Spark app name is set to the user name and note id
Screenshots (if appropriate)

Questions:
- Do the license files need to be updated? No
- Are there breaking changes for older versions? No
- Does this need documentation? No
Thanks for the contribution @amakaur
It makes sense to include the user name and note id in the Spark app name. But an easier approach is to set spark.app.name in SparkInterpreterLauncher: you can set spark.app.name to the interpreterGroupId, which consists of the note id and user name. Here's the logic for how the interpreterGroupId is generated: https://github.com/apache/zeppelin/blob/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/interpreter/InterpreterSetting.java#L406
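For readers following along, here is a rough sketch of how an interpreter group id could be assembled. It is illustrative only: the method name, separators, and mode handling are assumptions based on this thread, and the real logic lives in the linked InterpreterSetting code.

```java
// Illustrative sketch only -- the actual logic is in
// InterpreterSetting.getInterpreterGroupId() (linked above).
// The id starts with the interpreter name and, depending on the
// isolation mode, is extended with the user and/or the note id.
static String sketchGroupId(String interpreterName, String user, String noteId,
                            boolean perUserIsolated, boolean perNoteIsolated) {
  StringBuilder id = new StringBuilder(interpreterName);
  if (perUserIsolated) {
    id.append("-").append(user);   // e.g. "spark-amandeep.kaur"
  }
  if (perNoteIsolated) {
    id.append("-").append(noteId); // e.g. "spark-2ENBYUKC2"
  }
  return id.toString();
}
```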
Thanks for the suggestion @zjffdu. I tried setting spark.app.name in SparkInterpreterLauncher#buildEnvFromProperties, both through the env variable "SPARK_APP_NAME" and via the Spark property "spark.app.name", but neither seemed to work. It sets spark.app.name to the default value "Zeppelin".
@amakaur How did you set it? Can you show me the code?
@zjffdu Here are the two ways I tried: either setting SPARK_APP_NAME, or setting the Spark property spark.app.name.
```java
public Map<String, String> buildEnvFromProperties(InterpreterLaunchContext context) throws IOException {
  Map<String, String> env = super.buildEnvFromProperties(context);
  Properties sparkProperties = new Properties();
  String sparkMaster = getSparkMaster(properties);
  for (String key : properties.stringPropertyNames()) {
    if (RemoteInterpreterUtils.isEnvString(key)) {
      env.put(key, properties.getProperty(key));
    }
    if (isSparkConf(key, properties.getProperty(key))) {
      sparkProperties.setProperty(key, toShellFormat(properties.getProperty(key)));
    }
  }
  env.put("SPARK_APP_NAME", context.getInterpreterGroupId() + "_testing");
  //sparkProperties.setProperty("spark.app.name", context.getInterpreterGroupId() + "_testing");
  setupPropertiesForPySpark(sparkProperties);
  setupPropertiesForSparkR(sparkProperties);
  if (isYarnMode() && getDeployMode().equals("cluster")) {
    env.put("ZEPPELIN_SPARK_YARN_CLUSTER", "true");
    sparkProperties.setProperty("spark.yarn.submit.waitAppCompletion", "false");
  }
  StringBuilder sparkConfBuilder = new StringBuilder();
  if (sparkMaster != null) {
    sparkConfBuilder.append(" --master " + sparkMaster);
  }
  if (isYarnMode() && getDeployMode().equals("cluster")) {
    if (sparkProperties.containsKey("spark.files")) {
      sparkProperties.put("spark.files", sparkProperties.getProperty("spark.files") + ","
          + zConf.getConfDir() + "/log4j_yarn_cluster.properties");
    } else {
      sparkProperties.put("spark.files", zConf.getConfDir() + "/log4j_yarn_cluster.properties");
    }
    sparkProperties.put("spark.yarn.maxAppAttempts", "1");
  }
```
@amakaur SPARK_APP_NAME is not valid; you need to use spark.app.name, which is what Spark expects (refer here).
I tried that, and it works for me
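For reference, a minimal sketch of the distinction being made here; the class name is hypothetical and the app name value is just an example from this thread:

```java
import org.apache.spark.SparkConf;

// Spark reads the application name from the spark.app.name property
// (or SparkConf.setAppName); an env variable such as SPARK_APP_NAME
// is not something Spark itself consults.
public class AppNameSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .set("spark.app.name", "spark-amandeep.kaur-2ENBYUKC2"); // example value
    System.out.println(conf.get("spark.app.name"));
  }
}
```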

@zjffdu Thanks for trying it out. Let me try again with spark.app.name, but note that I had spark.app.name commented out in the code I posted only to show the two different ways I was trying to get it to work. https://github.com/apache/zeppelin/blob/master/spark/interpreter/src/main/resources/interpreter-setting.json#L30
```java
public Map<String, String> buildEnvFromProperties(InterpreterLaunchContext context) throws IOException {
  Map<String, String> env = super.buildEnvFromProperties(context);
  Properties sparkProperties = new Properties();
  String sparkMaster = getSparkMaster(properties);
  for (String key : properties.stringPropertyNames()) {
    if (RemoteInterpreterUtils.isEnvString(key)) {
      env.put(key, properties.getProperty(key));
    }
    if (isSparkConf(key, properties.getProperty(key))) {
      sparkProperties.setProperty(key, toShellFormat(properties.getProperty(key)));
    }
  }
  //env.put("SPARK_APP_NAME", context.getInterpreterGroupId() + "_testing");
  sparkProperties.setProperty("spark.app.name", context.getInterpreterGroupId() + "_testing");
  setupPropertiesForPySpark(sparkProperties);
  setupPropertiesForSparkR(sparkProperties);
  if (isYarnMode() && getDeployMode().equals("cluster")) {
    env.put("ZEPPELIN_SPARK_YARN_CLUSTER", "true");
    sparkProperties.setProperty("spark.yarn.submit.waitAppCompletion", "false");
  }
  StringBuilder sparkConfBuilder = new StringBuilder();
  if (sparkMaster != null) {
    sparkConfBuilder.append(" --master " + sparkMaster);
  }
  if (isYarnMode() && getDeployMode().equals("cluster")) {
    if (sparkProperties.containsKey("spark.files")) {
      sparkProperties.put("spark.files", sparkProperties.getProperty("spark.files") + ","
          + zConf.getConfDir() + "/log4j_yarn_cluster.properties");
    } else {
      sparkProperties.put("spark.files", zConf.getConfDir() + "/log4j_yarn_cluster.properties");
    }
    sparkProperties.put("spark.yarn.maxAppAttempts", "1");
  }
```
Still not working for me, but I'll debug to see if I can find something.
```java
public Map<String, String> buildEnvFromProperties(InterpreterLaunchContext context) throws IOException {
  Map<String, String> env = super.buildEnvFromProperties(context);
  Properties sparkProperties = new Properties();
  String sparkMaster = getSparkMaster(properties);
  for (String key : properties.stringPropertyNames()) {
    if (RemoteInterpreterUtils.isEnvString(key)) {
      env.put(key, properties.getProperty(key));
    }
    if (isSparkConf(key, properties.getProperty(key))) {
      sparkProperties.setProperty(key, toShellFormat(properties.getProperty(key)));
    }
  }
  sparkProperties.setProperty("spark.app.name", context.getInterpreterGroupId());
  setupPropertiesForPySpark(sparkProperties);
  setupPropertiesForSparkR(sparkProperties);
  if (isYarnMode() && getDeployMode().equals("cluster")) {
    env.put("ZEPPELIN_SPARK_YARN_CLUSTER", "true");
    sparkProperties.setProperty("spark.yarn.submit.waitAppCompletion", "false");
  }
  StringBuilder sparkConfBuilder = new StringBuilder();
  if (sparkMaster != null) {
    sparkConfBuilder.append(" --master " + sparkMaster);
  }
  if (isYarnMode() && getDeployMode().equals("cluster")) {
    if (sparkProperties.containsKey("spark.files")) {
      sparkProperties.put("spark.files", sparkProperties.getProperty("spark.files") + ","
          + zConf.getConfDir() + "/log4j_yarn_cluster.properties");
    } else {
      sparkProperties.put("spark.files", zConf.getConfDir() + "/log4j_yarn_cluster.properties");
    }
    sparkProperties.put("spark.yarn.maxAppAttempts", "1");
  }
```
Try this branch, which works for me: https://github.com/zjffdu/zeppelin/commit/47a50c0ae73c9fc8c5441572cee0d9164106a60f
And set spark.app.name to be empty on the interpreter setting page first.
Sure, let me try.
After debugging, I realized that the reason my changes weren't working was that spark.app.name was being overridden by the default value set in interpreter-setting.
I checked out your branch and got the following result.

Awesome, would you update this PR and add a unit test?
BTW, is the above UI some kind of interpreter process monitoring page? It would be pretty useful for Zeppelin; do you have plans to contribute it to Zeppelin?
Yup, will do once I figure out why the user name is spark instead of an actual user name such as amandeep.kaur.
I'm running Zeppelin in Mesos mode. The above is the Mesos UI. Pretty useful.
@zjffdu After debugging a bit, this new approach might not work for me because the user is set as spark instead of the LDAP user.
@amakaur The user should be the Zeppelin login user name. Did you enable LDAP in Zeppelin?
Yeah, LDAP is enabled. It works as expected with the original solution.
Oh, you might have forgotten one thing: you need to set the Spark interpreter to per-user isolated mode.
I have it set to per note in isolated process mode.
Still doesn't work for you in per user isolated mode? What do you see in the Spark app name?
spark.app.name is set as spark-2ESRB4NZ1.
The following is an example from the logs:
`Create Session: shared_session in InterpreterGroup: spark-2ENBYUKC2 for user: amandeep.kaur`, where the user is the LDAP user, BUT the interpreterGroupId contains spark as the user value.
I would like to set spark.app.name to amandeep.kaur-2ENBYUKC2.
Ah, you are using per note isolated; you should use per user isolated, in which case the interpreterGroupId will include the user name.
I'm not sure if I'll be able to use per user isolated for Zeppelin in prod, because we would like each note to have its own JVM. Would using per user isolated create a single shared interpreter process for all of the notebooks under the same user?
In that case, I think you can use per user isolated & per note isolated together. Then the Spark app name would be interpreter_name + user_name + noteId, e.g. spark-amandeep.kaur-2ENBYUKC2.
> Try this branch which works for me. zjffdu@47a50c0
> And set spark.app.name to be empty in interpreter setting page first.
@zjffdu Is this branch enough to achieve the goal?
@iamabug I don't think so; check my comments above: https://github.com/apache/zeppelin/pull/3498#issuecomment-548179978. I believe the right approach is to update SparkInterpreterLauncher.
@zjffdu I believe the commit I mentioned does update SparkInterpreterLauncher:
```java
if (!sparkProperties.containsKey("spark.app.name") ||
    StringUtils.isBlank(sparkProperties.getProperty("spark.app.name"))) {
  sparkProperties.setProperty("spark.app.name", context.getInterpreterGroupId());
}
```
Here spark.app.name is set to the interpreterGroupId, which is generated the way we want. Am I right?
Yes, maybe it is better to add the prefix Zeppelin_.
Also, this is specific to the Spark interpreter; I am not sure whether we need to consider other interpreters, such as Flink.
If Flink has such a configuration, we could do it as well, but that could be done in another PR.
If spark.app.name is already set, should we leave it as it is, or append the username and note id as well? If we leave it, all users and notes will still have the same name, and I don't think that's what we want to see. So I think the logic here could be like this (a sketch follows the list):
- if `spark.app.name` is not set or is set to blank, we set it to `Zeppelin_` + interpreter_name + username + noteId; interpreter_name is actually `spark`, so it becomes `Zeppelin_spark_` + username + noteId.
- if `spark.app.name` is already set to a string, noted by `APP_NAME`, we set it to `APP_NAME` + username + noteId.

@zjffdu
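A minimal sketch of that proposed logic; the underscore separator, the helper name, and the userName/noteId parameters are assumptions, and this is a proposal rather than committed code:

```java
import org.apache.commons.lang3.StringUtils;

// Hedged sketch of the two naming rules proposed above.
// "current" is whatever spark.app.name currently holds (possibly null).
static String resolveAppName(String current, String userName, String noteId) {
  if (StringUtils.isBlank(current)) {
    // not set or blank: "Zeppelin_spark_" + username + "_" + noteId
    return "Zeppelin_spark_" + userName + "_" + noteId;
  }
  // already set to some APP_NAME: APP_NAME + "_" + username + "_" + noteId
  return current + "_" + userName + "_" + noteId;
}
```

For example, resolveAppName(null, "amandeep.kaur", "2ENBYUKC2") would yield "Zeppelin_spark_amandeep.kaur_2ENBYUKC2".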