zeppelin icon indicating copy to clipboard operation
zeppelin copied to clipboard

[ZEPPELIN-3438] Passing Z variables to BigQuery

Open sanjaydasgupta opened this issue 7 years ago • 11 comments

What is this PR for?

This PR enables the interpolation of ZeppelinContext objects into the paragraph text of BigQuery cells. It also introduces a new interpreter-level configuration parameter named zeppelin.bigquery.interpolation. This new parameter is false by default, and must be set to true to enable object interpolation. The default value of false guarantees backward compatibility for users who are not aware of the new feature.

The implementation takes the same approach that was followed in PR 2898 and 2903.

What type of PR is it?

[ Feature ]

Todos

  • [ ] - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-3438

How should this be tested?

Two new unit test classes have been added: BigQueryInterpolationDisabled.java, and BigQueryInterpolationEnabled.java.

Also, the code in this PR merely causes the BigQuery interpreter to "opt-in" to the implementation of z-variable interpolation already existing in the Interpreter base class - described in PR-2898. The unit tests necessary for testing z-variable interpolation in detail are already present in PR-2898

Screenshots (if appropriate)

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? Yes, documentation changes provided in bigquery.md

sanjaydasgupta avatar Jul 13 '18 06:07 sanjaydasgupta

LGTM

jongyoul avatar Jul 13 '18 09:07 jongyoul

Thanks for pointing out the issue @felixcheung

I've corrected the link as well as some of the preceding words as the ZeppelinContext documentation is now in its own page - not under Spark Interpreter. This correction (the words preceding the link) will have to be made in the Spark and JDBC interpreter documentation too, and I will bundle it later with another PR.

I have also made another correction in the configuration table - the default value false has been moved into the middle column (where it should have been) from its previous location at the end of the description in the last column.

sanjaydasgupta avatar Jul 15 '18 15:07 sanjaydasgupta

The two failed tests on travis - apparently unrelated to the code from this PR - are really stubborn and wont go away no matter what I do!

sanjaydasgupta avatar Jul 16 '18 03:07 sanjaydasgupta

@sanjaydasgupta I was thinking would it better to put it in Interpreter base class instead of doing it in every interpreter implementation.

zjffdu avatar Jul 16 '18 07:07 zjffdu

Thanks for the suggestion @zjffdu.

The same idea was given earlier by Leemoonsoo here, and I had then moved all of the required code into the interpolate(...) method of the base class.

So, the PR for each interpreter now contains the following 4 elements:

  1. The configuration changes (defining the zeppelin.???.interpolate parameter) in the associated interpreter-setting.json file (seen here),
  2. The documentation changes in the associated docs/interpreter/???.md file (seen here),
  3. A 2-line change to the interpreter's interpret() method causing it to invoke interpolate(...) in the base class if zeppelin.???.interpolate is found to be true (seen here), and
  4. The specific unit-test classes as applicable (seen here and here)

Keeping this consistent set of files together in one PR helps the review process IMHO, but I am open to any suggestions for further improvement.

sanjaydasgupta avatar Jul 16 '18 14:07 sanjaydasgupta

Thanks @sanjaydasgupta I was thinking whether we could do it in Interpreter.java like this.

  public abstract InterpreterResult interpret2(String st,
                                              InterpreterContext context)
      throws InterpreterException;

  public InterpreterResult interpret(String st, InterpreterContext context)
      throws InterpreterException {
    String cmd = Boolean.parseBoolean(getProperty("zeppelin.shell.interpolation")) ?
        interpolate(st, context.getResourcePool()) : st;
    return interpret2(cmd, context);
  }

So that each sub class can just implement method interpret2 (we can make a more proper name for this method).

zjffdu avatar Jul 17 '18 00:07 zjffdu

This is a great idea @zjffdu.

There are also a few other things to think of - like automatically generating the name of the enabler parameter on a per-interpreter basis (zeppelin.shell.interpolation in your sample code above). This can be done by adding a static map to interpreter.java that translates from interpreter class name to parameter name, but I think we will obtain greater mileage from the following 2-part strategy:

  1. refactor the interpret(...) method in all of the existing interpreters once to introduce the 2 lines that have to change in each interpreter (no need to introduce a new method like interpret2(...)
  2. add a map to Interpreter.java that translates from interpreter class name to parameter name, for use in the updated code in (1) above. This map could be filled with unused names in advance without doing any harm.

This is more or less what I had in mind during this earlier discussion, but I did not want to start it then as there was other interpreter refactoring work going on. Perhaps this is a better time to resume that thought :-) Allow me to ask the opinions of a wider group of reviewers - what do you think @felixcheung @jongyoul @zjffdu ?

sanjaydasgupta avatar Jul 17 '18 03:07 sanjaydasgupta

do we really need this for all interpreters though? for those we don't have active maintainer it might better not to change them

felixcheung avatar Jul 18 '18 04:07 felixcheung

We won't enable it for all interpreters. By default we could disable it, interpreter can enable it via setting zeppelin.interpreter.interpolation to true to enable it. The purpose is just to provide hook to allow interpreter have chance to update code before doing real interpretation.

zjffdu avatar Jul 18 '18 04:07 zjffdu

Hi this PR have never been merged. So simple question is variable interpolation for bigquery interpreter is working or not at all ? trying to use z.put(var,val) then use in %%bigquery block but no interpolation the string the placeholder is not replaced Thanks

richiesgr avatar Nov 15 '20 13:11 richiesgr

@richiesgr Yes, I don't think interpolation is working for bigquery interpreter right now. The easy way is just to make BigQuery extends AbstractInterpreter. so that user can enable interpolation via local properties, e.g.

%bigquery(interpolate=true)

...

https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/AbstractInterpreter.java#L42

zjffdu avatar Nov 15 '20 14:11 zjffdu