druid icon indicating copy to clipboard operation
druid copied to clipboard

Fix the druid json_object issue

Open AlbericByte opened this issue 1 year ago • 6 comments

Fixes #16356.

Description

the dependence calcite cr need to be review the root case is that the operator is SqlFunction, but in the map, the key instance is SqlJsonObjectFunction, the type is not equals, so we can naver get from the map in the following code:

 public @Nullable RexCallImplementor get(final SqlOperator operator) {
    if (operator instanceof SqlUserDefinedFunction) {
      org.apache.calcite.schema.Function udf =
          ((SqlUserDefinedFunction) operator).getFunction();
      if (!(udf instanceof ImplementableFunction)) {
        throw new IllegalStateException("User defined function " + operator
            + " must implement ImplementableFunction");
      }
      CallImplementor implementor =
          ((ImplementableFunction) udf).getImplementor();
      return wrapAsRexCallImplementor(implementor);
    } else if (operator instanceof SqlTypeConstructorFunction) {
      return map.get(SqlStdOperatorTable.ROW);
    }
    return map.get(operator);
  }

Fixed the bug ...

Release note

Fix the SQL JSON_OBJECT() function results in RUNTIME_FAILURE when querying INFORMATION_SCHEMA.COLUMNS


Key changed/added classes in this PR
  • NestedDataOperatorConversions.java

This PR has:

  • [x] been self-reviewed.
    • [ ] using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • [ ] added documentation for new or modified features or behaviors.
  • [ ] a release note entry in the PR description.
  • [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • [ ] added or updated version, license, or notice information in licenses.yaml
  • [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [ ] added integration tests.
  • [ ] been tested in a test Druid cluster.

AlbericByte avatar Jun 12 '24 23:06 AlbericByte

So to resolve this problem, we need to update the calcite to a release that includes your fix?

FrankChen021 avatar Jun 14 '24 01:06 FrankChen021

can we add a test case?

FrankChen021 avatar Jun 14 '24 01:06 FrankChen021

@FrankChen021, @kgyrtkirk and @cryptoe i am creating the test case in calcite first, still working on this. After, i will create the test case in druid

AlbericByte avatar Jun 23 '24 01:06 AlbericByte

So you want all functions to work with equalsIgnoreCase ?

select fruit, sum(price) from foo group by 1 = select fruit, sUm(price) from foo group by 1 ?

cryptoe avatar Jun 24 '24 11:06 cryptoe

I wonder why can't we simply use the SqlStdOperatorTable.JSON_OBJECT instead of declaring a new sqlfunction in NestedDataOperatorConversions.

  public static class JsonObjectOperatorConversion implements SqlOperatorConversion
  {
    private static final SqlFunction SQL_FUNCTION = SqlStdOperatorTable.JSON_OBJECT;
  [...]

I've done a quick check with a testcase in CalciteSysQueryTest

  @Test
  public void testJsonObject()
  {
    testBuilder()
        .sql(
            "select JSON_OBJECT('name': COLUMN_NAME, 'type': \"DATA_TYPE\")"
                + "from INFORMATION_SCHEMA.COLUMNS order by COLUMN_NAME limit 1"
        )
        .expectedResults(
            ImmutableList.of(
                new Object[] {"{\"name\":\"CATALOG_NAME\",\"type\":\"VARCHAR\"}"}
            )
        )
        .run();
  }

I'm not sure if other tests will pass with this change...but SqlToRel was using a direct comparision check on the builtin operator; we have a few of those for the JSON_ functions...maybe the best would be to use the SqlStdOperator versions for the ones SqlToRel does special handling?

...other way around could be to somehow get rid of the Bindables mode - so that this query gets executed the same way as regular Druid queries

kgyrtkirk avatar Jun 25 '24 10:06 kgyrtkirk

Bindables mode hI @kgyrtkirk Thanks for your suggestion, I have switch to SqlStdOperatorTable.JSON_OBJECT

May i know what is your idea of SqlStdOperator versions solution?

AlbericByte avatar Jun 27 '24 01:06 AlbericByte

@kgyrtkirk i have try to add some calcite rule for plan, but it still need to get sqlfunction from rextableimp, i do not have any other work around now, i am open if someone can provide idea or continue to work on this, i can monitor and learn.

AlbericByte avatar Jul 05 '24 08:07 AlbericByte

I think that getting rid of the bindables mode might just make this and possibly other issues go away...

I was looking around a bit and I think you will get similar issues for all the functions inside NestedDataOperatorConversions

to address that I think the following might be considered:

  • fix these SqlFunction-s in NestedDataOperatorConversions ; to be usable bindable functions
  • make the bindables go away
    • the reason these are around is due to the fact that some tables are only provided in a way the bindable wants them here
    • fixing that would be to make sure that all tables can be unwrapped to DruidTable
    • easiest option would be to provide them as an InlineTable ; however that would mean all those values will appear in the plans - so a new subclass of DruidTable would be needed - which could access information_schema stuff

I would preffer (2) as I have a feeling that (1) might be quite tricky.... Removing bindables mode have further benefits; it will make those queries work the same as regular tables with joins/etc; and it would reduce the complexity a bit.

I'll try to think about further alternatives/ideas/etc

cc: @gianm

kgyrtkirk avatar Jul 08 '24 08:07 kgyrtkirk

@kgyrtkirk thanks let me know your solution, i will try this week.

AlbericByte avatar Jul 09 '24 03:07 AlbericByte