[BUG] SQL query doesn't honor date format in OpenSearch index mapping

Open dai-chen opened this issue 3 years ago • 4 comments

What is the bug? The SQL query engine does not appear to honor the date format configured in the OpenSearch index mapping for a date field. This causes problems in various queries that involve datetime fields. See the examples below.

How can one reproduce the bug?

As documented, "strict_date_optional_time||epoch_millis" is the default format when none is specified in the index mapping. The issue occurs when a custom date format is configured, as shown below. Note that this is mostly due to gaps between engine v2 and the legacy engine (which may not have these issues at all).

Issue 1: Datetime literal parsing problem

With the epoch_millis format removed from the mapping, a query that previously worked now throws an exception. Judging from the error message, the cause seems to be that the translated DSL uses an epoch timestamp rather than the only configured format, strict_date_optional_time. Note that OpenSearch does not reject this during any syntax or semantic check (probably because a semantic check is missing), but throws an exception at execution time instead.

PUT my-index-000002
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "strict_date_optional_time"
      }
    }
  }
}

PUT my-index-000002/_doc/3
{ "date": "2015-01-01T12:10:30Z" }

POST _plugins/_sql
{
  "query": "SELECT * FROM my-index-000002 WHERE `date` < '2022-08-20 23:59:59.999' "
}

{
  "error": {
    "type": "SearchPhaseExecutionException",
    "reason": "Error occurred in OpenSearch engine: all shards failed",
    "details": "Shard[0]: OpenSearchParseException[failed to parse date field [1661039999999] with format [strict_date_optional_time]: [failed to parse date field [1661039999999] with format [strict_date_optional_time]]]; nested: IllegalArgumentException[failed to parse date field [1661039999999] with format [strict_date_optional_time]]; nested: NotSerializableExceptionWrapper[date_time_parse_exception: Text '1661039999999' could not be parsed at index 0];\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine."
  },
  "status": 503
}
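
The number in the error message is what the query's datetime literal becomes when it is converted to epoch milliseconds, assuming the literal is read as UTC. The Python sketch below reproduces that conversion and shows the ISO 8601 string that a strict_date_optional_time field would accept instead. The helper names are hypothetical and this is not the plugin's code; it only illustrates the mismatch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def literal_to_epoch_millis(literal: str) -> int:
    """Read a SQL datetime literal as UTC and return epoch milliseconds."""
    parsed = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S.%f")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Dividing two timedeltas with // yields an exact integer (no float rounding).
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def epoch_millis_to_iso(millis: int) -> str:
    """Render epoch milliseconds as an ISO 8601 UTC string with millisecond precision."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


millis = literal_to_epoch_millis("2022-08-20 23:59:59.999")
print(millis)                       # 1661039999999, the value in the error message
print(epoch_millis_to_iso(millis))  # 2022-08-20T23:59:59.999Z
```

If the translated DSL carried the second form, or a value rendered in whatever format the mapping declares, the range query would no longer depend on epoch_millis being present in the mapping.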

Issue 2: Datetime value parsing problem

Related: https://github.com/opensearch-project/sql/issues/126, https://github.com/opendistro-for-elasticsearch/sql/issues/1062

No matter what date format is configured, OpenSearchExprValueFactory always uses the hardcoded formatter in https://github.com/opensearch-project/sql/blob/b0ef5e0299ccbd48cf5b4bc5f68401d2116aef50/opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java#L86. This causes date value parsing errors or wrong time zones.

What is the expected behavior? OpenSearch SQL/PPL should honor the date format in the index mapping and parse date values from OpenSearch and date literals in queries accordingly.

Currently only the data type is returned and associated with the field. One way to improve this is to read the datetime format from OpenSearch along with the basic field type info. Code: https://github.com/opensearch-project/sql/blob/b0ef5e0299ccbd48cf5b4bc5f68401d2116aef50/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java#L60
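
As a rough illustration of that approach, the Python sketch below collects the declared format of every date field from a mapping document and falls back to the documented default when none is given. The function name and the "created" field are hypothetical, only top-level properties are visited, and the plugin itself is written in Java, so this shows the idea rather than the actual change.

```python
DEFAULT_DATE_FORMAT = "strict_date_optional_time||epoch_millis"


def date_formats_by_field(mapping: dict) -> dict:
    """Return {field name: list of formats} for every top-level date field."""
    properties = mapping.get("mappings", {}).get("properties", {})
    formats = {}
    for name, definition in properties.items():
        if definition.get("type") == "date":
            # Several formats can be combined with "||" in a mapping.
            declared = definition.get("format", DEFAULT_DATE_FORMAT)
            formats[name] = declared.split("||")
    return formats


mapping = {
    "mappings": {
        "properties": {
            "date": {"type": "date", "format": "strict_date_optional_time"},
            "created": {"type": "date"},  # hypothetical field with no format
            "key": {"type": "keyword"},
        }
    }
}
print(date_formats_by_field(mapping))
```

With the format list attached to each field, both literal translation and value parsing could pick a formatter per field instead of relying on a single fixed one.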

What is your host/environment?

  • OpenSearch 2.2
  • Plugins: SQL

Do you have any screenshots? N/A

Do you have any additional context? Similar issues may apply to PPL as well, because a single core engine is shared across both languages.

dai-chen avatar Aug 30 '22 22:08 dai-chen

Isn't this a duplicate of, or related to, #126?

Yury-Fridlyand avatar Aug 31 '22 18:08 Yury-Fridlyand

> Isn't this a duplicate of, or related to, #126?

I added it to Issue 2. Issue 1 is slightly different. I think both have the same root cause.

dai-chen avatar Aug 31 '22 18:08 dai-chen

Mapping:

{
    "mappings" : {
        "properties" : {
            "key" : {
                "type" : "keyword"
            },
            "val" : {
                "type" : "date",
                "format": "time_no_millis"
            }
        }
    }
}

Data:

{ "index" : { "_id" : "1" } }
{"key": "null", "val": null}
{ "index" : { "_id" : "2" } }
{"key": "001: 00:00:00", "val": "08:00:00Z"}
{ "index" : { "_id" : "3" } }
{"key": "002: 00:00:01", "val": "08:00:01Z"}
{ "index" : { "_id" : "4" } }
{"key": "003: 01:00:00", "val": "09:00:00Z"}

Exception stack:

Error happened during query handling
java.lang.IllegalStateException: Construct ExprTimestampValue from "08:00:00Z" failed, unsupported date format.
        at org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory.constructTimestamp(OpenSearchExprValueFactory.java:185) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory.parseTimestamp(OpenSearchExprValueFactory.java:195) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory.parse(OpenSearchExprValueFactory.java:154) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory.lambda$parseStruct$16(OpenSearchExprValueFactory.java:204) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at java.util.Iterator.forEachRemaining(Iterator.java:133) ~[?:?]
        at org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory.parseStruct(OpenSearchExprValueFactory.java:203) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory.parse(OpenSearchExprValueFactory.java:149) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory.construct(OpenSearchExprValueFactory.java:122) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.response.OpenSearchResponse.lambda$iterator$2(OpenSearchResponse.java:97) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
        at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:?]
        at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294) ~[?:?]
        at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206) ~[?:?]
        at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169) ~[?:?]
        at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300) ~[?:?]
        at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) ~[?:?]
        at org.opensearch.sql.opensearch.storage.OpenSearchIndexScan.hasNext(OpenSearchIndexScan.java:90) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.executor.protector.ResourceMonitorPlan.hasNext(ResourceMonitorPlan.java:74) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.planner.physical.ProjectOperator.hasNext(ProjectOperator.java:51) ~[core-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.lambda$execute$0(OpenSearchExecutionEngine.java:39) [opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.client.OpenSearchNodeClient.schedule(OpenSearchNodeClient.java:157) [opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.execute(OpenSearchExecutionEngine.java:33) [opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.sql.SQLService.execute(SQLService.java:66) [sql-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.legacy.plugin.RestSQLQueryAction.lambda$prepareRequest$1(RestSQLQueryAction.java:123) [legacy-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.legacy.plugin.RestSqlAction.lambda$prepareRequest$1(RestSqlAction.java:162) [legacy-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.sql.opensearch.executor.Scheduler.lambda$withCurrentContext$0(Scheduler.java:30) [opensearch-2.4.0.0-SNAPSHOT.jar:?]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.time.format.DateTimeParseException: Text '08:00:00Z' could not be parsed, unparsed text found at index 8
        at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2049) ~[?:?]
        at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1874) ~[?:?]
        at org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory.constructTimestamp(OpenSearchExprValueFactory.java:182) ~[opensearch-2.4.0.0-SNAPSHOT.jar:?]
        ... 29 more
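
The failure above can be imitated outside the plugin: a pattern that expects a full date-time rejects the time-only value, while a pattern chosen from the mapping's format accepts it. The Python sketch below uses a hypothetical translation table from two OpenSearch format names to strptime patterns. It assumes Python 3.7 or later, where %z accepts a trailing "Z", and it is not how the plugin resolves formats.

```python
from datetime import datetime

# Hypothetical translation table from OpenSearch format names to strptime
# patterns; only the two entries needed for this illustration.
PATTERNS = {
    "time_no_millis": "%H:%M:%S%z",
    "date_time_no_millis": "%Y-%m-%dT%H:%M:%S%z",
}


def parse_with_mapping_format(value: str, format_name: str) -> datetime:
    """Parse a stored value using the pattern for the given format name."""
    return datetime.strptime(value, PATTERNS[format_name])


# A pattern that expects a date and a time rejects the time-only value ...
try:
    parse_with_mapping_format("08:00:00Z", "date_time_no_millis")
    fixed_pattern_failed = False
except ValueError:
    fixed_pattern_failed = True

# ... while the pattern matching the mapping's format accepts it.
parsed = parse_with_mapping_format("08:00:00Z", "time_no_millis")
print(fixed_pattern_failed, parsed.timetz())
```

strptime fills in a placeholder date for time-only input, so only the time portion returned by timetz() is meaningful here.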

Yury-Fridlyand avatar Oct 18 '22 16:10 Yury-Fridlyand

Related issue -- https://github.com/opensearch-project/sql/issues/924. See this comment for root cause.

MaxKsyunz avatar Oct 18 '22 18:10 MaxKsyunz

Reported on forum: https://forum.opensearch.org/t/sql-select-fails-on-date-fields-format-epoch-second/11521/

Yury-Fridlyand avatar Nov 16 '22 21:11 Yury-Fridlyand

@Yury-Fridlyand @MaxKsyunz Here are some examples that users previously had problems with:

Ingest pipeline and index template:

PUT /_ingest/pipeline/test-pipeline
{
    "description" : "test-pipeline",
    "processors" : [
      {
        "date" : {
          "output_format" : "strict_date_optional_time",
          "ignore_failure" : false,
          "field" : "transactionDateTime",
          "target_field" : "transactionDateTime",
          "formats" : [
            "yyyy-MM-dd HH:mm:ss.SSS Z"
          ]
        }
      }
    ]
  }

PUT /_index_template/test
{
  "index_patterns" : [
    "test-*"
  ],
  "template": {
      "aliases": {
        "test":{}
      },
      "mappings": {
          "numeric_detection": true,
          "dynamic_date_formats": [
              "strict_date_optional_time",
              "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z||yyyy-MM-dd HH:mm:ss.SSS Z||strict_date_optional_time"
          ],
          "properties": {
              "transactionDateTime" : {
              "type" : "date"
            }
          }
      }
  }
}

Data:

POST /test-123/_doc
{
    "transactionDateTime": "2022-11-01 03:00:52.000 +0000"
}

Query:

SELECT * FROM test* where transactionDateTime = '2022-06-21 03:00:52.000'

dai-chen avatar Nov 17 '22 20:11 dai-chen

@Yury-Fridlyand Are we planning to support this in the 2.6.0 release once the PoC # 180 is done?

dai-chen avatar Jan 17 '23 18:01 dai-chen

@dai-chen, yes

Yury-Fridlyand avatar Jan 17 '23 19:01 Yury-Fridlyand

I am having the same issue as the author after updating to 2.9. I can't run any query with dates.

andremacola avatar Aug 31 '23 21:08 andremacola