404 returned when querying nodes that contain URL Encoded characters

Open LyndonArmitage opened this issue 4 years ago • 2 comments

With an event similar to the one attached below, three namespaces are created: two dataset namespaces and one job namespace.

When trying to access one of these datasets, a 404 is encountered.

URL: http://localhost:3000/api/v1/lineage/?nodeId=dataset:jdbc%3Ah2%3Amem%3Asql_tests_like:HBMOFA.ORDDETP
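For reference, the `%3A` sequences in that URL are percent-encoded colons. The following is a minimal Java sketch of what the `nodeId` value looks like once decoded, assuming the server applies standard URL decoding to the query parameter. The class and method names are illustrative and are not part of Marquez.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: NodeIdDecodeSketch is a hypothetical name, not a Marquez class.
class NodeIdDecodeSketch {
  // Decode a percent-encoded query parameter value as UTF-8 (this overload needs Java 10+).
  static String decode(String encoded) {
    return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String raw = "dataset:jdbc%3Ah2%3Amem%3Asql_tests_like:HBMOFA.ORDDETP";
    // Prints the node ID with every %3A turned back into ':'.
    System.out.println(decode(raw));
  }
}
```

After decoding, the ID contains five colons, so the type, namespace and name can no longer be separated by a plain split on ":".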

{
  "eventType": "COMPLETE",
  "eventTime": 1636646662.687894,
  "run": {
    "runId": "d3968e5f-84ac-48c1-954c-f999ff27ef3a",
    "facets": null
  },
  "job": {
    "namespace": "sql-runner-dev",
    "name": "ORDDETP - ORDDETP.avro",
    "facets": {
      "sourceCodeLocation": null,
      "sql": {
        "_producer": "lyndon-thinkpad/127.0.1.1",
        "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SQLJobFacet.json#/$defs/SQLJobFacet",
        "query": "SELECT t.*, CURRENT_DATE AS ingest_date FROM HBMOFA.ORDDETP t WHERE (BAYY >= 93 AND BAMMDD >= 801 AND BAMMDD < 1111) OR (BAYY = 92 AND BAMMDD >= 1301)"
      },
      "documentation": {
        "_producer": "lyndon-thinkpad/127.0.1.1",
        "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DocumentationJobFacet.json#/$defs/DocumentationJobFacet",
        "description": "SQL Runner Job for /tmp/sql_runner_tests4560779590026189736/config.conf"
      }
    }
  },
  "inputs": [
    {
      "namespace": "jdbc:h2:mem:sql_tests_like",
      "name": "HBMOFA.ORDDETP",
      "facets": {
        "documentation": null,
        "dataSource": {
          "_producer": "lyndon-thinkpad/127.0.1.1",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet",
          "name": "jdbc:h2:mem:sql_tests_like",
          "uri": "jdbc:h2:mem:sql_tests_like"
        },
        "schema": null
      },
      "inputFacets": null
    }
  ],
  "outputs": [
    {
      "namespace": "s3://sql-runner",
      "name": "2021-11-11/incremental/ORDDETP.avro",
      "facets": {
        "documentation": null,
        "dataSource": {
          "_producer": "lyndon-thinkpad/127.0.1.1",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet",
          "name": "s3://sql-runner",
          "uri": "s3://sql-runner"
        },
        "schema": null
      },
      "outputFacets": null
    }
  ],
  "producer": "lyndon-thinkpad/127.0.1.1",
  "schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"
}

A partial stack trace from the docker process looks as follows:

marquez-api  | 172.21.0.4 - - [11/Nov/2021:16:49:56 +0000] "GET /api/v1/namespaces/jdbc%3Ah2%3Amem%3Asql_tests_like/datasets/HBMOFA.ORDDETP/versions?limit=100&offset=0 HTTP/1.1" 200 648 "http://localhost:3000/lineage/dataset/jdbc%3Ah2%3Amem%3Asql_tests_like/HBMOFA.ORDDETP" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 27
marquez-api  | ERROR [2021-11-11 16:49:59,168] io.dropwizard.jersey.errors.IllegalStateExceptionMapper: Error handling a request: 7b955941cb928966
marquez-api  | ! java.lang.IllegalStateException: No match available
marquez-api  | ! at java.base/java.util.regex.Matcher.start(Unknown Source)
marquez-api  | ! at marquez.service.models.NodeId.parts(NodeId.java:190)
marquez-api  | ! at marquez.service.models.NodeId.asDatasetId(NodeId.java:214)
marquez-api  | ! at marquez.service.LineageService.getJobUuid(LineageService.java:191)
marquez-api  | ! at marquez.service.LineageService.lineage(LineageService.java:40)
marquez-api  | ! at marquez.api.OpenLineageResource.getLineage(OpenLineageResource.java:96)

Note that the original OpenLineage POST request succeeds.
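The "No match available" message in the trace above is the one `java.util.regex.Matcher.start()` throws when it is called without a preceding successful match. The following minimal sketch reproduces that behavior in isolation. It is not the Marquez code, and the class name, method name and input strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: not taken from NodeId.java.
class MatcherStartSketch {
  // Calls start() after find() and reports either success or the exception message.
  static String startAfterFind(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    boolean found = m.find(); // false when the input contains no match
    try {
      m.start(); // throws IllegalStateException if find() did not succeed
      return "matched (found=" + found + ")";
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // The input has no ':' at all, so find() fails and start() throws.
    System.out.println(startAfterFind(":", "no-colons-here"));
  }
}
```

Checking the result of `find()` before each `start()` call would be one way to turn this into an explicit, handled error instead of an unhandled `IllegalStateException`.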

LyndonArmitage avatar Nov 11 '21 16:11 LyndonArmitage

For more detail and the original discussion see the OpenLineage Slack here: https://openlineage.slack.com/archives/C01CK9T7HKR/p1636625611097100

LyndonArmitage avatar Nov 11 '21 16:11 LyndonArmitage

Presumably the issue sits in: src/main/java/marquez/service/models/NodeId.java

Specifically, this method:

private String[] parts(int expectedParts, String expectedType)

Perhaps there is an issue with the Regex pattern used?

public static final String ID_DELIM = ":"; // line 34
Pattern p = Pattern.compile("(?:" + ID_DELIM + "(?!//|\\d+))"); // line 182
// means the pattern is equal to:
Pattern p = Pattern.compile("(?::(?!//|\\d+))");

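I think this means the pattern is doing this: it matches any ":" that is not immediately followed by "//" or by a digit. [image: visualization of the pattern]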
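To check that reading of the pattern, here is a small sketch that lists which colons it accepts as delimiters. The inputs are the decoded node ID from the failing request and a made-up ID whose namespace has a scheme://host:port shape. The class and method names are hypothetical, and this is not the logic in `NodeId.parts`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: DelimiterPatternSketch is a hypothetical name.
class DelimiterPatternSketch {
  // Same pattern as quoted above: a ':' not followed by "//" or by digits.
  static final Pattern DELIM = Pattern.compile("(?::(?!//|\\d+))");

  // Collect the index of every ':' that the pattern accepts as a delimiter.
  static List<Integer> delimiterPositions(String nodeId) {
    List<Integer> positions = new ArrayList<>();
    Matcher m = DELIM.matcher(nodeId);
    while (m.find()) {
      positions.add(m.start());
    }
    return positions;
  }

  public static void main(String[] args) {
    // Decoded node ID from the failing request.
    System.out.println(delimiterPositions("dataset:jdbc:h2:mem:sql_tests_like:HBMOFA.ORDDETP"));
    // Made-up ID with a scheme://host:port namespace.
    System.out.println(delimiterPositions("dataset:postgres://db:5432:public.orders"));
  }
}
```

If a dataset ID is expected to split into three parts (type, namespace, name), as the URL suggests, the made-up ID yields exactly the two delimiters needed. The jdbc:h2:mem:... namespace yields five, because none of its colons are followed by "//" or a digit.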

LyndonArmitage avatar Nov 12 '21 10:11 LyndonArmitage