404 returned when querying nodes that contain URL-encoded characters
With an event similar to the one shown below, three namespaces are created: two dataset namespaces and one job namespace.
When trying to access one of these datasets, a 404 is returned.
URL: http://localhost:3000/api/v1/lineage/?nodeId=dataset:jdbc%3Ah2%3Amem%3Asql_tests_like:HBMOFA.ORDDETP
{
"eventType": "COMPLETE",
"eventTime": 1636646662.687894,
"run": {
"runId": "d3968e5f-84ac-48c1-954c-f999ff27ef3a",
"facets": null
},
"job": {
"namespace": "sql-runner-dev",
"name": "ORDDETP - ORDDETP.avro",
"facets": {
"sourceCodeLocation": null,
"sql": {
"_producer": "lyndon-thinkpad/127.0.1.1",
"_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SQLJobFacet.json#/$defs/SQLJobFacet",
"query": "SELECT t.*, CURRENT_DATE AS ingest_date FROM HBMOFA.ORDDETP t WHERE (BAYY >= 93 AND BAMMDD >= 801 AND BAMMDD < 1111) OR (BAYY = 92 AND BAMMDD >= 1301)"
},
"documentation": {
"_producer": "lyndon-thinkpad/127.0.1.1",
"_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DocumentationJobFacet.json#/$defs/DocumentationJobFacet",
"description": "SQL Runner Job for /tmp/sql_runner_tests4560779590026189736/config.conf"
}
}
},
"inputs": [
{
"namespace": "jdbc:h2:mem:sql_tests_like",
"name": "HBMOFA.ORDDETP",
"facets": {
"documentation": null,
"dataSource": {
"_producer": "lyndon-thinkpad/127.0.1.1",
"_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet",
"name": "jdbc:h2:mem:sql_tests_like",
"uri": "jdbc:h2:mem:sql_tests_like"
},
"schema": null
},
"inputFacets": null
}
],
"outputs": [
{
"namespace": "s3://sql-runner",
"name": "2021-11-11/incremental/ORDDETP.avro",
"facets": {
"documentation": null,
"dataSource": {
"_producer": "lyndon-thinkpad/127.0.1.1",
"_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet",
"name": "s3://sql-runner",
"uri": "s3://sql-runner"
},
"schema": null
},
"outputFacets": null
}
],
"producer": "lyndon-thinkpad/127.0.1.1",
"schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"
}
A partial stack trace from the Docker process looks as follows:
marquez-api | 172.21.0.4 - - [11/Nov/2021:16:49:56 +0000] "GET /api/v1/namespaces/jdbc%3Ah2%3Amem%3Asql_tests_like/datasets/HBMOFA.ORDDETP/versions?limit=100&offset=0 HTTP/1.1" 200 648 "http://localhost:3000/lineage/dataset/jdbc%3Ah2%3Amem%3Asql_tests_like/HBMOFA.ORDDETP" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 27
marquez-api | ERROR [2021-11-11 16:49:59,168] io.dropwizard.jersey.errors.IllegalStateExceptionMapper: Error handling a request: 7b955941cb928966
marquez-api | ! java.lang.IllegalStateException: No match available
marquez-api | ! at java.base/java.util.regex.Matcher.start(Unknown Source)
marquez-api | ! at marquez.service.models.NodeId.parts(NodeId.java:190)
marquez-api | ! at marquez.service.models.NodeId.asDatasetId(NodeId.java:214)
marquez-api | ! at marquez.service.LineageService.getJobUuid(LineageService.java:191)
marquez-api | ! at marquez.service.LineageService.lineage(LineageService.java:40)
marquez-api | ! at marquez.api.OpenLineageResource.getLineage(OpenLineageResource.java:96)
Note that the original OpenLineage POST request succeeds.
For more detail and the original discussion see the OpenLineage Slack here: https://openlineage.slack.com/archives/C01CK9T7HKR/p1636625611097100
Presumably the issue sits in: src/main/java/marquez/service/models/NodeId.java
Specifically, this method:
private String[] parts(int expectedParts, String expectedType)
Perhaps there is an issue with the regex pattern used?
public static final String ID_DELIM = ":"; // line 34
Pattern p = Pattern.compile("(?:" + ID_DELIM + "(?!//|\\d+))"); // line 182
// means the pattern is equal to:
Pattern p = Pattern.compile("(?::(?!//|\\d+))");
I think this means the pattern is doing this:
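One way to check is to run the quoted pattern against the failing node ID. The sketch below is not the Marquez implementation: the class NodeIdSplitDemo and its split helper are hypothetical, and it assumes the nodeId query value is URL-decoded before being split on the pattern. The stack trace shows that the real parts method uses a Matcher rather than Pattern.split, so this only illustrates where the pattern finds delimiters. It needs Java 10 or later for URLDecoder.decode(String, Charset).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class NodeIdSplitDemo {
    // Same pattern text as quoted above: a ':' that is not followed by "//" or digits.
    static final Pattern DELIM = Pattern.compile("(?::(?!//|\\d+))");

    // Hypothetical helper (not Marquez code): URL-decode the query value,
    // then split it wherever the delimiter pattern matches.
    static String[] split(String encodedNodeId) {
        String decoded = URLDecoder.decode(encodedNodeId, StandardCharsets.UTF_8);
        return DELIM.split(decoded);
    }

    public static void main(String[] args) {
        // Illustrative node ID whose namespace contains no ':'.
        String simple = "dataset:my-namespace:my-table";
        // Node ID taken from the failing request URL in this issue.
        String failing = "dataset:jdbc%3Ah2%3Amem%3Asql_tests_like:HBMOFA.ORDDETP";

        // prints: dataset | my-namespace | my-table
        System.out.println(String.join(" | ", split(simple)));
        // prints: dataset | jdbc | h2 | mem | sql_tests_like | HBMOFA.ORDDETP
        System.out.println(String.join(" | ", split(failing)));
    }
}
```

Under these assumptions, each ':' inside the JDBC namespace is treated as a delimiter, so the node ID breaks into six pieces. If parts expects three pieces for a dataset node (type, namespace, name; a guess based on its expectedParts parameter), that mismatch could explain the failure.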
