Data Lineage Not Reflected for MSSQL Stored Procedure
Affected module UI and Ingestion Framework
Describe the bug I have a stored procedure on my mssql database. I've created and run a metadata and lineage ingestion on the OM but the lineage between the database affected by the stored procedure does not shows up. The ingestion lineage logs does not show any available lineage to be ingested, however the stored procedure are succesfullly ingested into the OM.
Here are the screenshot of the lineage ingestion debug log:
[2024-06-20T10:23:52.634+0000] {ingestion_pipeline_mixin.py:52} DEBUG - Created Pipeline Status for pipeline XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667: {'id': '89ba7f25-e87e-45e7-9ccc-d9ee65f81168', 'eventType': 'entityUpdated', 'entityType': 'ingestionPipeline', 'entityId': '440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'entityFullyQualifiedName': 'XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'previousVersion': 0.3, 'currentVersion': 0.3, 'userName': 'ingestion-bot', 'timestamp': 1718879032630, 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'pipelineStatus', 'oldValue': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'running', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'status': []}, 'newValue': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'success', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'endDate': 1718879032552, 'status': [{'name': 'Mssql', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}, {'name': 'OpenMetadata', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}]}}], 'fieldsDeleted': [], 'previousVersion': 0.3}, 'entity': {'id': '440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'name': 'c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'displayName': 'XXX-dummy-database_lineage_Cs1uNj9z', 'pipelineType': 'lineage', 'fullyQualifiedName': 'XXX-dummy-database.c26baaaa-aa4e-4e1f-b28b-b613d8c5a667', 'sourceConfig': {'config': {'type': 'DatabaseLineage', 'resultLimit': 1000, 'queryLogDuration': 1, 'parsingTimeoutLimit': 300}}, 'airflowConfig': {'pausePipeline': False, 'concurrency': 1, 'startDate': 1718755200000, 'pipelineTimezone': 'UTC', 'retries': 0, 'retryDelay': 300, 'pipelineCatchup': False, 'scheduleInterval': '0 7 * * 1', 'maxActiveRuns': 1, 'workflowDefaultView': 'tree', 'workflowDefaultViewOrientation': 'LR'}, 'service': {'id': 'a14d3a9d-2d37-44b9-951c-7e1774f778f1', 'type': 'databaseService', 'name': 'XXX-dummy-database', 'fullyQualifiedName': 'XXX-dummy-database', 'description': '', 'displayName': 'XXX-dummy-database', 'deleted': False}, 'pipelineStatuses': {'runId': 'bbed7a6e-79d7-40c6-bd02-4bf04c74ef11', 'pipelineState': 'success', 'startDate': 1718879028991, 'timestamp': 1718879028991, 'endDate': 1718879032552, 'status': [{'name': 'Mssql', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}, {'name': 'OpenMetadata', 'records': 0, 'updated_records': 0, 'warnings': 0, 'errors': 0, 'filtered': 0}]}, 'loggerLevel': 'DEBUG', 'deployed': True, 'enabled': True, 'href': 'http://openmetadata-server:8585/api/v1/services/ingestionPipelines/440ccd9b-9a1e-42e7-a78e-39833aec29e2', 'version': 0.3, 'updatedAt': 1718874878467, 'updatedBy': 'admin', 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'sourceConfig', 'oldValue': '"old-encrypted-value"', 'newValue': '"new-encrypted-value"'}, {'name': 'loggerLevel', 'oldValue': 'INFO', 'newValue': 'DEBUG'}], 'fieldsDeleted': [], 'previousVersion': 0.2}, 'deleted': False, 'provider': 'user'}} [2024-06-20T10:23:52.635+0000] {logger.py:184} INFO - Statuses detailed info: [2024-06-20T10:23:52.635+0000] {logger.py:184} INFO - Mssql Status: [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - {'failures': [], 'filtered': [], 'records': [], 'source_start_time': 1718879029.196479, 'updated_records': [], 'warnings': []} [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - OpenMetadata Status: [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - {'failures': [], 'filtered': [], 'records': [], 'source_start_time': 1718879029.3635197, 'updated_records': [], 'warnings': []} [2024-06-20T10:23:52.636+0000] {logger.py:184} INFO - Execution Time Summary [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - Workflow Mssql Summary: [2024-06-20T10:23:52.639+0000] {logger.py:184} INFO - Processed records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Updated records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Warnings: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Errors: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Workflow OpenMetadata Summary: [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Processed records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Updated records: 0 [2024-06-20T10:23:52.640+0000] {logger.py:184} INFO - Warnings: 0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Errors: 0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Success %: 100.0 [2024-06-20T10:23:52.641+0000] {logger.py:184} INFO - Workflow finished in time: 3.45s [2024-06-20T10:23:52.642+0000] {python.py:194} INFO - Done. Returned value was: None [2024-06-20T10:23:52.652+0000] {taskinstance.py:1400} INFO - Marking task as SUCCESS. dag_id=c26baaaa-aa4e-4e1f-b28b-b613d8c5a667, task_id=lineage_task, execution_date=20240620T102347, start_date=20240620T102348, end_date=20240620T102352 [2024-06-20T10:23:52.700+0000] {local_task_job_runner.py:228} INFO - Task exited with return code 0 [2024-06-20T10:23:52.721+0000] {taskinstance.py:2778} INFO - 0 downstream tasks scheduled from follow-on schedule check
To Reproduce
- Create a new database service with MSSQL connector
- Create a metadata ingestion
- Create a lineage ingestion
- The stored procedure from the MSSQL will be successfully ingested
- The data lineage from the MSSQL will not show up anywhere.
Expected behavior There should be a data lineage from table A to stored procedure X to table B.
Version:
- OS: Debian GNU/Linux 11 (bullseye)
- Python version: I use docker.
- OpenMetadata version: 1.4.1, then upgraded to 1.4.3. Same issue.
- OpenMetadata Ingestion package version: 1.4.1 then upgraded to 1.4.3
Additional context Add any other context about the problem here.
@ulixius9 seems like this is related to:
https://github.com/open-metadata/OpenMetadata/issues/16424 https://github.com/open-metadata/OpenMetadata/pull/16876
@ulixius9 would it be possible to fix the [] (square brackets) parsing issue as advised by Teddy in #16876 ?
It seems like you have the best understanding of the cause for this one: https://github.com/open-metadata/OpenMetadata/issues/7427#issuecomment-2129407993
We had faced the same problem. Also we found out that commenting commands like "CREATE PROCEDURE", "BEGIN", "END", "TRUNCATE", "DECLARE", "AS" gives correct lineage using SQL LINEAGE tool (which OM use for parsing lineages as i got). Screenshots of the example stored procedure below.
https://reata.github.io/sqllineage/ - sql lineage web demo
RabochiyYan
How will commenting out the script of stored procedures address the issue of OpenMetadata lineage not working for stored procedures? Stored procedures are built once and saved. so how does this approach help with capturing the lineage for stored procedures?
@bharathkumar78
How will commenting out the script of stored procedures address the issue of OpenMetadata lineage not working for stored procedures? Stored procedures are built once and saved. so how does this approach help with capturing the lineage for stored procedures?
May be this will help to understand what cause the problem? I do not suggest users to comment on their procedures. I propose to refine the stored procedure parser in such a way that it possibly excludes these lines before building lineage.
@RabochiyYan did you guys manage to find any workaround? I am facing a similar issue..
@RabochiyYan did you guys manage to find any workaround? I am facing a similar issue..
Yes, we tried using chat gpt to parse it. Not a best way to be honest...