Ingestion: Performance Improvements to the Ingestion Workflow Epic

IceS2 opened this issue (status: Open).

Overview

Work on several fronts to improve ingestion performance.

Logging

[x] Log execution time at DEBUG level

PR - https://github.com/open-metadata/OpenMetadata/pull/15013
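A minimal sketch of logging execution time at DEBUG level, assuming a decorator-based approach. `log_execution_time`, the `ingestion.timing` logger name and `yield_tables` are hypothetical and are not taken from the PR above.

```python
import functools
import logging
import time
from typing import Any, Callable, List, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("ingestion.timing")


def log_execution_time(func: F) -> F:
    """Log how long each call to `func` took, at DEBUG level."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # %-style args: the message is only formatted if DEBUG is enabled.
            logger.debug("%s took %.4f s", func.__qualname__, elapsed)

    return wrapper  # type: ignore[return-value]


@log_execution_time
def yield_tables(schema: str) -> List[str]:
    # Hypothetical stand-in for a source step.
    return [f"{schema}.table_{i}" for i in range(3)]


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(yield_tables("public"))
```

Note that for generator functions this wrapper would only time the creation of the generator, not its iteration, so generator-based steps would need a different wrapper.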

Multithreading on the Source side:

[ ] Implement Multithreading for DatabaseServices

Done at the schema level. This shouldn't hurt performance in the corner case of having fewer schemas than threads (the extra threads simply stay idle), but it might hurt performance when there are many schemas with almost no tables in them, presumably because the per-schema threading overhead is then paid for very little work.

PR - https://github.com/open-metadata/OpenMetadata/pull/15130
[ ] Implement Multithreading for DashboardServices
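The schema-level approach described for DatabaseServices can be sketched with a thread pool that hands one schema to each task. This is an illustration under assumptions: `CATALOG`, `process_schema` and `process_all_schemas` are hypothetical stand-ins, not the connector code from the PR.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory catalog standing in for a database connection.
CATALOG: Dict[str, List[str]] = {
    "sales": ["orders", "customers"],
    "hr": ["employees"],
    "empty_schema": [],
}


def process_schema(schema: str) -> List[str]:
    """Hypothetical per-schema unit of work: list and 'process' its tables.

    Each task handles one whole schema, so parallelism is bounded by the
    number of schemas, and a schema with few tables still pays the
    task scheduling overhead.
    """
    return [f"{schema}.{table}" for table in CATALOG[schema]]


def process_all_schemas(schemas: List[str], max_workers: int = 4) -> List[str]:
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even though schemas run concurrently.
        for tables in pool.map(process_schema, schemas):
            results.extend(tables)
    return results


if __name__ == "__main__":
    print(process_all_schemas(list(CATALOG)))
```

Since each task covers a whole schema, parallelism is capped by the number of schemas, which is where the corner cases noted above come from. In a real connector each worker would probably need its own database connection, as SQLAlchemy sessions are not meant to be shared across threads.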

Incremental Metadata extraction:

[ ] For Snowflake

Done at the table level using SNOWFLAKE.ACCOUNT_USAGE.TABLES.

PR - https://github.com/open-metadata/OpenMetadata/pull/15201
[ ] For Redshift
[ ] For BigQuery
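A sketch of table-level incremental extraction for Snowflake, assuming the `LAST_ALTERED` column of SNOWFLAKE.ACCOUNT_USAGE.TABLES is used as the change marker and its `DELETED` column flags dropped tables. The query text and `split_changes` are hypothetical and may not match what the PR does.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed query shape; the PR's actual query may differ.
INCREMENTAL_TABLES_QUERY = """
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, DELETED
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE LAST_ALTERED >= %(since)s
"""

Row = Dict[str, object]


def split_changes(rows: Iterable[Row]) -> Tuple[List[str], List[str]]:
    """Split changed tables into (to_reingest, to_delete) fully qualified names.

    A non-NULL DELETED value marks a table that was dropped since `since`.
    """
    to_reingest: List[str] = []
    to_delete: List[str] = []
    for row in rows:
        fqn = f"{row['TABLE_CATALOG']}.{row['TABLE_SCHEMA']}.{row['TABLE_NAME']}"
        if row.get("DELETED") is None:
            to_reingest.append(fqn)
        else:
            to_delete.append(fqn)
    return to_reingest, to_delete
```

The `%(since)s` placeholder follows the pyformat style that snowflake-connector-python uses by default. ACCOUNT_USAGE views are populated with some latency, so a real implementation would likely look back somewhat before the previous run's start time.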

Batch Process + Sink leaf nodes:

Fire and Forget leaf node requests:

[ ] Implement a "Fire and Forget" way of sending leaf-node requests to OpenMetadata. The results should be collected later to build the summary, without blocking processing in the meantime 🤔
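One way to sketch the "Fire and Forget" idea: submit each leaf-node request to a thread pool, keep the returned futures, and only wait on them when building the final summary. `send_to_openmetadata`, `FireAndForgetSink` and `Summary` are hypothetical names; in practice the submitted call would be the OpenMetadata API request.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List


def send_to_openmetadata(entity: str) -> str:
    """Hypothetical stand-in for a blocking request to the OpenMetadata API."""
    if entity.startswith("bad"):
        raise ValueError(f"rejected {entity}")
    return entity


@dataclass
class Summary:
    ok: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


class FireAndForgetSink:
    """Submit leaf-node requests without waiting; collect results at the end."""

    def __init__(self, max_workers: int = 8) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: List[Future] = []

    def send(self, entity: str) -> None:
        # Returns immediately; the request runs on a worker thread.
        self._pending.append(self._pool.submit(send_to_openmetadata, entity))

    def close(self) -> Summary:
        # The only blocking point: wait for every request to build the summary.
        summary = Summary()
        for fut in self._pending:
            try:
                summary.ok.append(fut.result())
            except Exception as exc:
                summary.failed.append(str(exc))
        self._pool.shutdown(wait=True)
        return summary
```

Keeping every future holds memory proportional to the number of leaf nodes, and the executor's work queue is unbounded, so a production version might cap in-flight requests or drain completed futures periodically.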

Feb 16 '24 08:02 IceS2