Ingestion: Performance Improvements to the Ingestion Workflow Epic
Overview
Work on several fronts to improve ingestion performance.
TODOs
Logging
- [x] Log Execution Time on DEBUG
PR - https://github.com/open-metadata/OpenMetadata/pull/15013
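As a rough illustration of what DEBUG-level timing can look like, here is a minimal sketch using only the standard library. The decorator name `log_execution_time`, the logger name, and the `fetch_tables` step are hypothetical and are not taken from the PR above.

```python
import functools
import logging
import time

logger = logging.getLogger("ingestion.timing")  # hypothetical logger name


def log_execution_time(func):
    """Log how long `func` took to run, at DEBUG level only."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on failure, so slow failing steps are visible too.
            elapsed = time.perf_counter() - start
            logger.debug("%s executed in %.4fs", func.__qualname__, elapsed)

    return wrapper


@log_execution_time
def fetch_tables(schema_name):
    """Hypothetical ingestion step used only to demonstrate the decorator."""
    return [f"{schema_name}.table_{i}" for i in range(3)]
```

Because the message is emitted through `logger.debug`, nothing is printed unless the logger is configured at DEBUG, so regular runs stay quiet.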
Multithreading on the Source side:
- [ ] Implement Multithreading for DatabaseServices
Done at Schema level. This shouldn't hurt performance in the corner case of having fewer Schemas than threads, but it might when there are many Schemas with almost no Tables in them.
PR - https://github.com/open-metadata/OpenMetadata/pull/15130
- [ ] Implement Multithreading for DashboardServices
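To make the Schema-level approach concrete, the sketch below fans out one task per Schema with a thread pool. It is only an illustration under assumptions: `process_schema`, `process_schemas_in_parallel`, and the `tables_by_schema` mapping are hypothetical names, and the real source classes are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def process_schema(schema_name, tables_by_schema):
    """Hypothetical per-schema work: list the tables of one schema."""
    return [f"{schema_name}.{table}" for table in tables_by_schema[schema_name]]


def process_schemas_in_parallel(tables_by_schema, max_workers=4):
    """Run one task per schema; results are returned in schema order."""
    schemas = list(tables_by_schema)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if schemas finish out of order.
        per_schema = pool.map(
            lambda schema: process_schema(schema, tables_by_schema), schemas
        )
        return [entity for tables in per_schema for entity in tables]
```

With this shape, each Schema costs one task regardless of how many Tables it holds, which is why many nearly empty Schemas could pay more in scheduling overhead than they gain from parallelism.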
Incremental Metadata extraction:
- [ ] For Snowflake
Done at Table level using SNOWFLAKE.ACCOUNT_USAGE.TABLES.
PR - https://github.com/open-metadata/OpenMetadata/pull/15201
- [ ] For Redshift
- [ ] For BigQuery
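One possible shape for the Snowflake case is sketched below: query `SNOWFLAKE.ACCOUNT_USAGE.TABLES` for tables altered or dropped since the last successful run. The filter on `LAST_ALTERED` and `DELETED`, the `pyformat` placeholders, and the helper name `build_incremental_query` are assumptions made for illustration and may differ from what the PR above does.

```python
from datetime import datetime, timezone

# Assumed filter: anything altered or dropped since the previous run.
INCREMENTAL_TABLES_QUERY = """
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, LAST_ALTERED, DELETED
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE TABLE_CATALOG = %(database)s
  AND (LAST_ALTERED >= %(since)s OR DELETED >= %(since)s)
"""


def build_incremental_query(database, last_run):
    """Return the SQL text and bind parameters for one incremental run."""
    params = {"database": database, "since": last_run.isoformat()}
    return INCREMENTAL_TABLES_QUERY, params


# Usage: the timestamp of the last successful run is a hypothetical input.
sql, params = build_incremental_query(
    "ANALYTICS", datetime(2024, 2, 1, tzinfo=timezone.utc)
)
```

ACCOUNT_USAGE views lag behind real time, so a real implementation would probably subtract a safety margin from the last-run timestamp to avoid missing recent changes.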
Batch Process + Sink leaf nodes:
- [ ] For DatabaseServices
- [ ] For DashboardServices
Fire and Forget leaf node requests:
- [ ] Implement a "Fire and Forget" way to send the requests to OpenMetadata for leaf nodes. We should collect the results later to build the summary, without blocking the processing in the meantime 🤔
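A minimal sketch of the idea, assuming a thread pool is an acceptable way to avoid blocking: requests are submitted without waiting, and the summary is built only once everything has been sent. `send_request` is a hypothetical stand-in for the actual call to OpenMetadata, and `sink_leaf_nodes` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def send_request(entity_name):
    """Hypothetical stand-in for the HTTP call that sinks one leaf node."""
    if not entity_name:
        raise ValueError("empty entity name")
    return entity_name


def sink_leaf_nodes(entity_names, max_workers=4):
    """Submit every request without waiting, then build the summary at the end."""
    summary = {"success": [], "failed": []}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns immediately, so the producing loop is never blocked.
        pending = [(name, pool.submit(send_request, name)) for name in entity_names]
    # Leaving the `with` block waits for all requests; results are read afterwards.
    for name, future in pending:
        if future.exception() is None:
            summary["success"].append(future.result())
        else:
            summary["failed"].append(name)
    return summary
```

Failures are recorded in the summary instead of being raised, so one bad leaf node does not stop the rest of the run.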