Ingestion: Performance Improvements to the Ingestion Workflow Epic

Open • IceS2 opened this issue 1 year ago • 0 comments

Overview

Work on several fronts to improve ingestion performance.

TODOs

Logging

  • [x] Log Execution Time on DEBUG

    PR - https://github.com/open-metadata/OpenMetadata/pull/15013
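
One way to log execution time only at DEBUG is a timing decorator. This is a minimal sketch, not the code from the linked PR: the logger name `ingestion.timing` and the decorator name `log_execution_time` are hypothetical. Using `time.perf_counter` and lazy `%`-style logging arguments means the message is only formatted when DEBUG is actually enabled.

```python
import functools
import logging
import time
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("ingestion.timing")


def log_execution_time(func: F) -> F:
    """Log how long `func` took, at DEBUG level only."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # Arguments are passed separately, so the string is only
            # built if a handler actually accepts DEBUG records.
            logger.debug("%s executed in %.4f s", func.__qualname__, elapsed)

    return cast(F, wrapper)
```

The `finally` clause records the time even when the wrapped step raises, which is usually what you want when hunting for slow or failing steps.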

Multithreading on the Source side:

  • [ ] Implement Multithreading for DatabaseServices

    Done at the Schema level. It shouldn't affect performance in the corner case of having fewer Schemas than threads, but it might affect it when there are many Schemas with almost no Tables in them.

    PR - https://github.com/open-metadata/OpenMetadata/pull/15130

  • [ ] Implement Multithreading for DashboardServices
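
Schema-level multithreading can be sketched with a thread pool that runs one task per schema. This is a minimal sketch under stated assumptions, not the implementation from PR 15130: `process_schema` is a hypothetical callable that ingests one schema and returns the table names it processed. It also shows where the trade-off above comes from: every schema becomes one task, so many near-empty schemas mean many tasks whose overhead can rival the actual work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, List, Tuple


def process_schemas_concurrently(
    schemas: Iterable[str],
    process_schema: Callable[[str], List[str]],  # hypothetical per-schema worker
    max_workers: int = 4,
) -> Iterator[Tuple[str, List[str]]]:
    """Run `process_schema` for each schema in a thread pool.

    With fewer schemas than `max_workers`, the extra threads simply stay idle.
    With many near-empty schemas, per-task overhead can dominate.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_schema, schema): schema for schema in schemas}
        # Yield results in completion order, not submission order.
        for future in as_completed(futures):
            # .result() re-raises any exception from the worker thread.
            yield futures[future], future.result()
```

Threads (rather than processes) fit here on the assumption that the work is dominated by I/O, i.e. waiting on the source database and the OpenMetadata API.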

Incremental Metadata extraction:

  • [ ] For Snowflake

    Done at the Table level using SNOWFLAKE.ACCOUNT_USAGE.TABLES.

    PR - https://github.com/open-metadata/OpenMetadata/pull/15201

  • [ ] For Redshift

  • [ ] For BigQuery
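
Table-level incremental extraction boils down to comparing each table's change timestamps against a watermark from the previous run. This is a minimal sketch, not the PR's code. It assumes the SNOWFLAKE.ACCOUNT_USAGE.TABLES view exposes `LAST_ALTERED` and `DELETED` columns, and that a hypothetical watermark `since` is stored somewhere between runs. `TableChange`, `split_changes`, and the query text are illustrative names, and the `%(since)s` placeholder is pyformat style.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Assumed query shape; verify column names against your Snowflake account.
INCREMENTAL_TABLES_QUERY = """
SELECT table_catalog, table_schema, table_name, last_altered, deleted
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE last_altered > %(since)s OR deleted > %(since)s
"""


@dataclass(frozen=True)
class TableChange:
    fqn: str                    # e.g. "DB.SCHEMA.TABLE"
    last_altered: datetime
    deleted: Optional[datetime]  # None while the table still exists


def split_changes(
    rows: Iterable[TableChange], since: datetime
) -> Tuple[List[str], List[str]]:
    """Return (tables to re-ingest, tables to mark as deleted) changed after `since`."""
    to_upsert: List[str] = []
    to_delete: List[str] = []
    for row in rows:
        if row.deleted is not None and row.deleted > since:
            to_delete.append(row.fqn)
        elif row.deleted is None and row.last_altered > since:
            to_upsert.append(row.fqn)
        # Anything older than the watermark is skipped: nothing changed.
    return to_upsert, to_delete
```

Two caveats apply to a real implementation: ACCOUNT_USAGE views are not real-time, so the watermark may need a safety margin, and a table that was dropped and recreated under the same name can appear in both lists.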

Batch Process + Sink leaf nodes:

  • [ ] For DatabaseServices
  • [ ] For DashboardServices
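
Batching leaf nodes means grouping many small entities into fixed-size chunks so each round trip to the server carries many of them. This is a minimal sketch: `sink_batch` is a hypothetical bulk-send callable (whether an appropriate bulk endpoint exists is not stated here), and `batched` mirrors the idea of `itertools.batched` (Python 3.12+) but returns lists.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def sink_leaf_nodes(
    leaves: Iterable[T],
    sink_batch: Callable[[List[T]], None],  # hypothetical bulk-send function
    batch_size: int = 100,
) -> int:
    """Send leaf entities in batches instead of one request each; return call count."""
    calls = 0
    for chunk in batched(leaves, batch_size):
        sink_batch(chunk)  # one round trip per chunk
        calls += 1
    return calls
```

For example, 250 leaf entities with `batch_size=100` become 3 calls instead of 250.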

Fire and Forget leaf node requests:

  • [ ] Implement a "Fire and Forget" way to send requests for leaf nodes to OpenMetadata. We should collect the results later to build the summary, without blocking processing in the meantime 🤔
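
A "fire and forget" sink can be sketched by handing each request to a thread pool, keeping the returned futures, and only inspecting them when the workflow finishes. This is a minimal sketch under stated assumptions: `FireAndForgetSink`, `Summary`, and the `send` callable (one request per entity) are hypothetical names, not OpenMetadata APIs.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Summary:
    succeeded: int = 0
    failed: List[str] = field(default_factory=list)


class FireAndForgetSink:
    """Submit leaf-node requests without blocking; collect outcomes at the end."""

    def __init__(self, send: Callable[[str], None], max_workers: int = 8) -> None:
        self._send = send  # hypothetical: one request per leaf entity
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: List[Tuple[str, Future]] = []

    def submit(self, entity: str) -> None:
        # Returns immediately; the request runs on a worker thread.
        self._pending.append((entity, self._pool.submit(self._send, entity)))

    def close(self) -> Summary:
        """Wait for all outstanding requests and build the summary."""
        summary = Summary()
        for entity, future in self._pending:
            try:
                future.result()
                summary.succeeded += 1
            except Exception as exc:  # record the failure instead of aborting
                summary.failed.append(f"{entity}: {exc}")
        self._pool.shutdown(wait=True)
        return summary
```

The trade-off is memory: every pending future is held until `close()`. For very large sources it may be worth draining completed futures periodically into the summary.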

IceS2 • Feb 16 '24 08:02