nv-ingest
[FEA]: Add Task handler for storing metadata outputs to a persistent object database
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Currently preventing usage
Please provide a clear description of problem this feature solves
Our pipeline produces large JSON-formatted metadata objects. To improve our debugging and tracing abilities, we need a configuration option on submitted jobs that pushes all resulting metadata objects to a storage database. This task should occur out of band, after the pipeline sink.
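As a rough illustration of the per-job option, the sketch below shows one way a submitted job could opt in to storage. The task type name `store_metadata`, the `properties` and `collection` fields, and the helper `find_store_task` are all hypothetical; nothing here reflects an existing nv-ingest schema.

```python
from typing import Any, Optional

# Hypothetical task type name; the issue does not fix one.
STORE_TASK_TYPE = "store_metadata"


def find_store_task(job_spec: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the storage task config from a job spec, or None if absent."""
    for task in job_spec.get("tasks", []):
        if task.get("type") == STORE_TASK_TYPE:
            return task
    return None


# Example job spec with storage enabled (all field names are illustrative).
job_spec = {
    "job_id": "job-001",
    "tasks": [
        {"type": "extract"},
        {
            "type": STORE_TASK_TYPE,
            "properties": {"collection": "debug_metadata"},
        },
    ],
}

store_task = find_store_task(job_spec)
storage_enabled = store_task is not None
```

Jobs that omit the task would behave exactly as they do today, which keeps the feature opt-in.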
Describe the feature, and optionally a solution or implementation and any alternatives
Requirements
- Configuration Option:
  - [ ] Add a new task to enable storing metadata objects in a database.
- Database Integration:
  - [ ] Integrate the pipeline with a sample object database.
- Metadata Handling:
  - [ ] Handle exporting all payload artifacts as independent items to the database.
  - [ ] JSON metadata should be stored as a JSON field.
- Error Handling and Logging:
  - [ ] Ensure that storage tasks occur prior to metric/telemetry offload and are traced correctly.
  - [ ] Implement robust error handling to manage database connection issues and data storage failures.
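To make the metadata handling and error handling requirements concrete, here is a minimal sketch of what a storage handler might look like. It assumes a hypothetical `ObjectStore` interface with a single `put` method, uses an in-memory stand-in instead of a real database client, and treats `ConnectionError` as the retryable failure; the key layout (`job_id/index`) and the retry parameters are also assumptions.

```python
import json
import logging
import time
from typing import Any, Protocol

logger = logging.getLogger("metadata_store")


class ObjectStore(Protocol):
    """Hypothetical minimal interface for the target object database."""

    def put(self, key: str, document: dict[str, Any]) -> None: ...


class InMemoryStore:
    """Stand-in store for local testing; not a real database client."""

    def __init__(self) -> None:
        self.items: dict[str, dict[str, Any]] = {}

    def put(self, key: str, document: dict[str, Any]) -> None:
        self.items[key] = document


def store_artifacts(
    store: ObjectStore,
    job_id: str,
    artifacts: list[dict[str, Any]],
    retries: int = 3,
    backoff_s: float = 0.1,
) -> list[str]:
    """Write each artifact as its own item; return the keys that failed."""
    failed: list[str] = []
    for index, artifact in enumerate(artifacts):
        key = f"{job_id}/{index}"
        try:
            # Round-trip through json to confirm the metadata is a valid JSON field.
            metadata = json.loads(json.dumps(artifact))
        except (TypeError, ValueError) as exc:
            logger.error("metadata for %s is not JSON-serializable: %s", key, exc)
            failed.append(key)
            continue
        document = {"job_id": job_id, "metadata": metadata}
        for attempt in range(1, retries + 1):
            try:
                store.put(key, document)
                break
            except ConnectionError as exc:
                logger.warning("put %s failed (attempt %d/%d): %s", key, attempt, retries, exc)
                if attempt < retries:
                    # Exponential backoff between attempts.
                    time.sleep(backoff_s * 2 ** (attempt - 1))
        else:
            # Loop finished without a successful put.
            logger.error("giving up on %s after %d attempts", key, retries)
            failed.append(key)
    return failed
```

Returning the failed keys rather than raising lets the caller decide whether a partial write should fail the job or only be logged.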
Additional context
Pipeline Flow
graph TD
A[Job Submission] --> B[Pipeline Processing]
B --> C[Generate Metadata]
C --> D[Pipeline Sink]
D --> E{Configuration Option}
E -->|Enabled| F[Store in Object DB]
E -->|Disabled| G[Skip Object DB storage]
F --> H[Telemetry Export]
G --> H[Telemetry Export]
H --> J[End of Pipeline]
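The ordering in the graph above (sink, optional storage, then telemetry export) can be sketched as a small post-sink hook. The function name `finish_job`, the trace field names, and the choice to let a storage failure be recorded rather than block telemetry are assumptions made for illustration, not a description of the existing pipeline.

```python
import time
from typing import Any, Callable


def finish_job(
    results: list[dict[str, Any]],
    store_enabled: bool,
    store_fn: Callable[[list[dict[str, Any]]], None],
    telemetry_fn: Callable[[dict[str, Any]], None],
) -> dict[str, Any]:
    """Run the optional storage step, then export telemetry that includes its trace."""
    trace: dict[str, Any] = {}
    if store_enabled:
        trace["store_entry_ns"] = time.time_ns()
        try:
            store_fn(results)
            trace["store_status"] = "ok"
        except Exception as exc:
            # Assumption: a storage failure is recorded but does not block telemetry.
            trace["store_status"] = f"error: {exc}"
        trace["store_exit_ns"] = time.time_ns()
    else:
        trace["store_status"] = "skipped"
    # Telemetry runs last so the storage step appears in the exported trace.
    telemetry_fn(trace)
    return trace
```

Because telemetry is called after the storage branch in every case, the storage timing and outcome are always part of what gets exported.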