Pipeline failures
Motivation and Context (Why the change? What's the scenario?)
See https://github.com/microsoft/kernel-memory/issues/432.
The Logs property was just an idea; we can remove it for now. The important thing is how to understand whether the pipeline actually failed. I have added catch blocks in the Distributed and InProcess orchestrators to handle all unexpected errors, such as an attempt to decode an invalid PDF file (which causes an exception in the stepHandler.InvokeAsync method).
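Roughly, the idea is to wrap the handler invocation so an unexpected exception marks the pipeline as failed instead of crashing the orchestrator. A minimal sketch (the `stepHandler`, `_log`, and `UpdatePipelineStatusAsync` names here are illustrative, not the exact KM code):

```csharp
// Hypothetical sketch: catch unexpected handler errors (e.g. an invalid PDF)
// and record the failure rather than letting the exception escape the orchestrator.
try
{
    (returnType, updatedPipeline) = await stepHandler
        .InvokeAsync(pipeline, cancellationToken)
        .ConfigureAwait(false);
}
catch (Exception e)
{
    // Unexpected error: log it and persist a failed state so that
    // a status query can later report the pipeline as failed.
    this._log.LogError(e, "Pipeline '{Id}' failed on step '{Step}'",
        pipeline.DocumentId, stepHandler.StepName);

    pipeline.LastUpdate = DateTimeOffset.UtcNow;
    await this.UpdatePipelineStatusAsync(pipeline, failed: true, cancellationToken)
        .ConfigureAwait(false);
    return;
}
```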
@dluc Your idea about assigning an approximate completion time and using it to determine the status of the pipeline is quite interesting, and it would allow adding failure management without touching the orchestrators at all. But how do we determine a suitable value for the completion time? Maybe it could be a configuration setting that is saved in the DataPipeline object when building it?
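For comparison, that timeout-based approach might look roughly like this (the `MaxRunTime` setting and the helper itself are hypothetical, not existing KM APIs; only `Complete` and `Creation` resemble real DataPipeline fields):

```csharp
// Hypothetical sketch: infer failure from elapsed time instead of modifying
// the orchestrators. MaxRunTime would come from configuration and be copied
// into the DataPipeline when the pipeline is built.
public static bool HasLikelyFailed(DataPipeline pipeline, TimeSpan maxRunTime)
{
    // Still incomplete past the expected completion window: assume failure.
    return !pipeline.Complete
           && DateTimeOffset.UtcNow - pipeline.Creation > maxRunTime;
}
```

The obvious trade-off is picking `maxRunTime`: too short and slow-but-healthy pipelines are reported as failed, too long and real failures go unnoticed for a while.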
Parking this for now, since nobody else has raised similar concerns. In this area I think users will want a stateful orchestrator that exposes each job through a nice UI, allowing them to inspect, pause/cancel, re-run, etc., and KM is not meant to develop in that direction. I would leave KM relying on logs, and potentially develop the orchestration features separately, leveraging existing solutions.