Add diagnostics to components

Open PhilippeMoussalli opened this issue 2 years ago • 3 comments

PhilippeMoussalli avatar Aug 01 '23 09:08 PhilippeMoussalli

We might need to use dask.distributed for this, see #395

RobbeSneyders avatar Aug 29 '23 09:08 RobbeSneyders

The LocalCluster already provides an interactive dashboard, which is accessible while the pipeline is running on your local machine. Accessing the dashboard is less straightforward when the pipeline is not deployed locally: you then need to set up port forwarding to the Kubernetes pod or the remote VM.

Dask dashboard

Maybe we could leverage the data explorer to make accessing the dashboard easier.

We have the option to store dashboard exports as static HTML files in cloud buckets by following the documentation:

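from dask.distributed import Client, performance_report

client = Client()  # performance_report requires the distributed scheduler

with performance_report(filename="dask-report.html"):
    ...  # some Dask computation, e.g. a .compute() call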

Additionally, we could enable access to the exported HTML files through the data explorer. However, I am not sure whether a Streamlit application is still the best option for rendering complete static HTML files.

For accessing the dashboard of the current run, we should make an effort to document the process of port forwarding. A general solution may be hard to implement because execution environments vary substantially, and it is probably not possible for every execution environment (e.g. Vertex).
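As a starting point for that documentation, here is a minimal sketch of the two port-forwarding commands, wrapped in Python helpers so they could later be automated. It assumes the dashboard listens on Dask's default port 8787 inside the pod or VM; the pod name, namespace, and host are hypothetical placeholders, and the helper names are illustrative, not part of Fondant.

```python
import shlex

DASK_DASHBOARD_PORT = 8787  # Dask's default dashboard port (assumed unchanged)


def kubectl_port_forward(pod, namespace="default", local_port=DASK_DASHBOARD_PORT):
    """Build the kubectl command that forwards the dashboard port from a pod."""
    return [
        "kubectl", "port-forward", f"pod/{pod}",
        f"{local_port}:{DASK_DASHBOARD_PORT}",
        "--namespace", namespace,
    ]


def ssh_port_forward(host, local_port=DASK_DASHBOARD_PORT):
    """Build the ssh command that tunnels the dashboard port from a remote VM."""
    # -N: do not run a remote command; -L: forward local_port to the VM's port
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{DASK_DASHBOARD_PORT}", host]


if __name__ == "__main__":
    # Hypothetical pod and host names, for illustration only.
    print(shlex.join(kubectl_port_forward("fondant-component-pod")))
    print(shlex.join(ssh_port_forward("user@remote-vm")))
```

With either command running, the dashboard should be reachable in a local browser on the chosen local port, provided the component actually exposes it.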

Further diagnostics

The Dask dashboard provides valuable insights into the execution of Dask. However, it can be overwhelming and unintuitive when you see it for the first time. The most important metrics are not obvious at first glance, e.g.:

  • The number of partitions being processed.
  • The overall progress of the pipeline.
  • Memory profiling at the partition level, specifically, the RAM requirements for processing individual partitions.
  • ...

We could add a diagnostics class with the capability to log various metrics. These metrics can also be exported to a cloud bucket. Furthermore, we could implement simple visualizations for these metrics inside the fondant explorer. For instance, a basic progress bar could be used to visually represent the progress of individual pipeline steps.
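To make the idea more concrete, here is a rough sketch of what such a diagnostics class could look like. Everything in it is hypothetical (the `Diagnostics` and `PartitionMetrics` names, the `track` context manager, the JSON export format). It uses only the standard library and measures peak Python heap usage with `tracemalloc`, which is process-wide and does not reflect the memory of a Dask worker as a whole, so treat the numbers as indicative.

```python
import json
import time
import tracemalloc
from contextlib import contextmanager
from dataclasses import asdict, dataclass, field


@dataclass
class PartitionMetrics:
    partition: int
    seconds: float
    peak_bytes: int


@dataclass
class Diagnostics:
    """Hypothetical collector for per-partition metrics of one component."""

    total_partitions: int
    records: list = field(default_factory=list)

    @contextmanager
    def track(self, partition):
        """Record wall time and peak traced memory while one partition runs."""
        tracemalloc.start()
        start = time.perf_counter()
        try:
            yield
        finally:
            seconds = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            self.records.append(PartitionMetrics(partition, seconds, peak))

    @property
    def progress(self):
        """Fraction of partitions processed so far."""
        return len(self.records) / self.total_partitions

    def progress_bar(self, width=20):
        """Render the progress as a simple text bar."""
        filled = int(self.progress * width)
        return f"[{'#' * filled}{'.' * (width - filled)}] {self.progress:.0%}"

    def export(self, path):
        """Write all metrics as JSON, e.g. to a path backed by a cloud bucket."""
        with open(path, "w") as f:
            json.dump([asdict(r) for r in self.records], f, indent=2)


if __name__ == "__main__":
    diagnostics = Diagnostics(total_partitions=4)
    for index in range(2):
        with diagnostics.track(index):
            _ = [0] * 100_000  # stand-in for processing one partition
    print(diagnostics.progress_bar())  # [##########..........] 50%
```

The exported JSON could then be read by the explorer to draw the progress bar and per-partition memory charts, without needing access to the live dashboard.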

mrchtr avatar Oct 24 '23 05:10 mrchtr

Thanks for the breakdown @mrchtr

I agree with your suggestions:

  • Let's export the dashboard. Since it's exported as HTML, it shouldn't be too hard to show it. As a first step, we could even just log the (authenticated) URL where it's saved, which the user can then open in a browser directly.
  • Let's document how to access the dashboard. By documenting this it will become clear for us as well if there are steps we can automate.
    • FYI, it should be possible to expose the dashboard on Vertex as well (link: enableDashboardAccess, boolean)
  • Let's focus on improving the logging of valuable metrics so they can at least be used for debugging.

I think these actions will already provide a lot of benefit, and after adding them, we'll have a better view on which further steps would be most useful.
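The first step above, logging the authenticated URL of the exported report, could be sketched as follows. This assumes the report was uploaded to Google Cloud Storage and relies on the `https://storage.cloud.google.com/<bucket>/<object>` form, which requires a Google login in the browser. The function names and the example bucket path are hypothetical, and the upload itself is not shown.

```python
import logging
from urllib.parse import quote, urlsplit

logger = logging.getLogger(__name__)


def authenticated_gcs_url(gcs_uri):
    """Map gs://bucket/object to the browser URL that requires a Google login."""
    parts = urlsplit(gcs_uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"Expected a gs:// URI, got {gcs_uri!r}")
    object_name = quote(parts.path.lstrip("/"))  # keeps "/" and escapes the rest
    return f"https://storage.cloud.google.com/{parts.netloc}/{object_name}"


def log_report_location(gcs_uri):
    """Log where the exported report can be opened and return that URL."""
    url = authenticated_gcs_url(gcs_uri)
    logger.info("Dask performance report saved, open it at %s", url)
    return url


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Hypothetical bucket and run path, for illustration only.
    log_report_location("gs://my-bucket/run-1/dask-report.html")
```

Other storage providers would need their own mapping from bucket URI to a browsable URL.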

RobbeSneyders avatar Oct 24 '23 14:10 RobbeSneyders