Add diagnostics to components
We might need to use dask.distributed for this, see #395
The LocalCluster already provides an interactive dashboard, which is accessible when the pipeline is running on your local machine. However, accessing the dashboard becomes less straightforward when the pipeline is not deployed locally. In such cases, it requires port forwarding to the Kubernetes pod or the remote VM in order to access the dashboard.
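As a sketch of what that port forwarding could look like (the pod name below is a placeholder, and we assume the Dask dashboard runs on its default port 8787):

```shell
# Sketch: forward the Dask dashboard port from a Kubernetes pod.
# The pod name is a placeholder; look up the real one with `kubectl get pods`.
POD="my-pipeline-pod"      # placeholder pod name
LOCAL_PORT=8787            # port to open on your machine
DASHBOARD_PORT=8787        # Dask dashboard's default port

# Print the command to run; afterwards, open http://localhost:8787 in a browser.
echo "kubectl port-forward pod/${POD} ${LOCAL_PORT}:${DASHBOARD_PORT}"
```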
Dask dashboard
Maybe we could leverage the data explorer to make accessing the dashboard easier.
We have the option to store dashboard exports as static HTML files in cloud buckets by following the documentation:
```python
from dask.distributed import performance_report

with performance_report(filename="dask-report.html"):
    ...  # some dask computation
```
Additionally, we could enable access to the exported HTML files through the data explorer. However, I am not sure whether a Streamlit application is still the best option for rendering complete static HTML files.
To access the dashboard of the current run, we should make an effort to document the port-forwarding process. A general solution may be hard to implement due to the substantial variability in execution environments, and it is probably not possible for every execution environment (e.g. Vertex).
Further diagnostics
The dask dashboard provides valuable insights into the execution of dask. However, it can be overwhelming and unintuitive for first-time users. The most important metrics are not clear at first glance, e.g.:
- The number of partitions being processed.
- The overall progress of the pipeline.
- Memory profiling at the partition level, specifically, the RAM requirements for processing individual partitions.
- ...
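For the partition-level memory point, a rough, dask-agnostic sketch using the standard library's `tracemalloc` (the processing function is just a stand-in for real per-partition work):

```python
import tracemalloc


def process_partition(partition):
    # Stand-in for the real per-partition computation.
    return [x * 2 for x in partition]


def process_with_memory_log(partitions):
    """Process each partition and record its peak traced memory (bytes)."""
    peaks = []
    for part in partitions:
        tracemalloc.start()
        process_partition(part)
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak)
    return peaks
```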
We could add a diagnostics class with the capability to log various metrics. These metrics can also be exported to a cloud bucket. Furthermore, we could implement simple visualizations for these metrics inside the fondant explorer. For instance, a basic progress bar could be used to visually represent the progress of individual pipeline steps.
Thanks for the breakdown @mrchtr
I agree with your suggestions:
- Let's export the dashboard. Since it's exported as html, it shouldn't be too hard to show it. As a first step we could even just log the (authenticated) url where it's saved which the user can then open in a browser directly.
- Let's document how to access the dashboard. By documenting this it will become clear for us as well if there are steps we can automate.
- FYI, it should be possible to expose the dashboard on Vertex as well (see the `enableDashboardAccess` boolean in the Vertex AI API)
- Let's focus on improving the logging of valuable metrics so they can at least be used for debugging.
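The "log the (authenticated) url" idea from the first point could be as simple as the sketch below; the bucket and object names are placeholders, and the URL pattern is the authenticated GCS console link:

```python
import logging

logging.basicConfig(level=logging.INFO)


def report_url(bucket: str, blob: str) -> str:
    # storage.cloud.google.com links require the user to be authenticated,
    # matching the "authenticated url" idea; bucket/blob are placeholders.
    return f"https://storage.cloud.google.com/{bucket}/{blob}"


logging.info(
    "Dask performance report: %s",
    report_url("my-bucket", "runs/42/dask-report.html"),
)
```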
I think these actions will already provide a lot of benefit, and after adding them, we'll have a better view on which further steps would be most useful.