Add user friendly warning/error messages and helpers for log_plot()
When people start using `Live.log_plot()`, they may struggle to get the expected visualization, for two reasons:
- `log_plot()` is very opinionated about the data format required for every template
- there are no user-friendly data checks or warning messages
Here are some ideas to help with DVCLive onboarding:
1. "Relax" requirements for data formats supported
For example, the bar_horizontal template expects something like this:
datapoints = [
{"name": "petal_width", "importance": 0.4},
{"name": "petal_length", "importance": 0.33},
{"name": "sepal_width", "importance": 0.24},
{"name": "sepal_length", "importance": 0.03}
]
It would be cool to support other formats like:
- Pandas DataFrame
- Dict, with keys automatically extracted as `y` and values as `x`:
{'petal_width': 0.4,
'petal_length': 0.33,
'sepal_width': 0.24,
'sepal_length': 0.03}
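A rough sketch of what such a conversion helper could look like. The name `normalize_datapoints` and its signature are hypothetical (nothing like it exists in DVCLive today, as far as this issue is concerned). The DataFrame branch is duck-typed on `to_dict(orient="records")`, so pandas stays optional:

```python
from collections.abc import Mapping
from typing import Any, Dict, List


def normalize_datapoints(data: Any, x: str = "x", y: str = "y") -> List[Dict[str, Any]]:
    """Hypothetical helper: coerce supported inputs into a list of records."""
    # pandas.DataFrame, detected by duck typing so pandas is not imported here.
    if hasattr(data, "to_dict") and hasattr(data, "columns"):
        return data.to_dict(orient="records")
    # Flat dict: keys become the `y` (label) field, values the `x` (numeric) field.
    if isinstance(data, Mapping):
        return [{y: key, x: value} for key, value in data.items()]
    # Already a list of records: return shallow copies, leaving the input untouched.
    return [dict(row) for row in data]


importances = {"petal_width": 0.4, "petal_length": 0.33}
records = normalize_datapoints(importances, x="importance", y="name")
```

Here `records` has the same list-of-dicts shape that the bar_horizontal example above uses, so it could be passed to `log_plot()` unchanged.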
2. Provide minimal sanity checks for the data/configs provided
For example, if I run this code snippet:
from dvclive import Live
datapoints = [
{"name": "petal_width", "importance": 0.4},
{"name": "petal_length", "importance": 0.33},
{"name": "sepal_width", "importance": 0.24},
{"name": "sepal_length", "importance": 0.03}
]
with Live() as live:
    live.log_plot(
        "iris_feature_importance",
        datapoints,
        x="name",
        y="importance",
        template="bar_horizontal",
        title="Iris Dataset: Feature Importance",
        y_label="Feature Name",
        x_label="Feature Importance"
    )
I won't get any error, but nothing shows up in VSCode after that.
Reason? The x and y arguments are swapped: the correct assignment is y="name", x="importance". But it's very easy to overlook this mistake and spend a lot of time trying to figure it out.
How can we help?
- check the data against the template, e.g. the `bar_horizontal` template expects numerical data for `x`
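One possible shape for such a check, as a sketch only. The `NUMERIC_FIELDS` table and the `find_type_problems` function are made up for illustration. The only template rule encoded is the one stated above (bar_horizontal wants numeric `x`):

```python
from numbers import Number

# Hypothetical table: which axes each template needs to be numeric.
NUMERIC_FIELDS = {"bar_horizontal": ("x",)}


def find_type_problems(datapoints, template, **fields):
    """Return (axis, column, type_name) for each axis whose data is not numeric."""
    problems = []
    for axis in NUMERIC_FIELDS.get(template, ()):
        column = fields.get(axis)
        if column is None:
            continue
        for row in datapoints:
            value = row.get(column)
            # bool is technically a Number, but it is treated as invalid here.
            if isinstance(value, bool) or not isinstance(value, Number):
                problems.append((axis, column, type(value).__name__))
                break  # one report per axis is enough
    return problems


datapoints = [
    {"name": "petal_width", "importance": 0.4},
    {"name": "petal_length", "importance": 0.33},
]
swapped = find_type_problems(datapoints, "bar_horizontal", x="name", y="importance")
correct = find_type_problems(datapoints, "bar_horizontal", x="importance", y="name")
```

With the swapped arguments from the snippet above, `swapped` reports the `x` axis as holding `str` data, while `correct` is empty.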
3. Provide good warning messages and hints if formats are incompatible
If we have data/args checks, we can report the problem in warning messages. It would help a lot to see something like:
Data provided for `x` has `str` type but `numerical` data type is expected
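A message like that could be emitted with the standard `warnings` module. This is only a sketch: `warn_if_not_numeric` is a hypothetical helper, not an existing DVCLive function, and the exact wording is up for discussion:

```python
import warnings
from numbers import Number


def warn_if_not_numeric(datapoints, axis, column):
    """Hypothetical helper: warn when the column mapped to `axis` is not numeric.

    Returns the warning text, or None if every value is numeric.
    """
    for row in datapoints:
        value = row.get(column)
        if not isinstance(value, Number):
            message = (
                f"Data provided for `{axis}` (column '{column}') has "
                f"`{type(value).__name__}` type but numerical data type is expected."
            )
            # stacklevel=2 points the warning at the caller's line.
            warnings.warn(message, UserWarning, stacklevel=2)
            return message
    return None


datapoints = [{"name": "petal_width", "importance": 0.4}]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    message = warn_if_not_numeric(datapoints, "x", "name")
```

Using a warning rather than an exception keeps training runs from failing over a plot, while still surfacing the problem in the console.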
Another thought on a lightweight way to help here: better docs in https://dvc.org/doc/dvclive/live/log_plot. Having an example of the input format for each template could go a long way. There are already examples of different templates in https://dvc.org/doc/command-reference/plots/show that we could use as a starting point.
Background on the current implementation: https://github.com/iterative/dvclive/pull/543#pullrequestreview-1402602708
Marking as p2 since I don't think log_plot() is frequently used, but still would be really nice to have these improvements