Support custom names for reference and current
Description:
Add a new option that can be passed to any Evidently Report to replace the “Current” and “Reference” display titles across the Report. For example, define them as current_title and reference_title.
Context: All Evidently Reports and Metrics refer to “Current” and “Reference” datasets. Reports can also be generated only for one (“Current”) dataset.
The names “Reference” and “Current” are visible side-by-side in Metric titles. Example:
They are also included in the render of the individual plots or as part of the visualization or legend. Examples for DataDriftTable:
Example for ColumnSummaryMetric:
Sometimes users want to override the default titles and set different names for their datasets and models. For example, when generating a report that compares two models, you might want to call them “Model A” and “Model B”, or refer to your datasets as “training” and “testing”, etc.
We want to add the functionality that allows changing the display title.
Note: the existing implementation of the reference and current names is not entirely consistent across Metrics. Thorough testing is required to ensure that the changes propagate to all Metrics and renders.
Implementation:
- Implement new current_title and reference_title options that can be passed to all individual Reports. This is a Report-level render option. You can look at the implementation of the “raw_data” or “color_schema” options for inspiration.
- Document the option by adding a new section titled “Changing the names of ‘Reference’ and ‘Current’” to this docs page.
- [OPTIONAL] add a how-to notebook showing the new functionality, similar to the other ones. The how-to notebook should include a very short example to show the usage of the parameter with a few different metrics and any of the metrics presets.
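To make the Report-level option more concrete, here is a minimal sketch, assuming the two titles are read from the options={"render": {...}} dict and stored on a small immutable object with the current defaults. RenderTitleOptions and parse_render_titles are hypothetical names, not part of Evidently's API; the real implementation would follow whatever pattern the “raw_data” and “color_schema” options already use.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class RenderTitleOptions:
    """Hypothetical container for the proposed display titles (defaults = today's labels)."""

    current_title: str = "Current"
    reference_title: str = "Reference"


def parse_render_titles(options: Optional[Dict[str, Any]]) -> RenderTitleOptions:
    """Read current_title/reference_title from options["render"], keeping defaults otherwise."""
    render = (options or {}).get("render") or {}
    known = {f.name for f in fields(RenderTitleOptions)}
    # Only the title keys are picked up; other render options (e.g. raw_data) are ignored here.
    overrides = {key: value for key, value in render.items() if key in known}
    return RenderTitleOptions(**overrides)


opts = parse_render_titles({"render": {"current_title": "ModelA"}})
print(opts)  # RenderTitleOptions(current_title='ModelA', reference_title='Reference')
```

Keeping the defaults on the object means a Report without the option renders exactly as it does today, and a single-dataset Report simply never asks for reference_title.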
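The option should work as follows:
from evidently.report import Report
from evidently.metric_preset import RegressionPreset

report = Report(
    metrics=[
        RegressionPreset(),
    ],
    # proposed option: replaces the "Current" title across the rendered Report
    options={"render": {"current_title": "ModelA"}},
)
# housing_ref / housing_cur: the reference and current pandas DataFrames
report.run(reference_data=housing_ref, current_data=housing_cur)
report  # in a notebook, this displays the rendered Report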
I would be happy to contribute to this, if nobody else has started working on it :)
I believe no one has @francesco086 !
Sorry, I've been super busy lately, but I want to do this one. I should be able to work on it next Friday. I will come back with an update.
Hello, I wonder whether this issue has been fixed and whether there is any related documentation. Thanks!
Oh! Thanks for writing... I wanted to contribute to it, but I had some private issues and then it slipped my mind. If it's not too late, I will invest some time in this one.
Update: I worked on this yesterday and today, but it will take some time.
It's quite a pain, tbh: the hard-coded strings "reference" and "current" are pretty much everywhere...
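The usual remedy for hard-coded strings scattered across renderers is to route every label through a single lookup, so that each call site asks for a title instead of embedding a literal. A minimal sketch of that idea, assuming render options reach the renderer as a plain dict with the proposed current_title/reference_title keys; display_title and histogram_traces are hypothetical names, and the trace dicts only mimic the shape of Plotly trace definitions.

```python
from typing import Dict, Iterable, List, Optional


def display_title(kind: str, render_options: Optional[Dict[str, str]] = None) -> str:
    """Map an internal dataset key ("current"/"reference") to its display title."""
    opts = render_options or {}
    if kind == "current":
        return opts.get("current_title", "Current")
    if kind == "reference":
        return opts.get("reference_title", "Reference")
    raise ValueError(f"unknown dataset kind: {kind!r}")


def histogram_traces(
    current_values: Iterable[float],
    reference_values: Optional[Iterable[float]],
    render_options: Optional[Dict[str, str]] = None,
) -> List[dict]:
    """Toy stand-in for a Metric renderer: trace names come from display_title, not literals."""
    traces = [{"name": display_title("current", render_options), "x": list(current_values)}]
    # Reports can run on the current dataset alone, so the reference trace is optional.
    if reference_values is not None:
        traces.append({"name": display_title("reference", render_options), "x": list(reference_values)})
    return traces


print([t["name"] for t in histogram_traces([1, 2], [3], {"current_title": "Model A"})])
# ['Model A', 'Reference']
```

Internal keys such as "current" can stay as they are in the data structures; only the strings that reach the rendered output need to go through the lookup.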
Hi, after a long pause in which I could not find time for this, I am back on it. I need some help.
I would like to explain my approach. Since I don't know the code base and I believe in TDD, I am writing some simple tests that verify that the labels "current" and "reference" do not appear in the resulting HTML. In principle, this should let me reverse-engineer where each label comes from and modify it there. Unfortunately, I underestimated how many places these strings are hard-coded in. This PR (on my fork, because I don't want to include the recent changes from main) shows my current status: https://github.com/francesco086/evidently/pull/1 Note that, to track where each string comes from, I started adding identifiers, e.g. "current" -> "currentA", so don't be surprised by that. It's WIP.
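The label check described above can be sketched as a small helper that also prints a bit of surrounding text for every hit, which helps with the reverse-engineering step. This assumes the rendered report HTML has already been read into a string (for example from a file written with report.save_html(...)); find_leftover_labels is a hypothetical helper, not part of Evidently or of the linked PR.

```python
import re
from typing import Iterable, List, Tuple


def find_leftover_labels(
    html: str,
    labels: Iterable[str] = ("current", "reference"),
    context: int = 30,
) -> List[Tuple[str, str]]:
    """Return (label, surrounding text) for each whole-word, case-insensitive hit."""
    hits = []
    for label in labels:
        # \b avoids false positives such as "currentTarget" in embedded JavaScript.
        pattern = re.compile(rf"\b{re.escape(label)}\b", re.IGNORECASE)
        for match in pattern.finditer(html):
            start = max(match.start() - context, 0)
            hits.append((label, html[start:match.end() + context]))
    return hits


sample = '<div class="title">Current vs Reference</div><script>e.currentTarget</script>'
for label, snippet in find_leftover_labels(sample):
    print(label, "->", snippet)
```

In a test, the assertion would be that the helper returns an empty list once custom titles are set. Hits inside embedded JSON keys (e.g. "current": {...}) are also reported, so not every hit is necessarily a visible label.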
Now we come to my help request.
Take this test: https://github.com/francesco086/evidently/pull/1/files#diff-309694ab0894d514be87278c134219f3f3432d934988647692f0e9c7be1af37fR85
The resulting HTML is here: result.txt (it's a .txt file because GitHub does not support uploading HTML files).
Could someone tell me where the "current" and "reference" labels are set for these two plots? I cannot find them...
@DimaAmega thanks for your comments, I answered.
My impression is that they are perhaps set implicitly from the column names in the DataFrame. If so, I would rename the DataFrame columns accordingly before plotting. But I would need to know where this happens...
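If the labels do come from column names, renaming the columns before plotting would be enough to change them. A minimal pandas sketch, assuming (hypothetically) that a plot helper receives a wide DataFrame with "current" and "reference" columns and reshapes it to long form, so that each former column name becomes the value a plotting library would use as a trace or legend name; relabel_for_plot is a hypothetical name, not an Evidently function.

```python
import pandas as pd


def relabel_for_plot(
    df: pd.DataFrame,
    current_title: str = "Current",
    reference_title: str = "Reference",
) -> pd.DataFrame:
    """Rename the dataset columns to their display titles, then reshape to long form.

    After melt(), the "dataset" column holds the (renamed) column names, which is
    what typically ends up as the legend entry in a long-form plot.
    """
    renamed = df.rename(columns={"current": current_title, "reference": reference_title})
    return renamed.melt(var_name="dataset", value_name="value")


wide = pd.DataFrame({"current": [1.0, 2.0], "reference": [1.5, 2.5]})
long_form = relabel_for_plot(wide, current_title="Model A", reference_title="Model B")
print(long_form["dataset"].unique().tolist())  # ['Model A', 'Model B']
```

Renaming at the last step before plotting keeps the internal "current"/"reference" keys untouched for the calculations and changes only what the reader sees.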