
Multiclass confusion matrix / binarized metrics need class names, not just class IDs

Open · schmidt-jake opened this issue 5 years ago · 4 comments

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow Model Analysis): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu
  • TensorFlow Model Analysis installed from (source or binary): binary (PyPI)
  • TensorFlow Model Analysis version: 0.22.2
  • Python version: 3.6.9
  • Jupyter Notebook version: 6.0.3
  • Exact command to reproduce:
from tensorflow_model_analysis import EvalConfig
from tensorflow_model_analysis.metrics import default_multi_class_classification_specs
from google.protobuf.json_format import ParseDict

classes = ['class_1', 'class_2', ...]  # class names, in label id order

eval_config = {
    'model_specs': [
        {
            'name': 'rig_state',
            'model_type': 'tf_keras',
            'signature_name': 'serve_raw',
            'label_key': ...,
            'example_weight_key': 'sample_weight'
        }
    ],
    'metrics_specs': [
        {
            'metrics': [
                {
                    'class_name': 'MultiClassConfusionMatrixPlot',
                    'config': '"thresholds": [0.5]'
                },
                {'class_name': 'ExampleCount'},
                {'class_name': 'WeightedExampleCount'},
                {'class_name': 'SparseCategoricalAccuracy'},
            ],
        },
        {
            'binarize': {'class_ids': {'values': list(range(len(classes)))}},
            'metrics': [
                {'class_name': 'AUC'},
                {'class_name': 'CalibrationPlot'},
                {'class_name': 'BinaryAccuracy'},
                {'class_name': 'MeanPrediction'}
            ]
        }
    ],
    'slicing_specs': [...]
}
eval_config: EvalConfig = ParseDict(eval_config, EvalConfig())

Describe the problem

Multiclass confusion matrices and binarized metrics should support class names, not just class IDs. Something like 'binarize': {'classes': [{'id': _id, 'name': name} for _id, name in enumerate(classes)]} (sketched below). As it stands, the integer class IDs are meaningless to data scientists and business stakeholders looking at the TFMA visualizations.

schmidt-jake · Jun 30 '20
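Purely for illustration, the proposal amounts to a metrics spec along these lines; the 'classes' field is hypothetical and does not exist in TFMA's current binarization options, so ParseDict would reject this today:

classes = ['class_1', 'class_2', 'class_3']

# Hypothetical shape only -- today's binarization options accept 'class_ids',
# not a 'classes' list carrying display names.
proposed_metrics_spec = {
    'binarize': {
        'classes': [
            {'id': class_id, 'name': name}
            for class_id, name in enumerate(classes)
        ]
    },
    'metrics': [
        {'class_name': 'AUC'},
        {'class_name': 'BinaryAccuracy'}
    ]
}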

We are looking into this, but don't yet have a clear solution. We would like to get the class id -> name mappings via the label vocab, but we don't always have access to the vocab so we are currently looking into getting the APIs we need.

mdreves · Jun 30 '20
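As a rough sketch of the mapping being discussed, assuming a plain-text label vocab file with one class name per line (the file name and format are assumptions, not a TFMA API):

def load_label_vocab(path='label_vocab.txt'):
    """Returns {class_id: class_name} from a one-name-per-line vocab file."""
    with open(path) as f:
        return {i: line.strip() for i, line in enumerate(f) if line.strip()}

id_to_name = load_label_vocab()  # e.g. {0: 'class_1', 1: 'class_2', ...}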

Typically the vocab is computed/known in an upstream step... would it be the worst idea to update the EvalConfig proto to have a field for vocab?

schmidt-jake · Jul 07 '20
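For illustration, the suggestion would let the config carry the vocab directly; the 'label_vocab' field below is hypothetical and is not part of the EvalConfig proto:

# Hypothetical field only -- 'label_vocab' does not exist in EvalConfig, so
# this dict is a sketch of the proposal, not something ParseDict accepts today.
eval_config_with_vocab = {
    'model_specs': [...],    # as in the reproduction above
    'metrics_specs': [...],  # as in the reproduction above
    'slicing_specs': [...],
    'label_vocab': ['class_1', 'class_2']
}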

@mdreves any thoughts about this suggestion?

schmidt-jake · Jul 15 '20

The idea has been floated internally a few times and we are still considering it, but the preference is to find something that is bundled with the model so that the config is shared across components.

mdreves · Jul 15 '20