
Multiclass confusion matrix / binarized metrics need class names, not just class IDs

Open · schmidt-jake opened this issue 5 years ago · 4 comments

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow Model Analysis): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu
  • TensorFlow Model Analysis installed from (source or binary): binary (PyPI)
  • TensorFlow Model Analysis version: 0.22.2
  • Python version: 3.6.9
  • Jupyter Notebook version: 6.0.3
  • Exact command to reproduce:
from tensorflow_model_analysis import EvalConfig
from tensorflow_model_analysis.metrics import default_multi_class_classification_specs
from google.protobuf.json_format import ParseDict

classes = ['class_1', 'class_2', ...]  # class names, in label id order

eval_config = {
    'model_specs': [
        {
            'name': 'rig_state',
            'model_type': 'tf_keras',
            'signature_name': 'serve_raw',
            'label_key': ...,
            'example_weight_key': 'sample_weight'
        }
    ],
    'metrics_specs': [
        {
            'metrics': [
                {
                    'class_name': 'MultiClassConfusionMatrixPlot',
                    'config': '"thresholds": [0.5]'
                },
                {'class_name': 'ExampleCount'},
                {'class_name': 'WeightedExampleCount'},
                {'class_name': 'SparseCategoricalAccuracy'},
            ],
        },
        {
            'binarize': {'class_ids': {'values': list(range(len(classes)))}},
            'metrics': [
                {'class_name': 'AUC'},
                {'class_name': 'CalibrationPlot'},
                {'class_name': 'BinaryAccuracy'},
                {'class_name': 'MeanPrediction'}
            ]
        }
    ],
    'slicing_specs': [...]
}
eval_config: EvalConfig = ParseDict(eval_config, EvalConfig())

Describe the problem

Multiclass confusion matrices and binarized metrics should support class names, not just class IDs. Something like 'binarize': {'classes': [{'id': _id, 'name': name} for _id, name in enumerate(classes)]} (sketched below). As it stands, the integer class IDs are meaningless to data scientists and business stakeholders looking at the TFMA visualizations.

schmidt-jake · Jun 30 '20
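Purely for illustration, the proposal amounts to a metrics spec along these lines; the 'classes' field is hypothetical and does not exist in TFMA's current binarization options, so ParseDict would reject this today:

classes = ['class_1', 'class_2', 'class_3']

# Hypothetical shape only -- today's binarization options accept 'class_ids',
# not a 'classes' list carrying display names.
proposed_metrics_spec = {
    'binarize': {
        'classes': [
            {'id': class_id, 'name': name}
            for class_id, name in enumerate(classes)
        ]
    },
    'metrics': [
        {'class_name': 'AUC'},
        {'class_name': 'BinaryAccuracy'}
    ]
}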

We are looking into this, but don't yet have a clear solution. We would like to get the class id -> name mappings via the label vocab, but we don't always have access to the vocab so we are currently looking into getting the APIs we need.

mdreves · Jun 30 '20
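As a rough sketch of the mapping being discussed, assuming a plain-text label vocab file with one class name per line (the file name and format are assumptions, not a TFMA API):

def load_label_vocab(path='label_vocab.txt'):
    """Returns {class_id: class_name} from a one-name-per-line vocab file."""
    with open(path) as f:
        return {i: line.strip() for i, line in enumerate(f) if line.strip()}

id_to_name = load_label_vocab()  # e.g. {0: 'class_1', 1: 'class_2', ...}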

Typically the vocab is computed/known in an upstream step... would it be the worst idea to update the EvalConfig proto to have a field for vocab?

schmidt-jake · Jul 07 '20
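For illustration, the suggestion would let the config carry the vocab directly; the 'label_vocab' field below is hypothetical and is not part of the EvalConfig proto:

# Hypothetical field only -- 'label_vocab' does not exist in EvalConfig, so
# this dict is a sketch of the proposal, not something ParseDict accepts today.
eval_config_with_vocab = {
    'model_specs': [...],    # as in the reproduction above
    'metrics_specs': [...],  # as in the reproduction above
    'slicing_specs': [...],
    'label_vocab': ['class_1', 'class_2']
}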

@mdreves any thoughts about this suggestion?

schmidt-jake · Jul 15 '20

The idea has been floated internally a few times and we are still considering it, but the preference is to find something that is bundled with the model so that the config is shared across components.

mdreves · Jul 15 '20