
ferret
======

|pypi badge| |docs badge| |demo badge| |youtube badge| |arxiv badge| |downloads badge|

|banner|

.. |pypi badge| image:: https://img.shields.io/pypi/v/ferret-xai.svg
   :target: https://pypi.python.org/pypi/ferret-xai
   :alt: Latest PyPI version

.. |docs badge| image:: https://readthedocs.org/projects/ferret/badge/?version=latest
   :target: https://ferret.readthedocs.io/en/latest/?version=latest
   :alt: Documentation Status
   :scale: 100%

.. |demo badge| image:: https://img.shields.io/badge/HF%20Spaces-Demo-yellow
   :target: https://huggingface.co/spaces/g8a9/ferret
   :alt: HuggingFace Spaces Demo
   :scale: 100%

.. |youtube badge| image:: https://img.shields.io/badge/youtube-video-red
   :target: https://www.youtube.com/watch?v=kX0HcSah_M4
   :alt: YouTube Video
   :scale: 100%

.. |banner| image:: /_static/banner.png
   :alt: Ferret circular logo with the name to the right
   :scale: 100%

.. |arxiv badge| image:: https://img.shields.io/badge/arXiv-2208.01575-b31b1b.svg
   :target: https://arxiv.org/abs/2208.01575
   :alt: arxiv preprint
   :scale: 100%

.. |downloads badge| image:: https://pepy.tech/badge/ferret-xai
   :target: https://pepy.tech/project/ferret-xai
   :alt: downloads badge
   :scale: 100%

A Python package for benchmarking interpretability techniques on Transformers.

* Free software: MIT license
* Documentation: https://ferret.readthedocs.io.

.. code-block:: python

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from ferret import Benchmark

    name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
    model = AutoModelForSequenceClassification.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)

    bench = Benchmark(model, tokenizer)
    explanations = bench.explain("You look stunning!", target=1)
    evaluations = bench.evaluate_explanations(explanations, target=1)

    bench.show_evaluation_table(evaluations)

Features
--------

ferret offers a painless integration with Hugging Face models and naming conventions. If you are already using the `transformers <https://github.com/huggingface/transformers>`_ library, you immediately get access to our Explanation and Evaluation API.
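
For instance, swapping in a different Hub checkpoint requires no other change. The checkpoint and label index below are only an illustrative choice, not something used elsewhere in these docs:

.. code-block:: python

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from ferret import Benchmark

    # Any sequence-classification checkpoint from the Hub can be plugged in;
    # this English sentiment model is just an example.
    name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = AutoModelForSequenceClassification.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)

    bench = Benchmark(model, tokenizer)
    explanations = bench.explain("A thoroughly enjoyable film.", target=1)
    bench.show_table(explanations)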

Supported Post-hoc Explainers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Gradient (plain gradients or multiplied by input token embeddings) (`Simonyan et al., 2014 <https://arxiv.org/abs/1312.6034>`_)
* Integrated Gradient (plain gradients or multiplied by input token embeddings) (`Sundararajan et al., 2017 <http://proceedings.mlr.press/v70/sundararajan17a.html>`_)
* SHAP (via Partition SHAP approximation of Shapley values) (`Lundberg and Lee, 2017 <https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html>`_)
* LIME (`Ribeiro et al., 2016 <https://dl.acm.org/doi/abs/10.1145/2939672.2939778>`_)
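
If you only want a subset of these methods, explainers can in principle be instantiated individually and handed to ``Benchmark``. The sketch below assumes the explainer classes are exported at the package root (e.g. ``IntegratedGradientExplainer``, ``SHAPExplainer``) and that ``Benchmark`` accepts an ``explainers`` argument; neither is shown above, so treat these names as assumptions and check the API reference before relying on them.

.. code-block:: python

    # Sketch only: the class names, the `multiply_by_inputs` keyword, and the
    # `explainers` argument are assumptions, not documented above.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from ferret import Benchmark, IntegratedGradientExplainer, SHAPExplainer

    name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
    model = AutoModelForSequenceClassification.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)

    ig = IntegratedGradientExplainer(model, tokenizer, multiply_by_inputs=True)
    shap = SHAPExplainer(model, tokenizer)

    bench = Benchmark(model, tokenizer, explainers=[ig, shap])
    explanations = bench.explain("You look stunning!", target=1)
    bench.show_table(explanations)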

Supported Evaluation Metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Faithfulness measures:

* AOPC Comprehensiveness (`DeYoung et al., 2020 <https://doi.org/10.18653/v1/2020.acl-main.408>`_)
* AOPC Sufficiency (`DeYoung et al., 2020 <https://doi.org/10.18653/v1/2020.acl-main.408>`_)
* Kendall's Tau correlation with Leave-One-Out token removal (`Jain and Wallace, 2019 <https://aclanthology.org/N19-1357/>`_)
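
For quick reference, the two AOPC scores follow the definitions of DeYoung et al. (2020), restated here from that paper rather than from ferret's code: with f(x)_j the model score for class j and r_k the top-k% most relevant tokens of an explanation, comprehensiveness measures the score drop when those tokens are removed, sufficiency the drop when only they are kept, each averaged over a set of thresholds B.

.. math::

   \mathrm{AOPC}_{\mathrm{compr}} = \frac{1}{|B|} \sum_{k \in B} \big( f(x)_j - f(x \setminus r_k)_j \big)

   \mathrm{AOPC}_{\mathrm{suff}} = \frac{1}{|B|} \sum_{k \in B} \big( f(x)_j - f(r_k)_j \big)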

Plausibility measures:

* Area-Under-Precision-Recall-Curve (soft score) (`DeYoung et al., 2020 <https://doi.org/10.18653/v1/2020.acl-main.408>`_)
* Token F1 (hard score) (`DeYoung et al., 2020 <https://doi.org/10.18653/v1/2020.acl-main.408>`_)
* Token Intersection Over Union (hard score) (`DeYoung et al., 2020 <https://doi.org/10.18653/v1/2020.acl-main.408>`_)

See `our paper <https://arxiv.org/abs/2208.01575>`_ for details.

Visualization
-------------

The ``Benchmark`` class exposes easy-to-use table visualization methods (e.g., within Jupyter Notebooks).

.. code-block:: python

    bench = Benchmark(model, tokenizer)

    # Pretty-print feature attribution scores by all supported explainers
    explanations = bench.explain("You look stunning!")
    bench.show_table(explanations)

    # Pretty-print all the supported evaluation metrics
    evaluations = bench.evaluate_explanations(explanations)
    bench.show_evaluation_table(evaluations)

Dataset Evaluations
-------------------

The ``Benchmark`` class has a handy method to compute and average our evaluation metrics across multiple samples from a dataset.

.. code-block:: python

    import numpy as np

    bench = Benchmark(model, tokenizer)

    # Compute and average evaluation scores on one of the supported datasets
    samples = np.arange(20)
    hatexdata = bench.load_dataset("hatexplain")
    sample_evaluations = bench.evaluate_samples(hatexdata, samples)

    # Pretty-print the results
    bench.show_samples_evaluation_table(sample_evaluations)
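
If your texts are not part of a supported dataset, per-sample metrics can still be collected with the calls already shown above. A minimal sketch (the texts and target labels below are made-up examples):

.. code-block:: python

    bench = Benchmark(model, tokenizer)

    # Made-up texts and target labels; replace them with your own data.
    texts = ["You look stunning!", "What a disappointing day."]
    targets = [1, 0]

    for text, target in zip(texts, targets):
        explanations = bench.explain(text, target=target)
        evaluations = bench.evaluate_explanations(explanations, target=target)
        bench.show_evaluation_table(evaluations)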

Credits
-------

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

* Cookiecutter: https://github.com/audreyr/cookiecutter
* audreyr/cookiecutter-pypackage: https://github.com/audreyr/cookiecutter-pypackage

Logo and graphical assets made by `Luca Attanasio <https://www.behance.net/attanasiol624d>`_.