RobertSamoilescu issues

Results: 33 issues of RobertSamoilescu

Partial dependence plots

Implementation of partial dependence (PD) and individual conditional expectation (ICE), leveraging the `sklearn` implementation. Some of the functionality it includes:

* PD and ICE for numerical features
* PD and ICE for...
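To make the relationship between ICE and PD concrete, here is a minimal, standard-library-only sketch. It is not the `sklearn` or alibi implementation; the function name `ice_and_pd`, the toy model, and the grid are all assumptions made for illustration. Each ICE curve re-evaluates the model on one row while sweeping a single feature over a grid, and the PD curve is the average of the ICE curves.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def ice_and_pd(
    predict: Callable[[Sequence[float]], float],
    rows: List[List[float]],
    feature: int,
    grid: Sequence[float],
) -> Tuple[List[List[float]], List[float]]:
    """Return (ice, pd_curve) for one numerical feature.

    ice[i][j] is the prediction for row i with the feature set to grid[j];
    pd_curve[j] is the average of ice[i][j] over all rows i.
    """
    ice = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)       # copy so the input data is untouched
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(predict(modified))
        ice.append(curve)
    pd_curve = [mean(column) for column in zip(*ice)]
    return ice, pd_curve


# Hypothetical toy model used only for this example: y = 2 * x0 + x1
def toy_model(x: Sequence[float]) -> float:
    return 2 * x[0] + x[1]


rows = [[0.0, 1.0], [1.0, 3.0]]
ice, pd_curve = ice_and_pd(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
print(ice)       # one curve per row
print(pd_curve)  # average of the curves
```

Because the toy model is additive, the ICE curves are parallel and the PD curve has the same slope; for models with interactions the ICE curves would diverge, which is the main reason to inspect them alongside PD.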

Bump transformers to 4.16.0

Bump transformers to 4.16.0 in the future to be able to use `numpy` arrays in [Integrated Gradients transformers example](https://github.com/SeldonIO/alibi/blob/master/doc/source/examples/integrated_gradients_transformers.ipynb).

Priority: Low

Effort: XS

Extend one hot encoding functionalities

[sklearn.preprocessing.OneHotEncoder](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing) exposes multiple parameters, such as `drop` and `handle_unknown`, which are useful to avoid overparametrisation (e.g. of binary variables, or the dummy variable trap in linear regression) or to handle unknown values....

Type: Design

Effort: L
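The two behaviours mentioned in the one hot encoding issue above can be sketched without any dependencies. This is a hand-rolled illustration, not `sklearn.preprocessing.OneHotEncoder` itself; the helper names and the `drop_first` / `ignore_unknown` flags are assumptions that only loosely mirror the `drop` and `handle_unknown` parameters.

```python
from typing import List, Sequence


def fit_categories(values: Sequence[str]) -> List[str]:
    """Return the sorted unique categories seen while fitting."""
    return sorted(set(values))


def one_hot(
    value: str,
    categories: List[str],
    drop_first: bool = False,
    ignore_unknown: bool = False,
) -> List[int]:
    """Encode a single value against the fitted categories."""
    # Dropping the first category removes the redundant column.
    kept = categories[1:] if drop_first else categories
    if value not in categories:
        if not ignore_unknown:
            raise ValueError(f"unknown category: {value!r}")
        # In this sketch an unknown value is encoded as all zeros.
        return [0] * len(kept)
    return [1 if value == category else 0 for category in kept]


categories = fit_categories(["no", "yes", "no"])
print(one_hot("yes", categories))                         # full encoding
print(one_hot("yes", categories, drop_first=True))        # binary feature, one column
print(one_hot("maybe", categories, ignore_unknown=True))  # unseen value
```

Note that in this sketch the dropped category and an ignored unknown value both map to an all-zero vector when `drop_first` and `ignore_unknown` are combined, so the two cannot be told apart after encoding.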

Interventional TreeShap background dataset size limited to 100

Interventional TreeSHAP works properly only with up to 100 instances in the background dataset. This issue has been previously reported [here](https://github.com/slundberg/shap/issues/2487) and [here](https://github.com/slundberg/shap/issues/1991), which mention the potential cause of...

Priority: Low

TreeShap
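One possible way to stay within the 100-instance limit described in the TreeSHAP issue above is to subsample the background dataset before passing it to the explainer. The sketch below is an assumption, not something proposed in the issue; `cap_background` and its parameters are hypothetical names, and it uses only the standard library.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MAX_BACKGROUND = 100  # limit named in the issue above


def cap_background(
    data: Sequence[T], limit: int = MAX_BACKGROUND, seed: int = 0
) -> List[T]:
    """Return at most `limit` rows, sampled without replacement."""
    if len(data) <= limit:
        return list(data)  # small datasets are passed through unchanged
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return rng.sample(list(data), limit)


background = cap_background(list(range(250)))
print(len(background))
```

Uniform subsampling changes the reference distribution, so the resulting attributions are an approximation of what the full background dataset would give.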

Fix type hints in TabularSampler

Consider adding explicit type hints in TabularSampler for `feature_names: list, feature_values: dict` and others. https://github.com/SeldonIO/alibi/blob/73eabdacbdb08a75494955a9affbb800ac73abbb/alibi/explainers/anchor_tabular.py#L17-L24

Good first issue

Priority: Medium

AnchorText Language Model - `convert_tokens_to_string` does not conform to its signature

`AnchorText` with `LanguageModel` does not work with [tokenizers v0.12.0](https://github.com/huggingface/tokenizers/releases/tag/v0.12.0). The issue comes from `tokenizer.convert_tokens_to_string`, which no longer returns a `str` as its signature suggests, but a `List[str]`. To...
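A defensive wrapper is one way to tolerate both return types. The sketch below is an assumption rather than the fix adopted in alibi: `tokens_to_string`, the `separator` parameter, and the two stub tokenizer classes are hypothetical and exist only to make the example self-contained.

```python
from typing import List, Sequence, Union


def tokens_to_string(tokenizer, tokens: Sequence[str], separator: str = " ") -> str:
    """Call `convert_tokens_to_string` and always return a `str`."""
    result: Union[str, List[str]] = tokenizer.convert_tokens_to_string(list(tokens))
    if isinstance(result, str):
        return result
    # Assumption: joining the pieces with `separator` is an acceptable
    # reconstruction; the right separator depends on the tokenizer.
    return separator.join(result)


class StrTokenizer:
    """Stub that follows the documented signature and returns a str."""

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        return " ".join(tokens)


class ListTokenizer:
    """Stub that mimics the reported behaviour and returns a list."""

    def convert_tokens_to_string(self, tokens: List[str]) -> List[str]:
        return list(tokens)


print(tokens_to_string(StrTokenizer(), ["hello", "world"]))
print(tokens_to_string(ListTokenizer(), ["hello", "world"]))
```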

AnchorText - fix metadata

Currently, all parameters, correct or incorrect (misspelled), are included in the metadata. https://github.com/SeldonIO/alibi/blob/390a255403d61e8d7f87123f745b678b0a5e6753/alibi/explainers/anchor_text.py#L1229 The valid parameters are stored in `self.perturb_opts`, which is set along with `all_opts` in: https://github.com/SeldonIO/alibi/blob/390a255403d61e8d7f87123f745b678b0a5e6753/alibi/explainers/anchor_text.py#L1220-L1222 This should...

Good first issue

AnchorText

AnchorText - extension for other language models.

AnchorText supports three masked language models: `DistilbertBaseUncased`, `BertBaseUncased`, and `RobertaBase`. All of these classes inherit from the `LanguageModel` class and override two methods. For example, `DistilbertBaseUncased`: ```python class DistilbertBaseUncased(LanguageModel): SUBWORD_PREFIX...

AnchorText

Priority: Medium

AnchorText - long tails

Some language models can process only a limited number of tokens at once. Thus, the language model extension of AnchorText splits the text in two: `text = head +...

AnchorText

Priority: Low
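The splitting described in the long tails issue above can be sketched as follows. This is a minimal illustration under stated assumptions: the name `tail` for the second part is inferred from the issue title, whitespace tokenization stands in for the real tokenizer, and `split_head_tail` and `max_tokens` are hypothetical names.

```python
from typing import List, Tuple


def split_head_tail(tokens: List[str], max_tokens: int) -> Tuple[List[str], List[str]]:
    """Split tokens into a head the model can process and the remaining tail."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    head = tokens[:max_tokens]  # part that fits within the model limit
    tail = tokens[max_tokens:]  # part that exceeds the limit
    return head, tail


tokens = "a b c d e".split()  # assumption: whitespace tokenization
head, tail = split_head_tail(tokens, max_tokens=3)
print(head, tail)
```

Concatenating `head + tail` always reproduces the original token list, so no text is lost by the split; only the head would be visible to the language model.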

predictor visibility in AnchorTabular

- Consider changing the visibility of the predictor from public to private, and update the documentation examples. Instead of using `explainer.predictor`, use the original `predictor` directly.
- The predictor is...

internal-mle

Priority: Medium