Elías Snorrason issues

Results: 34 issues



Broken import for Icelandic language data

## How to reproduce the behaviour

It looks like importing language data for Icelandic is broken. E.g. to get stop words:

```python
# This works
import spacy.lang.en.stop_words
spacy.lang.en.stop_words.STOP_WORDS
# Syntax...
```

bug

lang / is

🔜 v4.0
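The reproduction is cut off, but the core problem is that `is` is a reserved keyword in Python, so the parser rejects any dotted `import` statement that names the `spacy.lang.is` package. A minimal sketch, assuming spaCy ships Icelandic stop words at `spacy.lang.is.stop_words` as the issue implies: `importlib.import_module` takes the module path as a string, which never goes through the parser. Both helper names, `import_statement_compiles` and `load_stop_words`, are hypothetical.

```python
import importlib
import keyword


def import_statement_compiles(module_path: str) -> bool:
    """Return True if `import <module_path>` is valid syntax (nothing is imported)."""
    try:
        compile(f"import {module_path}", "<check>", "exec")
        return True
    except SyntaxError:
        return False


def load_stop_words(lang_code: str):
    """Hypothetical helper: load STOP_WORDS for a spaCy language by string path.

    Works for codes that are Python keywords, such as "is", because the
    module path is passed as a string and never parsed as Python source.
    """
    module = importlib.import_module(f"spacy.lang.{lang_code}.stop_words")
    return module.STOP_WORDS


print(keyword.iskeyword("is"))                                # True
print(import_statement_compiles("spacy.lang.en.stop_words"))  # True
print(import_statement_compiles("spacy.lang.is.stop_words"))  # False
```

With spaCy installed, `load_stop_words("is")` should return the Icelandic stop-word set; this has not been checked against a specific spaCy release.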

Reflect via indices outside boundaries

This PR adds `HighDimPDE_reflect_outs`, another implementation of `HighDimPDE._reflect`/`HighDimPDE._reflect_GPU`, which computes `out` (`out1` and `out2`) along with `rtemp`/`rmin` exclusively from indices of `b` where it "lies outside the boundary of `[s,e]^d`"....
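HighDimPDE is a Julia package and its `_reflect` code is not reproduced here. As a rough Python/NumPy illustration of the idea only, the sketch below reflects a step from `a` to `b` back into the hypercube `[s,e]^d`, computing exit fractions solely at the coordinates where `b` lies outside the box. The names `out1`, `out2`, `rtemp` and `rmin` are borrowed from the PR description, but their exact meaning here is an assumption, as are the function name `reflect_outside_indices` and the requirement that `a` starts inside the box.

```python
import numpy as np


def reflect_outside_indices(a, b, s, e, max_iter=1000):
    """Reflect the step a -> b off the faces of [s, e]^d (hypothetical sketch).

    Assumes `a` lies inside the box. Exit fractions are computed only for the
    coordinates of `b` that lie outside [s, e].
    """
    a = np.asarray(a, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    for _ in range(max_iter):
        out1 = np.flatnonzero(b < s)  # coordinates below the lower face
        out2 = np.flatnonzero(b > e)  # coordinates above the upper face
        if out1.size == 0 and out2.size == 0:
            return b
        # Fraction of the step a -> b at which each offending coordinate hits its face.
        rtemp = np.concatenate([
            (a[out1] - s) / (a[out1] - b[out1]),
            (e - a[out2]) / (b[out2] - a[out2]),
        ])
        k = int(np.argmin(rtemp))
        rmin = rtemp[k]
        n = np.concatenate([out1, out2])[k]  # coordinate that exits first
        face = s if k < out1.size else e
        c = a + (b - a) * rmin  # point where the path first touches a face
        c[n] = face             # snap onto the face to avoid round-off drift
        b[n] = 2 * face - b[n]  # mirror the remaining displacement
        a = c
    raise RuntimeError("reflection did not converge")


print(reflect_outside_indices([0.5, 0.5], [1.5, -0.25], 0.0, 1.0))  # values 0.5 and 0.25
```

Each pass handles only the coordinate that leaves the box first, so a step that overshoots several faces (or the same face repeatedly) is resolved over several iterations.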

fix return type of get_label_quality_scores in token classification

The function only ever returns a `Tuple[np.ndarray, list]`, but its return type is annotated differently here: https://github.com/cleanlab/cleanlab/blob/fad4eb266dee8b9e2925d3f0d74fe4a81939eb8a/cleanlab/token_classification/rank.py#L36

good first issue
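To illustrate the fix, here is a hypothetical, heavily simplified stand-in (not cleanlab's implementation) whose return annotation matches what it actually returns: an array of per-sentence scores plus a list of per-token score arrays. The scoring rule (probability of the given label, with the sentence score taken as the worst token score) and the function name `label_quality_scores_sketch` are assumptions made only for this example; `typing.get_type_hints` then reads back the annotation.

```python
from typing import List, Tuple, get_type_hints

import numpy as np


def label_quality_scores_sketch(
    labels: List[List[int]], pred_probs: List[np.ndarray]
) -> Tuple[np.ndarray, list]:
    """Hypothetical stand-in: per-sentence and per-token label quality scores."""
    # Token score: predicted probability of the token's given label.
    token_scores = [
        probs[np.arange(len(lab)), lab] for lab, probs in zip(labels, pred_probs)
    ]
    # Sentence score: the lowest token score in that sentence.
    sentence_scores = np.array([scores.min() for scores in token_scores])
    return sentence_scores, token_scores


labels = [[0, 1], [1]]
pred_probs = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.3, 0.7]])]
sentence_scores, token_scores = label_quality_scores_sketch(labels, pred_probs)
print(sentence_scores)  # [0.8 0.7]
print(get_type_hints(label_quality_scores_sketch)["return"])
```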

Keep type aliases not evaluated in docs

> Related to this, we might want to show `LabelLike` instead of expanding the type alias. See [this StackOverflow thread](https://stackoverflow.com/questions/60028577/keeping-alias-types-simple-in-python-documentation) for some approaches. _Originally posted by @anishathalye in https://github.com/cleanlab/cleanlab/issues/398#issuecomment-1236109488_
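Sphinx's autodoc extension has a configuration value aimed at this case, `autodoc_type_aliases`, which maps an alias name to its fully qualified path so that signatures show the alias rather than its expansion. Per the Sphinx documentation, it only takes effect for modules that enable postponed evaluation of annotations with `from __future__ import annotations`. A sketch of the `conf.py` entry, assuming the alias is defined at `cleanlab.typing.LabelLike` (the path should be checked against the codebase):

```python
# docs/conf.py (fragment)
extensions = ["sphinx.ext.autodoc"]

# Render these aliases by name in signatures instead of expanding them.
# Assumed location of the alias; adjust to wherever LabelLike is defined.
autodoc_type_aliases = {
    "LabelLike": "cleanlab.typing.LabelLike",
}
```

In a real `conf.py` the existing `extensions` list would be extended rather than replaced.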

Sanitize label column when initializing a Datalab instance.

Check for NaN values in the label column. This cannot be handled by the `NullIssueManager`, because the problem already surfaces in the constructor call `Datalab(data=df_with_nan_value_in_label_column, label_name="label_column")`. For now, we need better error reporting.

enhancement

good first issue

help-wanted
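Until `Datalab` reports this clearly itself, a check along these lines could run before construction. `check_label_column` is a hypothetical helper, not part of cleanlab; it relies on pandas' `isna`, which treats both `NaN` and `None` as missing.

```python
import pandas as pd


def check_label_column(df: pd.DataFrame, label_name: str) -> None:
    """Hypothetical pre-check: raise a readable error for missing labels."""
    if label_name not in df.columns:
        raise ValueError(f"Label column {label_name!r} not found in the data.")
    missing = df[label_name].isna()
    if missing.any():
        rows = df.index[missing.to_numpy()].tolist()
        raise ValueError(
            f"Label column {label_name!r} has {int(missing.sum())} missing value(s) "
            f"(e.g. at rows {rows[:5]}). Drop or fill these rows before creating Datalab."
        )


df_ok = pd.DataFrame({"x": [1.0, 2.0, 3.0], "label_column": ["a", "b", "a"]})
df_bad = pd.DataFrame({"x": [1.0, 2.0, 3.0], "label_column": ["a", None, "a"]})
check_label_column(df_ok, "label_column")  # passes silently
try:
    check_label_column(df_bad, "label_column")
except ValueError as err:
    print(err)
```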

test_tensorflow_functional(batch_size=32, shuffle_config=2, ...) fails on pytest 8.0.0

# Stack trace

From GitHub Actions logs:

```
=================================== FAILURES ===================================
_______________________ test_tensorflow_functional[32-2] _______________________
batch_size = 32, shuffle_config = 2
data = {'X': array([[-1.00724718, -0.92444024, -1.1659146 ], [-1.0525093 , -1.15150253,...
```

bug

help-wanted

Turn near-duplicate score test into a property-based test

After updating the near-duplicate scores, a test was added to ensure that near-duplicate examples have worse scores than non-near-duplicates. Right now, the test only works on a small toy dataset....

good first issue

help-wanted
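A property-based version could generate random point sets, plant one near-duplicate pair, and assert that the planted pair scores worse than every other example. The sketch below assumes Hypothesis is installed and uses a hypothetical score (Euclidean distance to the nearest other example, lower meaning more duplicated) in place of cleanlab's actual near-duplicate score. Unique integer coordinates keep every original pair at least 1 apart, and perturbing by a quarter of the minimum pairwise distance `dmin` guarantees the property: the planted pair is `dmin/4` apart, while every other point stays at least `3*dmin/4` from its nearest neighbour.

```python
import numpy as np
from hypothesis import given, settings, strategies as st
from hypothesis.extra.numpy import arrays


def nn_distance_scores(X: np.ndarray) -> np.ndarray:
    """Hypothetical score: distance to the nearest other example (lower = worse)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    return dists.min(axis=1)


@st.composite
def point_sets(draw):
    n = draw(st.integers(min_value=3, max_value=10))
    d = draw(st.integers(min_value=1, max_value=3))
    # Unique integer entries: distinct rows, every pair at least 1 apart.
    X = draw(arrays(np.int64, (n, d), elements=st.integers(-100, 100), unique=True))
    return X.astype(float)


@settings(max_examples=50, deadline=None)
@given(X=point_sets(), i=st.integers(min_value=0, max_value=9))
def test_near_duplicates_score_worse(X, i):
    i %= len(X)
    eps = nn_distance_scores(X).min() / 4
    dup = X[i] + eps / np.sqrt(X.shape[1])  # shift of Euclidean length eps
    X_aug = np.vstack([X, dup])
    scores = nn_distance_scores(X_aug)
    is_dup = np.zeros(len(X_aug), dtype=bool)
    is_dup[[i, len(X)]] = True
    assert scores[is_dup].max() < scores[~is_dup].min()
```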

Improve property-based test for near-duplicate sets

Property-based tests for near-duplicate sets fail intermittently in CI when some health checks don't pass for the generated data.

# Stack trace

Every so often, CI randomly fails a test with...

good first issue

help-wanted
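The stack trace is cut off, so which health check fires is not known from this excerpt. Two common Hypothesis mitigations, shown as a sketch: build valid examples directly instead of discarding them with `assume()` (heavy rejection is what triggers the `filter_too_much` health check), and register a CI settings profile with `derandomize=True` so that a failing example reproduces on every run. The strategy, the test, and the profile name `"ci"` are hypothetical.

```python
import numpy as np
from hypothesis import given, settings, strategies as st

# Hypothetical CI profile: derandomize so failures reproduce deterministically.
settings.register_profile("ci", derandomize=True, deadline=None, max_examples=100)
settings.load_profile("ci")


@st.composite
def near_duplicate_sets(draw):
    """Point set with one planted near-duplicate pair, built without assume()."""
    base = draw(st.lists(st.integers(-50, 50), min_size=3, max_size=10, unique=True))
    X = np.array(base, dtype=float).reshape(-1, 1)
    i = draw(st.integers(min_value=0, max_value=len(base) - 1))
    # Distinct integers are >= 1 apart, so a 0.1 shift makes (i, new) the closest pair.
    X = np.vstack([X, X[i] + 0.1])
    return X, {i, len(base)}


@given(near_duplicate_sets())
def test_planted_pair_is_closest(case):
    X, planted = case
    dists = np.abs(X - X.T)  # pairwise distances for 1-D points
    np.fill_diagonal(dists, np.inf)
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    assert {int(i), int(j)} == planted
```

Because uniqueness is enforced by the strategy itself (`unique=True`), no generated example is thrown away.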

Improve the stability of results from end-to-end tests of Datalab with label error-detection for regression tasks

When running Datalab for regression, the detected issues vary greatly across Python/OS versions in CI. This makes assertions about the issue masks difficult and slows development down. We need to figure...

good first issue

help-wanted
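One way to make such assertions less brittle, sketched under stated assumptions: fix the random seed when generating data, plant label errors that are large relative to the noise, and assert on tolerance-based summaries (recall of the planted errors, overall flag rate) instead of exact issue masks. The residual-threshold detector below is a hypothetical stand-in for Datalab's regression label-issue check, and all function names and thresholds are illustrative.

```python
import numpy as np


def make_regression_with_planted_errors(n=200, n_errors=10, seed=0):
    """Synthetic linear data with a few labels shifted far from the truth."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
    planted = rng.choice(n, size=n_errors, replace=False)
    y[planted] += rng.choice([-1.0, 1.0], size=n_errors) * 5.0
    return X, y, planted


def residual_issue_mask(X, y, threshold=1.0):
    """Hypothetical detector (not Datalab): flag large least-squares residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.abs(y - X @ coef) > threshold


def check_detection(issue_mask, planted, min_recall=0.8, max_flag_rate=0.2):
    """Tolerance-based assertion instead of comparing exact masks."""
    recall = issue_mask[planted].mean()
    flag_rate = issue_mask.mean()
    assert recall >= min_recall, f"recall {recall:.2f} < {min_recall}"
    assert flag_rate <= max_flag_rate, f"flag rate {flag_rate:.2f} > {max_flag_rate}"


for seed in range(5):
    X, y, planted = make_regression_with_planted_errors(seed=seed)
    check_detection(residual_issue_mask(X, y), planted)
```

Asserting on summaries with a margin tolerates small platform-dependent differences in which borderline examples get flagged, while still failing if detection breaks.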

Supporting task-specific issues in Datalab

While cleanlab has added support for various types of ML tasks, the only task-specific check supported by `Datalab` is label error detection for classification.

## Proposed Changes

The most straightforward...

enhancement

help-wanted
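The proposal itself is cut off in this excerpt. Purely as an illustration of one possible design, not cleanlab's, issue managers could be registered per task so that a lookup for, say, label issues in regression returns a regression-specific check instead of assuming classification. All class and function names below are hypothetical.

```python
from typing import Dict, Type


class BaseIssueManager:
    """Hypothetical base class for one kind of issue check."""

    issue_name = "base"


# task name -> issue name -> manager class
REGISTRY: Dict[str, Dict[str, Type[BaseIssueManager]]] = {}


def register(task: str):
    """Class decorator that files an issue manager under a task."""

    def decorator(cls: Type[BaseIssueManager]) -> Type[BaseIssueManager]:
        REGISTRY.setdefault(task, {})[cls.issue_name] = cls
        return cls

    return decorator


@register("classification")
class ClassificationLabelIssueManager(BaseIssueManager):
    issue_name = "label"


@register("regression")
class RegressionLabelIssueManager(BaseIssueManager):
    issue_name = "label"


def get_issue_manager(task: str, issue_name: str) -> Type[BaseIssueManager]:
    """Look up the task-specific manager, failing loudly if none exists."""
    try:
        return REGISTRY[task][issue_name]
    except KeyError:
        raise ValueError(
            f"No {issue_name!r} check is registered for task {task!r}"
        ) from None


print(get_issue_manager("regression", "label").__name__)  # RegressionLabelIssueManager
```

Keying first by task keeps the same issue name (`"label"`) usable across tasks while letting each task supply its own implementation.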