Elías Snorrason

Results 34 issues of Elías Snorrason

## How to reproduce the behaviour It looks like importing language data for Icelandic is broken. E.g. to get stop words: ```python # This works import spacy.lang.en.stop_words spacy.lang.en.stop_words.STOP_WORDS # Syntax...

bug
lang / is
🔜 v4.0

This PR adds `HighDimPDE_reflect_outs`, another implementation of `HighDimPDE._reflect`/`HighDimPDE._reflect_GPU`, which computes `out` (`out1` and `out2`) along with `rtemp`/`rmin` exclusively from indices of `b` where it "lies outside the boundary of `[s,e]^d`"....

The function only returns a `Tuple[np.ndarray, list]`, but it is annotated with: https://github.com/cleanlab/cleanlab/blob/fad4eb266dee8b9e2925d3f0d74fe4a81939eb8a/cleanlab/token_classification/rank.py#L36

good first issue

> Related to this, we might want to show `LabelLike` instead of expanding the type alias. See [this StackOverflow thread](https://stackoverflow.com/questions/60028577/keeping-alias-types-simple-in-python-documentation) for some approaches. _Originally posted by @anishathalye in https://github.com/cleanlab/cleanlab/issues/398#issuecomment-1236109488_

Check for NaN values in the label column. The NullIssueManager cannot handle this case, because the error occurs in `Datalab(data=df_with_nan_value_in_label_column, label_name="label_column")`. For now, we need better error reporting.

enhancement
good first issue
help-wanted
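
The kind of early error reporting requested in the NaN-label issue above could look roughly like the following. This is a minimal sketch, not cleanlab's implementation: `find_nan_label_indices` and `validate_labels` are hypothetical helper names, and the sketch works on a plain sequence of labels rather than on a DataFrame, to stay self-contained.

```python
import math
from typing import Any, List, Sequence


def find_nan_label_indices(labels: Sequence[Any]) -> List[int]:
    """Return the positions whose label is None or a float NaN (hypothetical helper)."""
    bad = []
    for i, label in enumerate(labels):
        # Only floats can be NaN; strings and ints are treated as valid labels.
        if label is None or (isinstance(label, float) and math.isnan(label)):
            bad.append(i)
    return bad


def validate_labels(labels: Sequence[Any], label_name: str = "label_column") -> None:
    """Raise a descriptive ValueError if any label is missing (hypothetical helper)."""
    bad = find_nan_label_indices(labels)
    if bad:
        # Show at most the first five offending positions to keep the message short.
        preview = ", ".join(str(i) for i in bad[:5])
        raise ValueError(
            f"Column '{label_name}' contains {len(bad)} missing label(s) "
            f"at positions: {preview}"
        )
```

Running a check like this before any issue managers are constructed would let the error message name the offending column and rows instead of failing later with an unrelated traceback.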

# Stack trace From Github Actions logs: ``` =================================== FAILURES =================================== _______________________ test_tensorflow_functional[32-2] _______________________ batch_size = 32, shuffle_config = 2 data = {'X': array([[-1.00724718, -0.92444024, -1.1659146 ], [-1.0525093 , -1.15150253,...

bug
help-wanted

After updating the near-duplicate scores, a test was added to ensure that near-duplicate examples have worse scores than non-near-duplicates. Right now, the test only works on a small toy dataset....

good first issue
help-wanted

Property-based tests for near-duplicate sets are randomly failing in CI when some health checks don't pass for generated data.

# Stack trace

Every so often, CI randomly fails a test with...

good first issue
help-wanted

When running Datalab for regression, the detected issues vary greatly across Python/OS versions in CI, which makes assertions about the issue masks difficult and slows development down. We need to figure...

good first issue
help-wanted

While cleanlab has added support for various types of ML tasks, the only task-specific check supported by `Datalab` is label error detection for classification.

## Proposed Changes

The most straightforward...

enhancement
help-wanted