Elías Snorrason
## How to reproduce the behaviour It looks like importing language data for Icelandic is broken. E.g. to get stop words: ```python # This works import spacy.lang.en.stop_words spacy.lang.en.stop_words.STOP_WORDS # Syntax...
This PR adds `HighDimPDE_reflect_outs`, another implementation of `HighDimPDE._reflect`/`HighDimPDE._reflect_GPU`, which computes `out` (`out1` and `out2`) along with `rtemp`/`rmin` exclusively from indices of `b` where it "lies outside the boundary of `[s,e]^d`"....
The function only returns a `Tuple[np.ndarray, list]`, but it is annotated with: https://github.com/cleanlab/cleanlab/blob/fad4eb266dee8b9e2925d3f0d74fe4a81939eb8a/cleanlab/token_classification/rank.py#L36
> Related to this, we might want to show `LabelLike` instead of expanding the type alias. See [this StackOverflow thread](https://stackoverflow.com/questions/60028577/keeping-alias-types-simple-in-python-documentation) for some approaches. _Originally posted by @anishathalye in https://github.com/cleanlab/cleanlab/issues/398#issuecomment-1236109488_
Check for NaN values in the label column. This cannot be handled by the NullIssueManager, because the failure occurs in `Datalab(data=df_with_nan_value_in_label_column, label_name="label_column")`. For now, we need better error reporting.
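A minimal sketch of what such an early check could look like, assuming labels are available as a plain sequence. The helpers `find_missing_label_indices` and `validate_labels` are hypothetical names used for illustration only; they are not part of cleanlab's API.

```python
import math
from typing import Any, List, Sequence


def find_missing_label_indices(labels: Sequence[Any]) -> List[int]:
    """Return positions whose label is None or a float NaN."""
    return [
        i
        for i, label in enumerate(labels)
        if label is None or (isinstance(label, float) and math.isnan(label))
    ]


def validate_labels(labels: Sequence[Any], label_name: str) -> None:
    """Raise a descriptive error if any label is missing (hypothetical helper)."""
    missing = find_missing_label_indices(labels)
    if missing:
        # Show only the first few offending positions to keep the message short.
        preview = ", ".join(str(i) for i in missing[:5])
        raise ValueError(
            f"Column '{label_name}' contains {len(missing)} missing label(s) "
            f"(first indices: {preview}). Drop or impute these rows first."
        )
```

Running a check like this before any label processing would let the error message name the column and the offending rows, instead of surfacing as an unrelated failure further down.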
# Stack trace From Github Actions logs: ``` =================================== FAILURES =================================== _______________________ test_tensorflow_functional[32-2] _______________________ batch_size = 32, shuffle_config = 2 data = {'X': array([[-1.00724718, -0.92444024, -1.1659146 ], [-1.0525093 , -1.15150253,...
After updating the near-duplicate scores, a test was added to ensure that near-duplicate examples have worse scores than non-near-duplicates. Right now, the test only works on a small, toy dataset....
Property-based tests for near-duplicate sets are randomly failing in CI, when some health-checks don't pass for generated data. # Stack trace Every so often, CI randomly fails a test with...
When running Datalab for regression, the detected issues vary greatly across Python/OS versions in CI, which makes assertions about the issue masks difficult and slows development down. We need to figure...
While cleanlab has added support for various types of ML tasks, the only task-specific check supported by `Datalab` is label error detection for classification. ## Proposed Changes The most straightforward...