deeplake
Cleanlab + Skorch Integration
🚀 🚀 Pull Request
Checklist:
- [ ] My code follows the style guidelines of this project and the Contributing document
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have kept the `coverage-rate` up
- [ ] I have performed a self-review of my own code and resolved any problems
- [ ] I have checked to ensure there aren't any other open Pull Requests for the same change
- [x] I have described and made corresponding changes to the relevant documentation
- [ ] New and existing unit tests pass locally with my changes
Changes
This PR integrates the open-source cleanlab library into Hub. Here is a quick snippet of the API:

    import hub
    from torchvision import transforms

    from hub.integrations import skorch
    from hub.integrations.cleanlab import clean_labels, clean_view, create_tensors

    ds = hub.load("hub://ds")

    # Preprocess each image: convert to PIL, resize, convert to a tensor, normalize
    tform = transforms.Compose(
        [
            transforms.ToPILImage(),
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize((0.5,), (0.5,)),
        ]
    )
    # Apply tform to the "images" tensor; leave "labels" untransformed
    transform = {"images": tform, "labels": None}

    # Get a scikit-learn compatible PyTorch module to pass into clean_labels
    model = skorch(dataset=ds, epochs=5, batch_size=16, transform=transform)

    # Obtain a DataFrame with columns is_label_issue, label_quality and predicted_label
    label_issues = clean_labels(
        dataset=ds,
        model=model,
        folds=3,
    )

    # Create the label_issues tensor on the given branch
    create_tensors(
        dataset=ds,
        label_issues=label_issues,
        branch="main",
    )

    # Get a dataset view that keeps only samples with clean labels and filters out the rest
    ds_clean = clean_view(ds)

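To make the three DataFrame columns concrete, here is a minimal pure-Python sketch of how per-sample values like these could be derived from out-of-sample predicted probabilities. It is not the implementation in this PR or in cleanlab: `summarize_label_issues`, `clean_indices`, and the `threshold` parameter are hypothetical names, and the flagging rule is a simplified stand-in for cleanlab's confident-learning procedure. The label quality used here is the predicted probability of the given label (self-confidence).

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[bool, float, int]]


def summarize_label_issues(
    labels: Sequence[int],
    pred_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff, not a cleanlab parameter
) -> List[Row]:
    """Build one row per sample with the three columns named in the snippet."""
    rows: List[Row] = []
    for given, probs in zip(labels, pred_probs):
        # predicted_label: the class with the highest predicted probability
        predicted = max(range(len(probs)), key=probs.__getitem__)
        # label_quality: how much probability the model puts on the given label
        quality = probs[given]
        # Simplified rule (an assumption): flag a sample when the model
        # disagrees with the given label and gives it low probability.
        is_issue = predicted != given and quality < threshold
        rows.append(
            {
                "is_label_issue": is_issue,
                "label_quality": quality,
                "predicted_label": predicted,
            }
        )
    return rows


def clean_indices(rows: Sequence[Row]) -> List[int]:
    """Indices of samples not flagged as label issues (a 'clean view')."""
    return [i for i, row in enumerate(rows) if not row["is_label_issue"]]


if __name__ == "__main__":
    labels = [0, 1, 1]
    pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
    rows = summarize_label_issues(labels, pred_probs)
    print(clean_indices(rows))  # [0, 2]
```

In this toy input the second sample is labeled 1, but the model assigns that label only 0.2 probability and predicts class 0, so it is the only one filtered out of the clean view.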
To-do
- [x] Create custom config for `pip install` (e.g. `pip install hub['cleanlab']`)
- [x] Add support for validation set
- [ ] Add prune support to delete samples where `is_label_issue = True`
- [x] Try to use a pre-trained model to compute out-of-sample probabilities to skip cross-validation and speed up training.
- [x] Add tests for the functions
- [x] Add types for the class arguments
- [x] Create a tensor `guessed_label` to add labels guessed by the classifier after pruning.
- [x] Add optional `cleanlab` kwargs to pass down
- [x] Add optional `skorch` kwargs to pass down
- [ ] Add support for TensorFlow modules
- [x] Add flag `branch` to move to a different branch instead of making a commit on the current branch.
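For context on the cross-validation item above, the following is a rough sketch of what "out-of-sample probabilities" means when `folds=3`: every sample receives its predicted probabilities from a model that was fit without that sample. The helper names `kfold_indices` and `out_of_sample_probs` are hypothetical, and the "model" is a deliberately trivial class-frequency estimator that ignores features. It only illustrates the fold bookkeeping, not the skorch training done in this PR.

```python
from typing import List, Sequence


def kfold_indices(n_samples: int, folds: int) -> List[List[int]]:
    """Split range(n_samples) into `folds` contiguous, near-equal held-out folds."""
    base, extra = divmod(n_samples, folds)
    result: List[List[int]] = []
    start = 0
    for k in range(folds):
        # The first `extra` folds take one additional sample each
        size = base + (1 if k < extra else 0)
        result.append(list(range(start, start + size)))
        start += size
    return result


def out_of_sample_probs(
    labels: Sequence[int], n_classes: int, folds: int
) -> List[List[float]]:
    """Assign each sample probabilities from a model fit on the other folds.

    Toy model (an assumption for illustration): Laplace-smoothed class
    frequencies of the training labels.
    """
    probs: List[List[float]] = [[] for _ in labels]
    for held_out in kfold_indices(len(labels), folds):
        held = set(held_out)
        counts = [1] * n_classes  # Laplace smoothing: start every class at 1
        for i, y in enumerate(labels):
            if i not in held:  # fit only on samples outside the held-out fold
                counts[y] += 1
        total = sum(counts)
        fold_probs = [c / total for c in counts]
        for i in held_out:
            probs[i] = list(fold_probs)
    return probs


if __name__ == "__main__":
    print(kfold_indices(7, 3))  # [[0, 1, 2], [3, 4], [5, 6]]
```

A pre-trained model, as the to-do item suggests, would replace the per-fold fitting loop entirely, provided it was never trained on the samples being scored.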
Hi @lowlypalace, I'd like to know why skorch needs to be integrated?
Hey @lowlypalace, closing for now. Will reopen once there is time to work on this. Thanks!