
improved autofix strategy

Open aditya1503 opened this issue 2 years ago • 3 comments

Skeleton code for improved Auto-Fix strategies

import os

import pandas as pd
from cleanlab_studio import Studio

API_KEY = os.environ['CLEANLAB_API_KEY']
studio = Studio(API_KEY)
df = pd.DataFrame(...)
dataset_id = studio.upload_dataset(df)
project_id = studio.create_project(dataset_id=dataset_id, ...)
cleanset_id = studio.get_latest_cleanset_id(project_id)


# Beginner user:
new_df = studio.autofix_dataset(df, cleanset_id)  # new_df is a deep copy of df with the fixes applied


# Advanced user pattern:
# hyperparam_dict contains integer values: the number of data points to fix/exclude for each issue type
hyperparam_dict = get_autofix_defaults(cleanset_id)
# a user who wants to edit less data manually lowers the integers in hyperparam_dict
new_df = studio.autofix_dataset(df, cleanset_id, params=hyperparam_dict)
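To make the "edit less data" step concrete, here is a minimal sketch of one way to shrink the per-issue counts before passing them back in. It assumes only what the comment above says, namely that the defaults are a dict of integers keyed by issue type. The helper `scale_autofix_params` and the issue-type keys are hypothetical and not part of cleanlab-studio.

```python
from typing import Dict


def scale_autofix_params(defaults: Dict[str, int], fraction: float) -> Dict[str, int]:
    """Return a copy of `defaults` with every count scaled by `fraction`.

    Hypothetical helper: `defaults` stands in for the dict of per-issue-type
    integer counts described above. Counts are rounded down.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return {issue: int(count * fraction) for issue, count in defaults.items()}


# Made-up defaults for illustration only; real keys and values would come
# from get_autofix_defaults(cleanset_id).
example_defaults = {"label_issue": 120, "outlier": 40, "near_duplicate": 10}

# Fix/exclude only half as many data points for each issue type.
conservative_params = scale_autofix_params(example_defaults, 0.5)
print(conservative_params)
```

The result could then be passed as `params=` in the advanced pattern above; the original dict is left unmodified so the defaults remain available for comparison.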

Link to Notion: https://www.notion.so/cleanlab/Improve-ML-accuracy-with-Studio-via-better-Autofix-99434fa92a164131b3860093d85e5350?pvs=4

Note: this is only for text/tabular datasets, not image.

aditya1503 avatar Nov 16 '23 20:11 aditya1503

request my review when this is ready

jwmueller avatar Nov 16 '23 22:11 jwmueller

add a little script on how the user is going to use this thing as a PR comment

jwmueller avatar Nov 17 '23 17:11 jwmueller

from anish: Would you want this to be:

studio.autofix_dataset(cleanset_id)
new_df = studio.apply_corrections(df, cleanset_id)

jwmueller avatar Nov 22 '23 18:11 jwmueller