[ENH] Auto-convert categorical columns to string in attributes_arff_from_df

Open alphaleporus opened this issue 2 months ago • 0 comments

Metadata

  • Reference Issue: Fixes #1489
  • New Tests Added: No
  • Documentation Updated: No
  • Change Log Entry: Automatically convert non-string categorical columns to string in attributes_arff_from_df

Details

What does this PR implement/fix? This PR makes attributes_arff_from_df more robust when handling pandas DataFrames. Instead of immediately raising a ValueError on a categorical column whose categories are not strings (e.g., integer-encoded categories), it now attempts to convert the categories to strings automatically.
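
The conversion described above can be sketched with plain pandas. This is an illustration only, not the code in this PR: the helper name stringify_categories is hypothetical, and the collision check is an assumption about how ambiguous cases (for example the categories 1 and "1") might be handled.

```python
import pandas as pd


def stringify_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose categorical columns have string categories.

    Hypothetical helper for illustration; not part of openml-python.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not isinstance(col.dtype, pd.CategoricalDtype):
            continue  # only categorical columns are touched
        labels = [str(c) for c in col.cat.categories]
        # Assumption: refuse to convert if two categories map to the same string.
        if len(set(labels)) != len(labels):
            raise ValueError(f"Column {name!r}: categories collide after str()")
        # rename_categories relabels in place of the old categories,
        # keeping the underlying codes and the category order.
        out[name] = col.cat.rename_categories(labels)
    return out


df = pd.DataFrame({"target": [0, 1]})
df["target"] = df["target"].astype("category")
converted = stringify_categories(df)
print(list(converted["target"].cat.categories))  # ['0', '1']
```

Relabelling with rename_categories, rather than casting the column to str and back to category, keeps the original category order, which matters when the integers would sort differently as strings (e.g. 2 and 10).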

Why is this change necessary? Currently, the library raises a ValueError if a user provides a DataFrame with otherwise valid data but integer-based categories (e.g., [0, 1]). This forces users to manually cast categories to strings before calling the function. This change improves the user experience by performing the conversion automatically.
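
For context, the manual cast that users currently need looks roughly like the following. It uses only standard pandas calls; the column name target is just an example.

```python
import pandas as pd

df = pd.DataFrame({"target": [0, 1]})
df["target"] = df["target"].astype("category")

# Manual workaround: relabel every category with str() before handing
# the DataFrame to the library. Codes and category order are unchanged.
df["target"] = df["target"].cat.rename_categories(str)

print(df["target"].cat.categories.tolist())  # ['0', '1']
```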

How can I reproduce the issue? Create a DataFrame with integer categories and pass it to attributes_arff_from_df.

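import pandas as pd
# Assumption: ds_funcs refers to the openml.datasets.functions module
import openml.datasets.functions as ds_funcs

# A categorical column whose categories are the integers 0 and 1
df = pd.DataFrame({"target": [0, 1]})
df["target"] = df["target"].astype("category")
# Before this PR: Raises ValueError
# After this PR: Automatically converts to string categories and succeeds
ds_funcs.attributes_arff_from_df(df)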

alphaleporus, Nov 20 '25 19:11