[ENH] Auto-convert categorical columns to string in attributes_arff_from_df
Metadata
- Reference Issue: Fixes #1489
- New Tests Added: No
- Documentation Updated: No
- Change Log Entry: Automatically convert non-string categorical columns to string in attributes_arff_from_df
Details
What does this PR implement/fix?
This PR modifies attributes_arff_from_df to improve robustness when handling pandas DataFrames. Instead of immediately raising a ValueError when encountering a categorical column with non-string values (e.g., integer-encoded categories), it now attempts to automatically convert the categories to strings.
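The conversion step described above can be sketched roughly as follows. This is a minimal illustration using only pandas, not the code in this PR: the helper name stringify_categories is hypothetical, and the actual change lives inside attributes_arff_from_df.

```python
import pandas as pd


def stringify_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df whose categorical
    columns all have string categories."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Only categorical columns are candidates for conversion.
        if isinstance(col.dtype, pd.CategoricalDtype):
            # Leave columns alone when every category is already a string.
            if not all(isinstance(c, str) for c in col.cat.categories):
                out[name] = col.cat.rename_categories(lambda c: str(c))
    return out


df = pd.DataFrame({"target": [0, 1]})
df["target"] = df["target"].astype("category")

converted = stringify_categories(df)
print(list(converted["target"].cat.categories))  # ['0', '1']
```

One caveat with this approach: if two distinct categories share a string form, such as 1 and "1", pandas rejects the rename because categories must be unique, so that case would still surface as an error.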
Why is this change necessary?
Currently, attributes_arff_from_df raises a ValueError when a DataFrame contains valid data whose categories are not strings, for example integer categories such as [0, 1]. Users therefore have to cast the categories to strings themselves before calling the function. Handling the conversion inside the function removes that manual step.
How can I reproduce the issue?
Create a DataFrame with integer categories and pass it to attributes_arff_from_df.
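import pandas as pd
# Assumption: ds_funcs aliases the module that defines attributes_arff_from_df
import openml.datasets.functions as ds_funcs

df = pd.DataFrame({"target": [0, 1]})
df["target"] = df["target"].astype("category")
# Before this PR: Raises ValueError
# After this PR: Automatically converts to string categories and succeeds
ds_funcs.attributes_arff_from_df(df)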