
[ENH] Automatically convert non-string categorical data in attributes_arff_from_df

Open · alphaleporus opened this issue 2 months ago · 1 comment

Is your feature request related to a problem? Please describe.
Currently, attributes_arff_from_df raises a ValueError if a pandas DataFrame contains a categorical column whose categories are not strings (e.g., the integers 0 and 1). Users must manually cast these categories to strings before passing the DataFrame.
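
A minimal reproduction of the condition, using only pandas (the snippet does not call openml itself). It assumes the string check in attributes_arff_from_df is based on pandas.api.types.infer_dtype over the column's categories; treat that as an assumption, not a description of the library code. The last two lines show the manual workaround.

```python
import pandas as pd

# An integer-encoded categorical column: the category labels are the ints 0 and 1.
df = pd.DataFrame({"label": pd.Categorical([0, 1, 1, 0])})

categories = df["label"].cat.categories
# infer_dtype reports "integer" here, not "string"; under the assumption above,
# this is the case that triggers the ValueError described in the issue.
print(pd.api.types.infer_dtype(categories))  # prints: integer

# Manual workaround: rename each category label to its string form.
# The column keeps its categorical dtype; only the labels change.
df["label"] = df["label"].cat.rename_categories(str)
print(pd.api.types.infer_dtype(df["label"].cat.categories))  # prints: string
```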

Describe the solution you'd like
Instead of raising an error immediately, the function should automatically convert the categories to strings using .astype(str). This makes the function easier to use with mixed-type or integer-encoded categorical data.
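
A sketch of the proposed behavior, written as a standalone helper. The name nominal_values is hypothetical; it is not part of openml-python and is not the submitted PR. It assumes, as above, that the string check uses pandas.api.types.infer_dtype. Only the category labels are converted, and a guard is added because mixed categories such as 1 and "1" would otherwise collapse to the same string.

```python
import pandas as pd


def nominal_values(column: pd.Series) -> list[str]:
    """Hypothetical helper: ARFF nominal values for a categorical column.

    Non-string categories are converted with .astype(str) instead of
    raising, as proposed in the issue.
    """
    categories = column.cat.categories
    if pd.api.types.infer_dtype(categories) not in ("string", "unicode"):
        # Convert the category labels (not the whole column) to strings.
        categories = categories.astype(str)
        # Mixed labels like 1 and "1" both become "1"; refuse to merge them silently.
        if categories.has_duplicates:
            raise ValueError(
                f"Column {column.name!r}: categories are not unique after string conversion."
            )
    return categories.tolist()


df = pd.DataFrame(
    {
        "label": pd.Categorical([0, 1, 1, 0]),
        "color": pd.Categorical(["red", "blue", "red", "red"]),
    }
)
print(nominal_values(df["label"]))  # prints: ['0', '1']
print(nominal_values(df["color"]))  # prints: ['blue', 'red']
```

Converting only the labels keeps the column categorical; calling .astype(str) on the whole Series would instead produce a plain, non-categorical string column. A complete fix would also need the column's data values to match the converted labels when the ARFF data section is written.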

Describe alternatives you've considered
Keep raising the error, but improve its message. However, automatic conversion is more user-friendly, since ARFF nominal values are strings anyway.

Additional context
I have a fix implemented locally and can submit a PR.

alphaleporus, Nov 20 '25 19:11

PR submitted!
