
`build._eval_factor` error condition attempts to access attribute `dtype` on `DataFrame` object, but no such attribute exists

Open carrutstick opened this issue 5 months ago • 2 comments

In _eval_factor we have:

    # Returns either a 2d ndarray, or a DataFrame, plus is_NA mask
    if factor_info.type == "numerical":
        result = atleast_2d_column_default(result, preserve_pandas=True)

followed by

        if not safe_issubdtype(np.asarray(result).dtype, np.number):
            raise PatsyError(
                "when evaluating numeric factor %s, "
                "I got non-numeric data of type '%s'" % (factor.name(), result.dtype),
                factor,
            )

When `result` is a `pandas.DataFrame` whose data is not numeric, this check tries to raise a `PatsyError`, but building the message accesses `result.dtype`. A `DataFrame` has no `dtype` attribute (only the per-column `dtypes`), so an `AttributeError` is raised instead of the intended error.

This is not a critical bug, since an exception is raised either way, but the resulting `AttributeError` is confusing and hides the real problem.
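To illustrate the attribute access itself, here is a minimal sketch that does not call patsy at all. It only shows that a `DataFrame` lacks `dtype`, and that `np.asarray(result).dtype` (the expression the check already uses) works for both return types. The helper name `describe_dtype` is hypothetical and is not part of patsy.

```python
import numpy as np
import pandas as pd


def describe_dtype(result):
    """Return a dtype for either an ndarray or a DataFrame.

    Hypothetical helper, not patsy API: it reuses the same
    np.asarray(...) conversion that the numeric check performs.
    """
    return np.asarray(result).dtype


# A one-column frame holding strings, standing in for `result`.
frame = pd.DataFrame({"x": ["a", "b", "c"]})

# DataFrame exposes `dtypes` (one entry per column), not `dtype`.
has_dtype = hasattr(frame, "dtype")

# Formatting the message the way the error path does fails before
# any PatsyError could be constructed.
error = None
try:
    message = "I got non-numeric data of type '%s'" % (frame.dtype,)
except AttributeError as exc:
    message = None
    error = exc

# The converted array has a dtype that can be reported safely.
safe_message = "I got non-numeric data of type '%s'" % (describe_dtype(frame),)
```

Under that assumption, replacing `result.dtype` in the message with the dtype of the converted array would let the intended `PatsyError` surface.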

carrutstick, Aug 07 '25 01:08

Which version of pandas is this, and can you provide a complete example that leads to the failure? I suspect that this is due to extension dtypes in pandas.

bashtage, Aug 18 '25 10:08

This should be pandas 2.2.3. I'll try to put together a minimal example later this week. I originally hit this in the middle of a large automated pipeline, where a bug let a string column into some regressions.

carrutstick, Aug 18 '25 14:08