steps.py fails on mixed-type columns

Open alicewanner opened this issue 4 months ago • 0 comments

The bug occurs when processing columns in the input DataFrame that contain a mix of numeric and non-numeric values. The relevant code is:

for col in df.columns:
    col_data = df[col]
    col_is_numeric = [is_numeric(v) for v in col_data if not pd.isnull(v)]
    if not all(col_is_numeric) and any(col_is_numeric):
        numeric_mask = col_data.apply(is_numeric)
        df[col+'_str'] = df[col].copy()
        df.loc[~numeric_mask, col] = np.nan
        df.loc[numeric_mask, col+'_str'] = np.nan

col iterates over every unique variable_name in the inputs file.

col_data contains all values for that variable for each ID and time. If no record exists for a given ID/time, the value is None.

The code works correctly if all non-None values are of the same type (e.g., all floats).

Problem: When col_data contains mixed types, numeric_mask is not a pure boolean Series: it holds None for missing values, False for strings, and True for numbers. Both df.loc[numeric_mask, ...] and df.loc[~numeric_mask, ...] then fail, because loc expects a fully boolean mask.

Proposed Fix: Replace the failing lines with:

df.loc[numeric_mask == False, col] = np.nan
df.loc[numeric_mask == True, col+'_str'] = np.nan

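This ensures loc always receives a fully boolean mask: the elementwise comparison evaluates to False wherever numeric_mask is None, so rows with missing values are left untouched in both columns.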
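For reference, here is a minimal, self-contained sketch of the proposed fix on a toy column. The real is_numeric from steps.py is not shown in this issue, so the version below is a hypothetical stand-in that returns None for missing values, matching the behaviour described above. The helper split_mixed_column, the column name "hr", and the sample values are likewise made up for illustration.

```python
import numbers

import numpy as np
import pandas as pd


def is_numeric(v):
    """Hypothetical stand-in for the project's is_numeric (an assumption).

    Returns None for missing values, True for numbers, False otherwise.
    """
    if pd.isnull(v):
        return None
    return isinstance(v, numbers.Number)


def split_mixed_column(df, col):
    """Keep numeric values in `col`; move non-numeric values to `col + '_str'`."""
    # Object-dtype Series holding True, False, and None (for missing values).
    numeric_mask = df[col].apply(is_numeric)
    df[col + '_str'] = df[col].copy()
    # Comparing against a literal yields a plain boolean Series; None compares
    # as False in both cases, so missing rows are not modified.
    df.loc[numeric_mask == False, col] = np.nan
    df.loc[numeric_mask == True, col + '_str'] = np.nan
    return df


# Toy data: two numbers, one string, one missing value.
df = pd.DataFrame({"hr": [72.0, "high", None, 65]}, dtype=object)
out = split_mixed_column(df.copy(), "hr")
```

After the call, the string "high" should survive only in hr_str and the numbers only in hr, while the missing row stays null in both columns.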

alicewanner, Oct 06 '25 14:10