Patsy loses DatetimeIndex freq information even if no NA values

Open ChadFulton opened this issue 7 years ago • 0 comments

(This is probably better described as a pandas bug, see https://github.com/pandas-dev/pandas/issues/21282, but maybe patsy wants to patch this too?)

Reproducible example:

import pandas as pd
import patsy

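# Recent pandas versions no longer accept start/end/freq in the DatetimeIndex
# constructor; pd.date_range builds the same index. Newer versions also spell
# the 'AS' alias as 'YS'.
index = pd.date_range(start='1990', end='1994', freq='AS')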
data = pd.Series([0, 1, 2, 3, 4], name='y', index=index)
print(data.index)

lhs, rhs = patsy.dmatrices('y ~ 1', data={'y': data}, return_type='dataframe')
print(lhs.index)

The first print statement yields:

DatetimeIndex(['1990-01-01', '1991-01-01', '1992-01-01', '1993-01-01',
               '1994-01-01'],
              dtype='datetime64[ns]', freq='AS-JAN')

Whereas the second yields:

DatetimeIndex(['1990-01-01', '1991-01-01', '1992-01-01', '1993-01-01',
               '1994-01-01'],
              dtype='datetime64[ns]', freq=None)

This is a consequence of https://github.com/pandas-dev/pandas/issues/21282 as it affects the following function in patsy/missing.py:

def _handle_NA_drop(self, values, is_NAs, origins):
    total_mask = np.zeros(is_NAs[0].shape[0], dtype=bool)
    for is_NA in is_NAs:
        total_mask |= is_NA
    good_mask = ~total_mask
    # "..." to handle 1- versus 2-dim indexing
    return [v[good_mask, ...] for v in values]

When v is a DatetimeIndex, the ellipsis in v[good_mask, ...] causes the index to lose its frequency information, even though good_mask is all True and no rows are actually dropped.
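
Two possible ways around this are sketched below. Neither is patsy's actual fix. drop_na_rows is a hypothetical standalone variant of the function above that returns its inputs untouched when there is nothing to drop, and restore_freq is a hypothetical caller-side helper that reattaches the frequency afterwards. Both names are assumptions made for illustration, and the example uses 'YS', the newer spelling of the year-start alias.

```python
import numpy as np
import pandas as pd


def drop_na_rows(values, is_NAs):
    """Hypothetical variant of the drop logic that skips indexing when no row is NA."""
    total_mask = np.zeros(is_NAs[0].shape[0], dtype=bool)
    for is_NA in is_NAs:
        total_mask |= is_NA
    if not total_mask.any():
        # Nothing to drop, so return the inputs untouched; a DatetimeIndex
        # among them keeps its freq because it is never re-indexed.
        return list(values)
    good_mask = ~total_mask
    # "..." to handle 1- versus 2-dim indexing
    return [v[good_mask, ...] for v in values]


def restore_freq(index, original):
    """Hypothetical caller-side workaround: reattach original.freq if the dates match."""
    if (
        isinstance(index, pd.DatetimeIndex)
        and index.freq is None
        and index.equals(original)
    ):
        # The constructor validates that the dates conform to the given freq.
        return pd.DatetimeIndex(index, freq=original.freq)
    return index


original = pd.date_range(start='1990', end='1994', freq='YS')

# No NA rows: the index comes back as the same object, freq intact.
(kept,) = drop_na_rows([original], [np.zeros(len(original), dtype=bool)])

# Simulate an index whose freq was lost, then restore it from the original.
stripped = pd.DatetimeIndex(original.values)
restored = restore_freq(stripped, original)
```

With the reproducible example above, the caller-side use would be lhs.index = restore_freq(lhs.index, data.index). The short-circuit only helps when no rows are dropped; once rows are removed, the remaining dates are no longer guaranteed to follow a regular frequency.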

ChadFulton, Jun 01 '18 02:06