ENH: quantile/bin numbering schema should be more flexible
Problem Description
If you pass a dataframe with a non-contiguous list of ints (that starts with 1) in the factor_quantile column to create_turnover_tear_sheet(), you will get the following error:
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'Timedelta'
Also, the list of ints must start with 1 and increase by 1. For example, if factor_data['factor_quantile'].unique() is [1,2,3,4,5], you're good to go, but if it is [1,3,4,5] or [2,3,4,5], it will raise the error.
I believe this is caused by Alphalens looking for a continuous list (starting with 1) because of the range statement on this line: https://github.com/quantopian/alphalens/blob/master/alphalens/tears.py#L415
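To make the suspected cause concrete, here is a minimal sketch in plain Python. It only mimics the range(1, max + 1) pattern from the linked line; the helper names labels_from_range and missing_labels are hypothetical and are not part of Alphalens.

```python
def labels_from_range(observed):
    """Labels visited by a range(1, max + 1) loop, mimicking the linked line."""
    return list(range(1, int(max(observed)) + 1))


def missing_labels(observed):
    """Labels the range-based loop visits that never occur in the data."""
    present = set(observed)
    return [q for q in labels_from_range(observed) if q not in present]


print(missing_labels([1, 2, 3, 4, 5]))  # contiguous labels: nothing is missing
print(missing_labels([1, 3, 4, 5]))     # label 2 is visited but absent
print(missing_labels([2, 3, 4, 5]))     # label 1 is visited but absent
```

Any label reported as missing here is one the loop would still ask turnover to be computed for, even though no rows carry it.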
Please provide a minimal, self-contained, and reproducible example:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.data import factset, USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume
from alphalens.tears import create_turnover_tear_sheet
from alphalens.utils import get_clean_factor_and_forward_returns
def make_pipeline():
    market_cap_filter = factset.Fundamentals.mkt_val.latest > 500000000
    volume_filter = AverageDollarVolume(window_length=200) > 2500000
    price_filter = USEquityPricing.close.latest > 5
    base_universe = market_cap_filter & volume_filter & price_filter
    factor_to_analyze = factset.Fundamentals.assets.latest
    return Pipeline(
        columns={'factor_to_analyze': factor_to_analyze},
        screen=base_universe & factor_to_analyze.notnull()
    )
pipeline_output = run_pipeline(make_pipeline(), '2015-1-1', '2016-1-1')
pricing_data = get_pricing(pipeline_output.index.levels[1], '2015-1-1', '2016-6-1', fields='open_price')
factor_data = get_clean_factor_and_forward_returns(
    factor=pipeline_output['factor_to_analyze'],
    prices=pricing_data,
)
create_turnover_tear_sheet(factor_data[factor_data['factor_quantile'].isin([1, 3, 4, 5])])
Please provide the full traceback:
TypeError                                 Traceback (most recent call last)
<ipython-input-5-1e0de69d03d0> in <module>()
----> 1 create_turnover_tear_sheet(factor_data[factor_data['factor_quantile'].isin([1, 3, 4, 5])])
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.pyc in call_w_context(*args, **kwargs)
43 with plotting_context(), axes_style(), color_palette:
44 sns.despine(left=True)
---> 45 return func(*args, **kwargs)
46 else:
47 return func(*args, **kwargs)
/usr/local/lib/python2.7/dist-packages/alphalens/tears.pyc in create_turnover_tear_sheet(factor_data, turnover_periods)
415 for q in range(1, int(quantile_factor.max()) + 1)],
416 axis=1)
--> 417 for p in turnover_periods}
418
419 autocorrelation = pd.concat(
/usr/local/lib/python2.7/dist-packages/alphalens/tears.pyc in <dictcomp>((p,))
415 for q in range(1, int(quantile_factor.max()) + 1)],
416 axis=1)
--> 417 for p in turnover_periods}
418
419 autocorrelation = pd.concat(
/usr/local/lib/python2.7/dist-packages/alphalens/performance.pyc in quantile_turnover(quantile_factor, quantile, period)
738 shifted_idx = utils.add_custom_calendar_timedelta(
739 quant_name_sets.index, -pd.Timedelta(period),
--> 740 quantile_factor.index.levels[0].freq)
741 name_shifted = quant_name_sets.reindex(shifted_idx)
742 name_shifted.index = quant_name_sets.index
/usr/local/lib/python2.7/dist-packages/alphalens/utils.pyc in add_custom_calendar_timedelta(input, timedelta, freq)
918 days = timedelta.components.days
919 offset = timedelta - pd.Timedelta(days=days)
--> 920 return input + freq * days + offset
921
922
/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.pyc in __add__(self, other)
1648 if isinstance(other, Index):
1649 return self.union(other)
-> 1650 return Index(np.array(self) + other)
1651
1652 def __radd__(self, other):
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'Timedelta'
Python / Alphalens versions are whatever the Quantopian research platform is running on as of March 21st 2019.
CC @luca-s
Thanks for reporting this, @quantopiancal. You are right that Alphalens assumes the list of quantiles/bins starts with 1 and increases by 1 with no gaps. This is an assumption around which the code has been built. There are probably only a few places where this assumption is used, but I cannot list them from memory.
I believe it is not a big issue, since it is possible to pre-process the input data to make it suitable for Alphalens, but it is never nice to have assumptions in the code. If you would like to provide a PR to relax this constraint, it would certainly be very welcome.
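As a rough illustration of the pre-processing idea, the sketch below builds a mapping from whatever quantile labels are present to a contiguous 1..N sequence. The helper contiguous_label_map is hypothetical (it is not an Alphalens function), and it assumes the labels are sortable ints with no missing values.

```python
def contiguous_label_map(observed):
    """Map each distinct label to its 1-based rank among the sorted labels."""
    ordered = sorted(set(observed))
    return {old: new for new, old in enumerate(ordered, start=1)}


# Labels with a gap at 2 are renumbered so that they run 1..4.
mapping = contiguous_label_map([1, 3, 4, 5])
print(mapping)
```

With a pandas dataframe, the mapping could then be applied to a copy of the data with something like factor_data['factor_quantile'].map(mapping). Note that this renames the quantiles, so quantile 3 in the original data would be reported as quantile 2 in the tear sheet.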
Hi @luca-s - Changing line 415 in alphalens.tears.create_turnover_tear_sheet from:
quantile_turnover = \
    {p: pd.concat([perf.quantile_turnover(quantile_factor, q, p)
                   for q in range(1, int(quantile_factor.max()) + 1)],
                  axis=1)
     for p in turnover_periods}
to
quantile_turnover = \
    {p: pd.concat([perf.quantile_turnover(quantile_factor, q, p)
                   for q in quantile_factor.sort_values().unique().tolist()],
                  axis=1)
     for p in turnover_periods}
relaxes the constraint being discussed here. It's worth noting that the TypeError @quantopiancal mentioned in the original post is tied to an empty series being presented for empty quantiles when working with pandas 0.18.1 (and potentially older versions). I've tested with pandas >= 0.20.3 and the tearsheets run without the modification mentioned above.
I'll submit a PR with this change as a next step.