alphalens icon indicating copy to clipboard operation
alphalens copied to clipboard

utils.quantize_factor()

Open ilovecocolade opened this issue 6 years ago • 7 comments

Change the default value for quantiles in this function to quantiles=None. When using bins instead of quantiles with the utils.get_clean_factor_and_forward_returns() function, there is an error in the utils.quantize_factor() inferring that no bin size or quantile sizes were passed but it is actually because utils.quantize_factor() passes quantiles = 5 by default, regardless of if bins in not None.

ilovecocolade avatar Sep 18 '19 13:09 ilovecocolade

Could you please provide a minimal example that replicate the error?

luca-s avatar Sep 18 '19 14:09 luca-s

Example:

# Create toy data
dates = pd.date_range("20190101", "20190105")
prices_dates = pd.date_range("20190101", "20190201")
assets = list(range(10))

factor = pd.DataFrame(index=pd.MultiIndex.from_product([dates, assets]),
                      data=np.random.randn(50))
prices = pd.DataFrame(index=prices_dates, columns=assets, data=np.abs(np.random.randn(32, 10)))

factor.index.set_names(["date", "asset"], inplace=True)
prices.index.rename("date", inplace=True)

# Error occurs here
al.utils.get_clean_factor_and_forward_returns(factor, prices, bins=[-2, 0, 2])

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-7c85abde5837> in <module>
     12 
     13 # Error occurs here
---> 14 al.utils.get_clean_factor_and_forward_returns(factor, prices, bins=[-2, 0, 2])

/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in get_clean_factor_and_forward_returns(factor, prices, groupby, binning_by_group, quantiles, bins, periods, filter_zscore, groupby_labels, max_loss, zero_aware, cumulative_returns, compute_quantiles)
    805                                    binning_by_group=binning_by_group,
    806                                    max_loss=max_loss, zero_aware=zero_aware,
--> 807                                    compute_quantiles=compute_quantiles)
    808 
    809     return factor_data

/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in get_clean_factor(factor, forward_returns, groupby, binning_by_group, quantiles, bins, groupby_labels, max_loss, zero_aware, compute_quantiles)
    607             binning_by_group,
    608             no_raise,
--> 609             zero_aware
    610         )
    611 

/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in dec(*args, **kwargs)
     73     def dec(*args, **kwargs):
     74         try:
---> 75             return func(*args, **kwargs)
     76         except ValueError as e:
     77             if 'Bin edges must be unique' in str(e):

/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in quantize_factor(factor_data, quantiles, bins, by_group, no_raise, zero_aware)
    128     if not ((quantiles is not None and bins is None) or
    129             (quantiles is None and bins is not None)):
--> 130         raise ValueError('Either quantiles or bins should be provided')
    131 
    132     if zero_aware and not (isinstance(quantiles, int)

ValueError: Either quantiles or bins should be provided

As @ilovecocolade suggests, the default for quantize_factor is probably behind this.

eigenfoo avatar Oct 02 '19 12:10 eigenfoo

Hi,

Yes @luca-s, the error provided by @eigenfoo is the error I am referring to. When quantize_factor() is called it passes a default value of quantiles=5, regardless of whether bins is used instead of quantiles. The error suggests neither bins nor quantiles have been passed but it is because bins has been passed and quantiles is passed by default with a value of 5 in quantize_factor(), and only one of these two variables can have a value. This then means you must set the value of quantiles to be None. I ran into difficulties in the same place as @eigenfoo where get_clean_factor_and_forward_returns() does not have a default quantiles value but it calls quantize_factor() which does.

ilovecocolade avatar Oct 02 '19 12:10 ilovecocolade

This behaviour was chosen due to backward compatibility, but I agree this is not user friendly. The documentation states that Only one of 'quantiles' or 'bins' can be not-None. Earlier releases of Alphalens had quanties=5 but bins option didn't exist.

I would be happy to change this default and force the user to provide either quantiles or bins (no quanties=5 by default) but I don't know if it would break something? @twiecki

luca-s avatar Oct 02 '19 13:10 luca-s

Yes, I also had that thought that it may break something elsewhere. Maybe just change the error message so it is clear to users that quantiles must be set to None if using bins instead of quantiles. From my limited experience of Alphalens I've found that most of the data I have used is continuous and much better suited to quantile sorting anyway, I only ever use bins when the data is discrete which is rare for me. The best fix may be as a previously said to just change the error message to indicate to users if they want to use the bin instead of quantile sorting they should set quantiles=None.

A suitable error message might be: "Only one of either quantiles or bins should be provided. If using bins instead of quantiles; set quantiles=None."

Thank you for the prompt response.

ilovecocolade avatar Oct 02 '19 13:10 ilovecocolade

@ilovecocolade I like your idea, changing the error message is the safest solution. Would you like to provide a PR?

luca-s avatar Oct 03 '19 08:10 luca-s

Hi,

I have never created a pull request before but I would be happy to do it.

ilovecocolade avatar Oct 03 '19 13:10 ilovecocolade