chainladder-python Drop High/Drop Low are not working correctly

I get a different answer depending on whether I fit just the work comp LOB vs all LOBs and then slice work comp. These approaches should yield the same answer, but they don't:

import chainladder as cl
clrd = cl.load_sample('clrd').groupby('LOB')['CumPaidLoss'].sum()
# Passes
assert cl.Development(n_periods=5).fit(clrd).ldf_.loc['wkcomp'] == cl.Development(n_periods=5).fit(clrd.loc['wkcomp']).ldf_
# Fails
assert cl.Development(drop_low=2).fit(clrd).ldf_.loc['wkcomp'] == cl.Development(drop_low=2).fit(clrd.loc['wkcomp']).ldf_

I don't think this ever worked. The reason is that we only have one 2D (origin x development) matrix of the entries to be dropped. This matrix is shared across all triangles. When dropping using n_periods, drop_valutation, drop_origin, etc, this is fine. But dropping based on link-ratio values it is not.

Jun 24 '22 20:06 jbogaardt

Possibly related. I have a triangle with missing values and drop_high drops the missing value rather than ignoring it. My workaround is to list out how many high values to drop, but it is a bit ugly: WtdAvgxHL = cl.Development(average="volume",drop_low=[True]*41,drop_high=[25,23,22,21,20,18,17,15,13,12,10,9,8,6,5,4,3,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]).fit(Triangle).cdf

Sep 30 '22 22:09 hoverat

Thanks for writing in. Yes, this is totally the same issue. I will work on getting this resolved soon.

Sep 30 '22 23:09 kennethshsu

#332

Nov 01 '22 21:11 henrydingliu

It would also be great if when you dropped a valuation AND used drop_high/drop_low it would ignore the valuation you already intended to drop.

Nov 01 '22 21:11 hoverat

In the code, drop and drop_high/drop_low are applied in parallel. (interestingly enough, drop_above/drop_below are applied after.) Not a difficult thing to put them in any order, but I wonder if the better approach is to allow the practitioner to apply the precise order they want via a pipeline. Unfortunately it looks like Development transformers are memoryless at the moment (see example below).

genins = cl.load_sample("genins")
genins_dev_1 =cl.Development(n_periods = 2).fit_transform(genins)
genins_dev_2 =cl.Development(n_periods = 4).fit_transform(genins_dev_1)

Nov 01 '22 22:11 henrydingliu

Using a Pipeline is an interesting pattern. I hadn't thought of it before. In the spirit of keeping estimators as flexible as possible, this seems like a pretty good idea. Then any combo of simultaneous vs ordered omissions could be had.

Nov 02 '22 00:11 jbogaardt

pull request #370 created with fix to support 4D drop hi/lo.

@jbogaardt I did some quick tests. Also ran your assertions from the top (pass on both). seems to work. but it's quite a few line. Would appreciate additional review.

there is more work to be done in this neighborhood, including adding 4D support for drop above/below and the pipeline support. perhaps we can move those to new threads.

Nov 18 '22 19:11 henrydingliu

@henrydingliu, this looks great. I'm all for more lines of code if it makes maintenance easier. We should definitely add the asserts into the test suite so that the same bug isn't introduced at a later date.

Nov 18 '22 20:11 jbogaardt

pull request #375 created with fix to support 4D drop above/below

this assertion fails without this fix

import chainladder as cl
clrd = cl.load_sample('clrd').groupby('LOB')['CumPaidLoss'].sum()
assert cl.Development(drop_above=1.3).fit(clrd).ldf_.loc['wkcomp'] == cl.Development(drop_above=1.3).fit(clrd.loc['wkcomp']).ldf_

Nov 19 '22 06:11 henrydingliu

@jbogaardt thanks for reviewing and approving. do you want to talk through pipeline support architecture on this issue or open a new one?

Nov 19 '22 18:11 henrydingliu

New is probably more appropriate.

Nov 19 '22 18:11 jbogaardt

I was with some non-actuaries this weekend, and I expressed some verbal excitement when I saw this issue is being fixed. Even with their desire to join in, the excitement of a working xHiLow average was not easily transferable. Thanks for doing this!

Nov 21 '22 14:11 hoverat

with 4D support, do we need to modify the existing weight_ property as well?

also, should weight_ be in base instead of development?

Nov 24 '22 00:11 henrydingliu

yes we should

Nov 24 '22 01:11 jbogaardt