Unpredictable behaviour of drop_high
drop_high behaves unexpectedly in some cases. Here is a minimal reproduction:
import chainladder as cl
import pandas as pd

prism_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/prism.csv"
)
prism_df["AccidentDate"] = pd.to_datetime(prism_df["AccidentDate"])
prism_df["PaymentDate"] = pd.to_datetime(prism_df["PaymentDate"])

# Zero out Incurred for accident years after 2008 with payments after 2012,
# so the data forms a trapezoid rather than a full triangle.
prism_df.loc[
    (prism_df["AccidentDate"].dt.year > 2008)
    & (prism_df["PaymentDate"].dt.year > 2012),
    "Incurred",
] = 0

prism = cl.Triangle(
    data=prism_df,
    origin="AccidentDate",
    development="PaymentDate",
    columns="Incurred",
    cumulative=False,
)
prism = prism.grain("OYDY").incr_to_cum()

prism_dev = cl.Development(drop_high=1, drop_low=1).fit_transform(prism)
print(prism.age_to_age)
print(prism_dev.age_to_age)
drop_high does not appear to be applied to the 12-24, 24-36, 36-48, and 48-60 development periods.
Ah, this is because the data is not a "triangle" but a trapezoid in this case: the zeroed-out cells leave more origin periods than development periods. This is indeed a bug. In fact, drop_low doesn't work properly either.
@henrydingliu, did you resolve this? I can take a look at this one.
I believe so, yes.
I don't think I see it working correctly. Can you check again, even just using the code above?
It didn't work for me when I did pip install chainladder --pre on my existing VM. However, when I created a brand new VM, cloned https://github.com/casact/chainladder-python.git directly, and pip installed locally, it seemed to work. Notebook attached.
Tried pip install chainladder --pre again after uninstalling and clearing the cache; it still installs 0.8.13.
My mistake, I was in the wrong environment and got confused. I see the changes now.
While I was playing with it, I thought of another improvement. Consider a scenario where drop_low = 1 is set but there are multiple periods with the same LDF rank: which one should be dropped? For example:
AY 2015: $200 -> $200
AY 2016: $500 -> $500
For both origins, the LDFs are 1.000, but we need to drop the lowest value. It matters which one we drop because the volume-weighted LDF will give different answers (see the sketch below). For now, the code drops the first one (AY 2015), which I think is OK since most of the time the books grow.
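To make the effect concrete, here is a minimal sketch. The AY 2015 and AY 2016 figures are from the example above; the AY 2017 figures are hypothetical, added only so the two drop choices produce different volume-weighted averages:

# Illustrative sketch only: AY 2017 is a hypothetical third origin.
current = {"AY2015": 200, "AY2016": 500, "AY2017": 400}  # losses at the earlier age
nxt = {"AY2015": 200, "AY2016": 500, "AY2017": 480}      # losses at the later age

def vw_ldf(dropped):
    # Volume-weighted LDF over the origins that remain after dropping one.
    keep = [ay for ay in current if ay != dropped]
    return sum(nxt[ay] for ay in keep) / sum(current[ay] for ay in keep)

print(vw_ldf("AY2015"))  # (500 + 480) / (500 + 400) = 1.0889
print(vw_ldf("AY2016"))  # (200 + 480) / (200 + 400) = 1.1333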
Do you think this is worth implementing? If not, we can close this ticket for now.
So in your example, the desired default behavior would be to drop the AY 2015 value of $200?
Yes, I think that makes sense.
I think replacing the existing ranking logic with this does the trick:
np.lexsort((X[0][0][:-1, :-1], lr[0][0]), axis=0).argsort(axis=0)
I think I've seen a more robust way to trim the singletons off the ends than [:-1, :-1]. Any suggestions?
I'm nowhere near that level, lol. If it works, it's good enough for me. 😂
Maybe @jbogaardt knows.
Did some more testing. This works:
link_ratio_ranks = np.lexsort((X.values[0][0][:, :-1], link_ratio[0][0]), axis=0).argsort(axis=0)
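For anyone following along, here is a minimal one-dimensional sketch of the tie-breaking idea with made-up numbers (the real line above operates on 2-D triangle slices with axis=0). np.lexsort sorts by its last key first, so ranking on link ratios and breaking ties with the underlying loss values gives the smaller origin the lower rank:

import numpy as np

# Made-up numbers: two origins tie at a link ratio of 1.0.
values = np.array([200.0, 500.0, 400.0])  # losses at the earlier age
link_ratios = np.array([1.0, 1.0, 1.2])   # age-to-age factors

# lexsort uses its LAST key (link_ratios) as the primary sort key and the
# earlier key (values) to break ties; argsort then converts the resulting
# sort order into per-element ranks.
ranks = np.lexsort((values, link_ratios)).argsort()
print(ranks)  # [0 1 2] -> the $200 origin ranks lowest, so drop_low=1 drops it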
I'm going to spend a bit more time figuring out how to generalize this in order to fix #309 as well.
Ok cool, thank you!
Pull request #370 created with the fix. More comments on the main thread, #309.
Reopening until release of 0.8.14.