
Unpredictable behaviour of drop_high

Open henrydingliu opened this issue 3 years ago • 1 comments

drop_high does some weird stuff sometimes

import pandas as pd
import chainladder as cl

prism_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/prism.csv"
)
prism_df["AccidentDate"] = pd.to_datetime(prism_df["AccidentDate"])
prism_df["PaymentDate"] = pd.to_datetime(prism_df["PaymentDate"])
# Zero out payments after 2012 for accident years after 2008
prism_df.loc[
    (prism_df["AccidentDate"].dt.year > 2008) & (prism_df["PaymentDate"].dt.year > 2012),
    "Incurred",
] = 0
prism = cl.Triangle(
    data=prism_df,
    origin="AccidentDate",
    development="PaymentDate",
    columns="Incurred",
    cumulative=False,
)
# Annual origin / annual development, converted to cumulative
prism = prism.grain("OYDY").incr_to_cum()
prism_dev = cl.Development(drop_high=1, drop_low=1).fit_transform(prism)
# Compare link ratios before and after dropping
print(prism.age_to_age)
print(prism_dev.age_to_age)

drop_high seems to not work on 12-24, 24-36, 36-48, and 48-60.

henrydingliu avatar Aug 28 '22 23:08 henrydingliu

Ahh, this is because it's not a "triangle", it's a trapezoid in this case. This is indeed a bug. In fact, drop_low doesn't work properly either.
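For reference, here is a rough NumPy-only sketch of what one would expect drop_high = 1 and drop_low = 1 to do per development period when the columns hold different numbers of observations. The link ratios are made up and `drop_extremes` is a hypothetical helper, not chainladder's implementation.

```python
import numpy as np

# Hypothetical link ratios: rows are origins, columns are development periods.
# NaN marks cells with no observation.
link_ratios = np.array([
    [1.50, 1.20, 1.05],
    [1.80, 1.10, 1.02],
    [1.40, 1.30, np.nan],
    [1.60, 1.25, np.nan],
])

def drop_extremes(lr):
    """Return a mask that is False for NaNs and for the highest and
    lowest observed ratio in each column."""
    keep = ~np.isnan(lr)
    cols = np.arange(lr.shape[1])
    keep[np.nanargmax(lr, axis=0), cols] = False  # highest per column
    keep[np.nanargmin(lr, axis=0), cols] = False  # lowest per column
    return keep

keep = drop_extremes(link_ratios)
print(keep)
```

The point is only that the extremes are picked among the observed cells of each column, whatever the shape of the data.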

kennethshsu avatar Aug 29 '22 03:08 kennethshsu

@henrydingliu, did you resolve this? I can take a look at this one.

kennethshsu avatar Nov 17 '22 05:11 kennethshsu

i believe so, yes

henrydingliu avatar Nov 17 '22 05:11 henrydingliu

I don't think I see it working correctly, can you check again, even just using the code above?

kennethshsu avatar Nov 17 '22 06:11 kennethshsu

it didn't work for me when i did pip install chainladder --pre on my existing vm. however, when i created a brand new vm, directly cloned https://github.com/casact/chainladder-python.git, and pip installed locally, it seemed to work. nb attached.

Untitled.zip

henrydingliu avatar Nov 17 '22 06:11 henrydingliu

tried pip install chainladder --pre again after uninstalling and clearing cache. still installing 0.8.13

henrydingliu avatar Nov 17 '22 06:11 henrydingliu

My mistake, I was in the wrong environment and got confused. I see the changes now.

While I was playing with it, I thought of another improvement. Consider a scenario where drop_low = 1 is set, but there are multiple periods with the same LDF rank. Which one should be dropped? For example:

AY 2015: $200 -> $200
AY 2016: $500 -> $500

For both origins, the LDFs are 1.000, but we need to drop the lowest value. It matters which one we drop because the volume weighted LDF will result in different answers. For now, the code will drop the first one (AY 2015), which I think is ok since most of the time the books grow.
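A quick arithmetic sketch of why the choice matters. AY 2015 and AY 2016 are the two origins from the example; AY 2014 ($100 -> $150) is a made-up third origin added only so that something is left to average, and `volume_weighted_ldf` is a hypothetical helper, not a chainladder function.

```python
# origin -> (value at the start age, value at the next age)
# AY 2014 is an assumed extra origin; the other two come from the example.
data = {
    "AY 2014": (100.0, 150.0),
    "AY 2015": (200.0, 200.0),
    "AY 2016": (500.0, 500.0),
}

def volume_weighted_ldf(rows, dropped):
    """Volume-weighted LDF over all origins except the dropped one."""
    kept = [v for k, v in rows.items() if k != dropped]
    return sum(end for _, end in kept) / sum(start for start, _ in kept)

ldf_drop_2015 = volume_weighted_ldf(data, "AY 2015")  # 650 / 600
ldf_drop_2016 = volume_weighted_ldf(data, "AY 2016")  # 350 / 300
print(round(ldf_drop_2015, 4), round(ldf_drop_2016, 4))
```

Under these assumed numbers, dropping AY 2015 gives about 1.0833 and dropping AY 2016 gives about 1.1667, so the tie-break changes the selected factor.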

Do you think this is worth implementing? If not, we can close this ticket for now.

kennethshsu avatar Nov 17 '22 23:11 kennethshsu

So in your example, the desired default behavior would be to drop AY 2015 for the $200?

henrydingliu avatar Nov 17 '22 23:11 henrydingliu

Yes, I think that makes sense.

kennethshsu avatar Nov 17 '22 23:11 kennethshsu

I think it does the trick to replace this with

np.lexsort((X[0][0][:-1,:-1],lr[0][0]),axis = 0).argsort(axis=0)

I think I've seen a more robust way to trim singletons off the ends than [:-1,:-1]. Any suggestions?

henrydingliu avatar Nov 18 '22 03:11 henrydingliu

I'm nowhere near that level lol, if it works, it's good enough for me. 😂

Maybe @jbogaardt knows.

kennethshsu avatar Nov 18 '22 04:11 kennethshsu

did some more testing. this works.

link_ratio_ranks = np.lexsort((X.values[0][0][:,:-1],link_ratio[0][0]),axis = 0).argsort(axis=0)
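For anyone reading along, a minimal single-column sketch of the lexsort-then-argsort idiom. The arrays are made-up values rather than chainladder internals; `np.lexsort` treats the last key as the primary sort key, so ties in the link ratio are broken by the starting volume.

```python
import numpy as np

# Hypothetical single development period: one entry per origin.
ldf = np.array([1.5, 1.0, 1.0, 1.2])               # link ratios, two tied at 1.0
volume = np.array([100.0, 200.0, 500.0, 300.0])    # starting values

# Sort by ldf first (last key), then by volume to break ties.
order = np.lexsort((volume, ldf))
# argsort of the ordering turns "sorted positions" into per-origin ranks.
ranks = order.argsort()
print(ranks)
```

With these numbers the origin with the tied LDF and the smaller volume gets rank 0, so it is the one drop_low = 1 would remove.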

I'm gonna spend a bit more time figuring out how to generalize this in order to fix #309 as well

henrydingliu avatar Nov 18 '22 05:11 henrydingliu

Ok cool, thank you!

kennethshsu avatar Nov 18 '22 05:11 kennethshsu

pull request #370 created with fix. more comments on main thread #309

henrydingliu avatar Nov 18 '22 19:11 henrydingliu

Reopening until release of 0.8.14.

jbogaardt avatar Nov 20 '22 00:11 jbogaardt