Unpredictable behaviour of drop_high
drop_high behaves unexpectedly in some cases. Here is a minimal reproduction:
import chainladder as cl
import pandas as pd

prism_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/prism.csv"
)
prism_df["AccidentDate"] = pd.to_datetime(prism_df["AccidentDate"])
prism_df["PaymentDate"] = pd.to_datetime(prism_df["PaymentDate"])

# Zero out Incurred for accident years after 2008 with payments after 2012,
# so the data forms a trapezoid rather than a full triangle.
prism_df.loc[
    (prism_df["AccidentDate"].dt.year > 2008)
    & (prism_df["PaymentDate"].dt.year > 2012),
    "Incurred",
] = 0

prism = cl.Triangle(
    data=prism_df,
    origin="AccidentDate",
    development="PaymentDate",
    columns="Incurred",
    cumulative=False,
)
prism = prism.grain("OYDY").incr_to_cum()

prism_dev = cl.Development(drop_high=1, drop_low=1).fit_transform(prism)
print(prism.age_to_age)
print(prism_dev.age_to_age)
drop_high does not appear to be applied to the 12-24, 24-36, 36-48, and 48-60 development periods.
Ah, this is because the data is not a "triangle" but a trapezoid in this case: the zeroed-out cells leave more origin periods than development periods. This is indeed a bug. In fact, drop_low doesn't work properly either.
@henrydingliu, did you resolve this? I can take a look at this one.
I believe so, yes.
I don't think I see it working correctly. Can you check again, even just using the code above?
It didn't work for me when I did pip install chainladder --pre on my existing VM. However, when I created a brand new VM, cloned https://github.com/casact/chainladder-python.git directly, and pip installed locally, it seemed to work. Notebook attached.
Tried pip install chainladder --pre again after uninstalling and clearing the cache; it still installs 0.8.13.
My mistake, I was in the wrong environment and got confused. I see the changes now.
While I was playing with it, I thought of another improvement. Consider a scenario where drop_low = 1 is set but there are multiple periods with the same LDF rank: which one should be dropped? For example:
AY 2015: $200 -> $200
AY 2016: $500 -> $500
For both origins, the LDFs are 1.000, but we need to drop the lowest value. It matters which one we drop because the volume-weighted LDF will give different answers (see the sketch below). For now, the code drops the first one (AY 2015), which I think is OK since most of the time the books grow.
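To make the effect concrete, here is a minimal sketch. The AY 2015 and AY 2016 figures are from the example above; the AY 2017 figures are hypothetical, added only so the two drop choices produce different volume-weighted averages:

# Illustrative sketch only: AY 2017 is a hypothetical third origin.
current = {"AY2015": 200, "AY2016": 500, "AY2017": 400}  # losses at the earlier age
nxt = {"AY2015": 200, "AY2016": 500, "AY2017": 480}      # losses at the later age

def vw_ldf(dropped):
    # Volume-weighted LDF over the origins that remain after dropping one.
    keep = [ay for ay in current if ay != dropped]
    return sum(nxt[ay] for ay in keep) / sum(current[ay] for ay in keep)

print(vw_ldf("AY2015"))  # (500 + 480) / (500 + 400) = 1.0889
print(vw_ldf("AY2016"))  # (200 + 480) / (200 + 400) = 1.1333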
Do you think this is worth implementing? If not, we can close this ticket for now.
So in your example, the desired default behavior would be to drop the AY 2015 value of $200?
Yes, I think that makes sense.
I think replacing the existing ranking logic with this does the trick:
np.lexsort((X[0][0][:-1, :-1], lr[0][0]), axis=0).argsort(axis=0)
I think I've seen a more robust way to trim the singletons off the ends than [:-1, :-1]. Any suggestions?
I'm nowhere near that level, lol. If it works, it's good enough for me. 😂
Maybe @jbogaardt knows.
Did some more testing. This works:
link_ratio_ranks = np.lexsort((X.values[0][0][:, :-1], link_ratio[0][0]), axis=0).argsort(axis=0)
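For anyone following along, here is a minimal one-dimensional sketch of the tie-breaking idea with made-up numbers (the real line above operates on 2-D triangle slices with axis=0). np.lexsort sorts by its last key first, so ranking on link ratios and breaking ties with the underlying loss values gives the smaller origin the lower rank:

import numpy as np

# Made-up numbers: two origins tie at a link ratio of 1.0.
values = np.array([200.0, 500.0, 400.0])  # losses at the earlier age
link_ratios = np.array([1.0, 1.0, 1.2])   # age-to-age factors

# lexsort uses its LAST key (link_ratios) as the primary sort key and the
# earlier key (values) to break ties; argsort then converts the resulting
# sort order into per-element ranks.
ranks = np.lexsort((values, link_ratios)).argsort()
print(ranks)  # [0 1 2] -> the $200 origin ranks lowest, so drop_low=1 drops it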
I'm going to spend a bit more time figuring out how to generalize this in order to fix #309 as well.
Ok cool, thank you!
Pull request #370 created with the fix. More comments on the main thread, #309.
Reopening until release of 0.8.14.