CALT estimated standard error in AAL overstates observed sampling error
Issue Description
Advice from stats gurus would be very welcome on this problem.
The standard error of the AAL estimate in the new report seems to overstate the observed sampling error for a given sample size. Using PiWind with 10 locations, bootstrapping the AAL 100 times with 10 samples each gives a standard deviation of 0.6%, versus an estimated standard error of 7.8%. While this is great news for the user, it makes the CALT (Convergence in Average Loss Table) report pretty useless as a predictive tool for AAL convergence.
I think the issue is a violation of the i.i.d. assumption, in particular the identically distributed part. Each year loss observation comes from a particular period containing particular events, and each event has a different loss variation: the bigger the event, the bigger the variation in loss. At the other end of the spectrum, around 2/3 of periods have no events at all and therefore zero loss variation. This is a case of extreme heteroscedasticity.
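A toy simulation can show the effect. This is a minimal sketch with made-up numbers (period count, event frequency, loss distributions are all assumptions, not PiWind values): periods and their events are held fixed, only the secondary sampling is re-drawn, mimicking the 10-sample subsets of a single run. The naive i.i.d. standard error, which treats every period-sample as an independent draw, comes out much larger than the spread actually observed across re-draws.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy model: ~2/3 of periods have no events (zero loss, zero
# variation); the rest have event losses whose spread scales with their size,
# i.e. bigger event => bigger loss variation (extreme heteroscedasticity).
n_periods, n_samples = 1000, 10
has_event = rng.random(n_periods) > 2 / 3
mean_loss = np.where(has_event, rng.lognormal(3, 1, n_periods), 0.0)

def simulate_losses():
    # Secondary (sampling) uncertainty only: the periods and their events
    # are fixed; only the sampled losses around each event mean vary.
    sd_loss = 0.5 * mean_loss  # heteroscedastic: spread proportional to size
    return rng.normal(mean_loss[:, None], sd_loss[:, None],
                      (n_periods, n_samples)).clip(min=0.0)

losses = simulate_losses()
aal = losses.mean()

# Naive i.i.d. standard error: treats all period-samples as independent
# draws, so it also counts the (fixed) between-period variance.
naive_se = losses.std(ddof=1) / np.sqrt(losses.size)

# Observed sampling error: re-draw the secondary samples 100 times over the
# SAME periods, like the 10-sample subsets of one run, and take the SD.
aals = np.array([simulate_losses().mean() for _ in range(100)])
observed_sd = aals.std(ddof=1)

print(f"naive SE    : {naive_se / aal:.2%} of AAL")
print(f"observed SD : {observed_sd / aal:.2%} of AAL")
```

The gap arises because the between-period variance is locked in once the event set is fixed, yet the i.i.d. formula still includes it.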
With a bit of googling I have found some methods that correct for model misspecification / i.i.d. violation:
https://stat-analysis.netlify.app/the-iid-violation-and-robust-standard-errors.html
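As a sketch of what such a correction might look like here (not a proposed fix, and whether it moves the estimate in the right direction is exactly what needs investigating): a cluster-robust standard error of the mean, clustering observations by period so that samples sharing a period's events are not treated as independent. The function is illustrative only.

```python
import numpy as np

def clustered_se_of_mean(losses, period_ids):
    """Cluster-robust standard error of the mean loss, clustered by period.

    Illustrative sketch of a standard robust-SE correction: residuals are
    summed within each period (cluster) before squaring, so observations
    that share a period's events are not treated as independent draws.
    """
    losses = np.asarray(losses, dtype=float)
    period_ids = np.asarray(period_ids)
    n = losses.size
    resid = losses - losses.mean()
    cluster_sums = np.array([resid[period_ids == p].sum()
                             for p in np.unique(period_ids)])
    return np.sqrt((cluster_sums ** 2).sum()) / n
```

For example, `clustered_se_of_mean([0, 0, 2, 4], [1, 1, 2, 2])` sums the residuals within each of the two periods before squaring, rather than squaring all four residuals individually as the naive formula does.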
Further investigation is needed to improve the estimated standard error and make this report useful.
Steps to Reproduce (Bugs only)
- Run PiWind with 1000 samples and ORD output, including ALCT output via `"alct_convergence": true` in the analysis settings.
- Using the gul_S1_splt output, calculate the AAL for each 10-sample subset, producing 100 AAL estimates.
- Find the 0.975 and 0.025 quantiles of the 100 AAL estimates, corresponding to the 95% confidence interval.
- Take the standard deviation of the AAL estimates.
- Compare this value with the standard error for the 10-sample run, which can be found in the new gul_S1_alct report.
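The steps above can be sketched in pandas. The column names (`Period`, `SampleId`, `Loss`) are assumptions about the SPLT layout and should be checked against the actual gul_S1_splt header; the synthetic frame here just stands in for `pd.read_csv("gul_S1_splt.csv")` so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real SPLT; in practice:
#   splt = pd.read_csv("gul_S1_splt.csv")
N_PERIODS, N_SAMPLES = 100, 1000  # toy period count, not a PiWind value
rng = np.random.default_rng(0)
splt = pd.DataFrame({
    "Period": np.repeat(np.arange(1, N_PERIODS + 1), N_SAMPLES),
    "SampleId": np.tile(np.arange(1, N_SAMPLES + 1), N_PERIODS),
    "Loss": rng.lognormal(10, 1, N_PERIODS * N_SAMPLES),
})

def aal_for_samples(df, sample_ids, n_periods=N_PERIODS):
    """AAL for one subset of samples: total loss / (periods * samples)."""
    sub = df[df["SampleId"].isin(sample_ids)]
    return sub["Loss"].sum() / (n_periods * len(sample_ids))

# 100 AAL estimates from disjoint 10-sample subsets of the 1000-sample run.
subsets = np.arange(1, N_SAMPLES + 1).reshape(100, 10)
aals = np.array([aal_for_samples(splt, s) for s in subsets])

lo, hi = np.quantile(aals, [0.025, 0.975])  # empirical 95% interval
sd = aals.std(ddof=1)                       # observed sampling error
# Compare sd with the standard error reported in gul_S1_alct for 10 samples.
```

Note that every subset shares the same periods and events; only the secondary samples differ, which is why the spread of `aals` can be much smaller than the reported standard error.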
Version / Environment information
1.26