CALT estimated standard error in AAL overstates observed sampling error
Issue Description
Advice from stats gurus would be very welcome on this problem.
The standard error of the AAL estimate in the new report seems to overstate the observed sampling error for a given sample size. Using PiWind with 10 locations, bootstrapping the AAL 100 times with 10 samples each gives a standard deviation of 0.6%, versus an estimated standard error of 7.8%. While this is great news for the user, it makes the CALT (Convergence in Average Loss Table) report pretty useless as a predictive tool for AAL convergence.
I think the issue is a violation of the i.i.d. assumption, in particular the identically distributed part. Each year loss observation comes from a particular period containing particular events, and each event has a different loss variation: the bigger the event, the bigger the variation in loss. At the other end of the spectrum, around 2/3 of periods have no events at all and therefore zero loss variation. This is a case of extreme heteroscedasticity.
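A toy simulation can show the effect. This is a minimal sketch with made-up numbers (period count, event frequency, loss distributions are all assumptions, not PiWind values): periods and their events are held fixed, only the secondary sampling is re-drawn, mimicking the 10-sample subsets of a single run. The naive i.i.d. standard error, which treats every period-sample as an independent draw, comes out much larger than the spread actually observed across re-draws.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy model: ~2/3 of periods have no events (zero loss, zero
# variation); the rest have event losses whose spread scales with their size,
# i.e. bigger event => bigger loss variation (extreme heteroscedasticity).
n_periods, n_samples = 1000, 10
has_event = rng.random(n_periods) > 2 / 3
mean_loss = np.where(has_event, rng.lognormal(3, 1, n_periods), 0.0)

def simulate_losses():
    # Secondary (sampling) uncertainty only: the periods and their events
    # are fixed; only the sampled losses around each event mean vary.
    sd_loss = 0.5 * mean_loss  # heteroscedastic: spread proportional to size
    return rng.normal(mean_loss[:, None], sd_loss[:, None],
                      (n_periods, n_samples)).clip(min=0.0)

losses = simulate_losses()
aal = losses.mean()

# Naive i.i.d. standard error: treats all period-samples as independent
# draws, so it also counts the (fixed) between-period variance.
naive_se = losses.std(ddof=1) / np.sqrt(losses.size)

# Observed sampling error: re-draw the secondary samples 100 times over the
# SAME periods, like the 10-sample subsets of one run, and take the SD.
aals = np.array([simulate_losses().mean() for _ in range(100)])
observed_sd = aals.std(ddof=1)

print(f"naive SE    : {naive_se / aal:.2%} of AAL")
print(f"observed SD : {observed_sd / aal:.2%} of AAL")
```

The gap arises because the between-period variance is locked in once the event set is fixed, yet the i.i.d. formula still includes it.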
With a bit of googling I have found some methods that correct for model misspecification / i.i.d. violation:
https://stat-analysis.netlify.app/the-iid-violation-and-robust-standard-errors.html
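As a sketch of what such a correction might look like here (not a proposed fix, and whether it moves the estimate in the right direction is exactly what needs investigating): a cluster-robust standard error of the mean, clustering observations by period so that samples sharing a period's events are not treated as independent. The function is illustrative only.

```python
import numpy as np

def clustered_se_of_mean(losses, period_ids):
    """Cluster-robust standard error of the mean loss, clustered by period.

    Illustrative sketch of a standard robust-SE correction: residuals are
    summed within each period (cluster) before squaring, so observations
    that share a period's events are not treated as independent draws.
    """
    losses = np.asarray(losses, dtype=float)
    period_ids = np.asarray(period_ids)
    n = losses.size
    resid = losses - losses.mean()
    cluster_sums = np.array([resid[period_ids == p].sum()
                             for p in np.unique(period_ids)])
    return np.sqrt((cluster_sums ** 2).sum()) / n
```

For example, `clustered_se_of_mean([0, 0, 2, 4], [1, 1, 2, 2])` sums the residuals within each of the two periods before squaring, rather than squaring all four residuals individually as the naive formula does.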
Further investigation is needed to improve the estimated standard error and make this report useful.
Steps to Reproduce (Bugs only)
- Run PiWind with 1000 samples and ORD output, including ALCT output via `"alct_convergence": true` in the analysis settings.
- Using the gul_S1_splt output, calculate the AAL for each 10-sample subset, producing 100 AAL estimates.
- Find the 0.975 and 0.025 quantiles of the 100 AAL estimates, corresponding to the 95% confidence interval.
- Take the standard deviation of the AAL estimates.
- Compare this value with the standard error for the 10-sample run, which can be found in the new gul_S1_alct report.
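The steps above can be sketched in pandas. The column names (`Period`, `SampleId`, `Loss`) are assumptions about the SPLT layout and should be checked against the actual gul_S1_splt header; the synthetic frame here just stands in for `pd.read_csv("gul_S1_splt.csv")` so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real SPLT; in practice:
#   splt = pd.read_csv("gul_S1_splt.csv")
N_PERIODS, N_SAMPLES = 100, 1000  # toy period count, not a PiWind value
rng = np.random.default_rng(0)
splt = pd.DataFrame({
    "Period": np.repeat(np.arange(1, N_PERIODS + 1), N_SAMPLES),
    "SampleId": np.tile(np.arange(1, N_SAMPLES + 1), N_PERIODS),
    "Loss": rng.lognormal(10, 1, N_PERIODS * N_SAMPLES),
})

def aal_for_samples(df, sample_ids, n_periods=N_PERIODS):
    """AAL for one subset of samples: total loss / (periods * samples)."""
    sub = df[df["SampleId"].isin(sample_ids)]
    return sub["Loss"].sum() / (n_periods * len(sample_ids))

# 100 AAL estimates from disjoint 10-sample subsets of the 1000-sample run.
subsets = np.arange(1, N_SAMPLES + 1).reshape(100, 10)
aals = np.array([aal_for_samples(splt, s) for s in subsets])

lo, hi = np.quantile(aals, [0.025, 0.975])  # empirical 95% interval
sd = aals.std(ddof=1)                       # observed sampling error
# Compare sd with the standard error reported in gul_S1_alct for 10 samples.
```

Note that every subset shares the same periods and events; only the secondary samples differ, which is why the spread of `aals` can be much smaller than the reported standard error.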
Version / Environment information
1.26