add option to include constant of integration for Dirichlet-Multinomial likelihood
Imported from redmine, Issue #51972 Opened by @iantaylor-NOAA on 2018-06-22 Status when imported: On Hold
Jim Thorson just noted in an email to me that constant of integration for the Dirichlet-Multinomial is not included in the likelihood. This isn't an issue unless you wanted to do a comparison of the likelihoods between the D-M and the multinomial. In theory, the switch
1 #_sd_offset; must be 1 if any growthCV, sigmaR, or survey extraSD is an estimated parameter
could be used to turn on or off this constant term.
This is a low priority but I'm posting here to remind me to look into this in the future as time allows.
Piggybacking on this issue to suggest that I also check the calculation of the Beta value associated with D-M option 2 to see if there's an extra multiplication by n in lines such as this one: https://github.com/nmfs-stock-synthesis/stock-synthesis/blob/8c73e5cba1e6ce9355806076e98149db1d27207a/SS_write_report.tpl#L1568.
This should be in commit 4abf1e3 to the issues_31_and_128 branch. I want to manually confirm outside the model that the likelihood now matches the full equations (10) or (4) in Thorson et al. 2017 before closing the issue.
I don't think I broke anything, but will try testing the branch (to confirm this change and the one noted in issue #31) are working alright and then we can merge via Pull Request. In the time since I started this work, @Rick-Methot-NOAA added parallel (but not actually conflicting) edits to SS_objfunc.tpl. I think the Pull Request will sort them out.
@k-doering-NOAA I'd like to try the Jenkins build and model comparisons on the issues_31_and_128 branch, but can't figure out how. I thought there was a simple switch, but maybe that was for GitHub Actions? The Hake_2018 and Hake_2019_semi-parametric_selex and any other model using Dirichlet-Multinomial likelihood should see a change in likelihood compared to the reference models.
@iantaylor-NOAA I started a run on that branch. There is a switch, but unfortunately things are a little complicated because jenkins is still pulling from vlab, so branches have to be pushed there first. Once our github is public jenkins can pull directly from github.
This guide shows how to set up the job, but doesn't mention the part about needing to push a branch to vlab first as I put it together a while ago.
Jenkins ran the models without estimation and using the reference par file and found changes in the log likelihood for the 2 version of hake. I'm assuming this is expected since a constant of integration for the DM likelihood changed and I think these are the only 2 models in the test that that use DM ?
Thanks @k-doering-NOAA. Good to hear I didn't break anything for the other models and that the 2 hake models changed as expected.
Unfortunately, the amount of change wasn't quite right. I tried to replicate the likelihood calculation outside the model for a single age comp vector and unfortunately got a different answer. I had not realized that the constant of integration terms for the standard multinomial were added in SS_prelim.tpl. That section has now been updated for the D-M likelihood cases in commit 155ce38652c281c06b8384b1808a8dd41d380048. Now my external calculations of D-M likelihoods as described in the Thorson et al. paper perfectly match what is reported by SS for both CompError = 1 and the alternative (relatively unused) parameterization associated with CompError = 2.
There is one remaining issue, which is a warning about ambiguous code associated with the addition to SS_prelim.tpl (warning pasted at the bottom of this comment). @Rick-Methot-NOAA can you help figure out what the problem is? The model seems to run fine in spite of the warning, so we could merge the issues_31_and_128 branch first if that helps you not worrying about switching back and forth between branches.
Here's a checklist to help make sure this issue gets wrapped up appropriately:
- [x] update code to include the constant of integration for D-M likelihoods first term in equations (4) and (10) in Thorson, J.T., Johnson, K.F., Methot, R.D. and Taylor, I.G. 2017. Model-based estimates of effective sample size in stock assessment models using the Dirichlet-multinomial distribution. Fisheries Research192: 84-93. https://doi.org/10.1016/j.fishres.2016.06.005
- [x] confirm that the D-M calculation in SS produces values that match the equations in the Thorson et al. paper.
- [x] make the
warning: ISO C++ says that these are ambiguous...warning go away (Rick?) - [x] merge
issues_31_and_128branch intomain(Rick?) - [x] warn users (via release notes?) that age and length comp likelihoods that use the D-M likelihood will change but expected values and estimated parameters should all remain the same (Kathryn?)
- [x] update the
ss_summary.ssoreference files for the two hake models in ss_examples (Kathryn?)
*** Compile: ss.cpp
g++ -c -std=c++11 -O3 -fpermissive -D_FILE_OFFSET_BITS=64 -DUSE_ADMB_CONTRIBS -D_USE_MATH_DEFINES -I. -I"C:\admb\admb-12.2\include" -I"C:\admb\admb-12.2\include\contrib" -o ss.obj ss.cpp
ss.cpp: In member function 'void model_parameters::preliminary_calculations()':
ss.cpp:10115:77: warning: ISO C++ says that these are ambiguous, even though the worst conversion for the first is better than the worst conversion for the second:
sum(gammln(1 + nsamp_l(f,i)*obs_l(f,i)(tails_l(f,i,3),tails_l(f,i,4))));
^
In file included from C:\admb\admb-12.2\include/admodel.h:59:0,
from ss.cpp:7:
C:\admb\admb-12.2\include/fvar.hpp:1583:9: note: candidate 1: dvector operator+(double, const dvector&)
dvector operator+(double x, const dvector & t1);
^
In file included from C:\admb\admb-12.2\include/fvar.hpp:632:0,
from C:\admb\admb-12.2\include/admodel.h:59,
from ss.cpp:7:
C:\admb\admb-12.2\include/ivector.h:215:9: note: candidate 2: ivector operator+(int, const ivector&)
ivector operator+(int v, const ivector& w);
^
ss.cpp:10273:77: warning: ISO C++ says that these are ambiguous, even though the worst conversion for the first is better than the worst conversion for the second:
sum(gammln(1 + nsamp_a(f,i)*obs_a(f,i)(tails_a(f,i,3),tails_a(f,i,4))));
^
FYI, I tried to assign this to @Rick-Methot-NOAA, @k-doering-NOAA and myself, but it turns out that for free accounts like the nmfs-stock-synthesis, organization, only public repositories can have multiple issue assignee (as noted here: https://github.com/pricing).
good catch @iantaylor-NOAA, and I can certainly do the last 2 items on the list!
Let me know if you need any more runs on jenkins, I can start them for you as needed .
one more thought.... maybe "merge issues_31_and_128 branch into main" could be done as a pull request that either Rick or Ian opens for the other to check?)
fixed. the type conversion was just too complex for gammln() Changed:
-
now does 1 sex vs 2 sex
-
intermediate contant calculated
-
constant "1" is an integer, so changed to 1.0
if(gen_l(f,i) !=2) { int z1=tails_l(f,i,1); int z2=tails_l(f,i,2); offset_l(f,i) += gammln(nsamp_l(f,i) + 1.) - // sum(gammln(1. + nsamp_l(f,i)*obs_l(f,i)(tails_l(f,i,3),tails_l(f,i,4)))); sum(gammln(1. + nsamp_l(f,i)*obs_l(f,i)(z1,z2))); } if(gen_l(f,i) >=2 && gender==2) { int z1=tails_l(f,i,3); int z2=tails_l(f,i,4); offset_l(f,i) += gammln(nsamp_l(f,i) + 1.) - // sum(gammln(1. + nsamp_l(f,i)*obs_l(f,i)(tails_l(f,i,3),tails_l(f,i,4)))); sum(gammln(1. + nsamp_l(f,i)*obs_l(f,i)(z1,z2))); }
Thanks @Rick-Methot-NOAA for figuring that warning out. It would surely have taken me a lot longer to figure this out. I never think about the 1.0 vs 1 issue but clearly it can make a difference.
@k-doering-NOAA, following your suggestion, I created a Pull Request (#132) to merge this branch, which is more transparent as it notifies watchers of the repository of the change. I don't think there's likely to be any issues that further testing would find, so this could likely just get merged at any time, or is there some further check that we should take, like running the GitHub actions to confirm that the code changes compile on Linux and Mac systems?
Thanks for the help @Rick-Methot-NOAA and @k-doering-NOAA. All boxes are checked, so I'm closing this now (noting that it's listed under "3.30.17 release issues" even after closing to help us keep track of changes included in the next release).