Standardize terminology: morph/growth pattern/platoon, and standard error/std dev/CV

Open k-doering-NOAA opened this issue 5 years ago • 6 comments

Imported from redmine, Issue #66091. Opened by @k-doering-NOAA on 2019-07-16. Status when imported: On Hold.

There are some inconsistencies in how terms are used. I didn't see this issue yet in the tracker, but please let me know if I missed it and this is a duplicate.

Email from Rick about morph/growth pattern/platoon:

I started working on something with Chantel and realized how misleading the following section of control.ss_new is:
1  #_N_Growth_Patterns
1 #_N_platoons_Within_GrowthPattern 
#_Cond 1 #_Morph_between/within_stdev_ratio (no read if N_morphs=1)
#_Cond  1 #vector_Morphdist_(-1_in_first_val_gives_normal_approx)

It should be:
1  #_N_Morphs (e.g. Growth_Patterns)
1 #_N_platoons_Within_Morphs 
#_Cond 1 #_Platoon_between/within_stdev_ratio (no read if N_platoons=1)
#_Cond  1 #vector_of_platoon_dist_(-1_in_first_val_gives_normal_approx)

In addition, I've noticed that standard error/standard deviation/CV are used interchangeably in the SS input files, which is confusing. For example, in the variance adjustment factors section of the control file, the additive survey input variance is labeled "survey CV", but the manual states that it is actually a log-scale standard deviation. Using consistent and correct terminology with regard to these values would be helpful to users. It would also be good to make clear whether each input is on a nominal or log scale. Users who assume the wrong scale, values, or units can end up misspecifying their models.

One idea to help with term standardization is to have an "official" SS glossary somewhere (for example, we could create this in the redmine wiki). There we can specify technical terms and what they mean. We could also list synonymous terms for the time being and note which one is preferable to use.

If there are more inconsistently used terms, or if you have ideas for how to standardize them, perhaps we can add them to this issue.

k-doering-NOAA avatar Nov 05 '20 17:11 k-doering-NOAA

comment from @iantaylor-NOAA on 2019-07-16: I know that I'm sloppy in my use of standard error/std dev/CV. Part of the confusion comes from the fact that for lognormally distributed variables the input is a standard error on the log scale which is very similar to a CV over the range of values we typically see. The "Error Distribution" section under "Indices" in the SS User Manual includes a formula to convert between them which shows that at CV = 0.2 the logSE ~ 0.198.
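
As a rough illustration of the conversion mentioned above, here is a minimal Python sketch. It assumes the standard lognormal relationship, log-scale SE = sqrt(ln(1 + CV^2)); the function names are hypothetical and are not part of SS or any SS tooling. Check the formula against the User Manual before relying on it.

```python
import math


def cv_to_log_se(cv: float) -> float:
    """Convert a CV to a log-scale SE, assuming a lognormal distribution."""
    return math.sqrt(math.log(1.0 + cv ** 2))


def log_se_to_cv(log_se: float) -> float:
    """Inverse of cv_to_log_se under the same lognormal assumption."""
    return math.sqrt(math.exp(log_se ** 2) - 1.0)


# Compare the two quantities over a range of CVs (illustrative values only).
for cv in (0.1, 0.2, 0.5, 1.0):
    print(f"CV = {cv:.2f} -> log-scale SE = {cv_to_log_se(cv):.3f}")
```

At CV = 0.2 this gives a log-scale SE of about 0.198, matching the figure quoted above. The two values drift apart as the CV grows, which is where mixing up the terms starts to matter.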

My vague understanding of SE vs SD is that the SE is the appropriate term to describe the variability in estimates of the mean of a distribution, where the SD might describe the variability of the distribution itself. If that's correct, the use of SE is appropriate for index inputs. I see SE in many places, but "extraSD", "extra_se" and "extra sd" all in use in control.ss_new. Someone with a more recent or more complete statistics education could be consulted on this.
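
To make the SE vs SD distinction concrete, here is a small Python sketch using the usual textbook definitions: the SD describes the spread of the observations, and the SE of the mean is that SD divided by the square root of the sample size. The data values and the function name are made up for illustration and do not come from any SS file.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Standard error of the sample mean: sample SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical repeated observations of one quantity (illustrative only).
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 9.9, 10.2, 9.7]

sd = statistics.stdev(sample)        # variability of the observations
se = standard_error_of_mean(sample)  # variability of the estimated mean

print(f"SD = {sd:.3f}, SE of the mean = {se:.3f}")
```

The SE shrinks as more observations are added, while the SD does not, which is why the two terms are not interchangeable.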

comment from @k-doering-NOAA on 2019-07-16: Thanks, Ian - that jibes with my understanding of SE/SD/CV (which is admittedly shallow). Perhaps SD/SE/CV then is less of a problem than morph/growth pattern/platoon because most users will be using lognormal distributions, but if it is possible to standardize on one term and be precise, I think it would be helpful. As a user, I would probably assume that there is an intentional reason for using a different term, even if the reality was just that synonymous (or approximately correct) terms were being used interchangeably.

comment from @iantaylor-NOAA on 2019-07-16: A good strategy might be to update this Google Doc https://docs.google.com/spreadsheets/d/1es49JxMLvavsFrzZw8HmNRFJ6NYSdtBodRirUQ65t5M/edit#gid=0 (last edited in 2016 before many of the suggested changes were implemented) or replace it with a newer one that lists areas that could benefit from a cleanup in terminology. A Google Doc could work better than comments on this issue for aggregating the list of changes.

Also note that a list of some of the previous changes in terminology can be found in the comments on completed issue #22185.

comment from @k-doering-NOAA on 2019-07-22: Thanks, Ian, this info is helpful. I assigned myself to this and will look more into it in the future (perhaps in time for a 3.30.15 release?). I also changed the priority of this to "low", as you have convinced me it is not that critical (anyone should feel free to revise if they think otherwise).

comment from @k-doering-NOAA on 2019-10-24: Closing this, as the spreadsheet Ian linked has a really thorough glossary, including the terms brought up in this issue. I added a note about the SD/SE/CV issue in the spreadsheet.

comment from @k-doering-NOAA on 2019-12-16: I'm reopening this issue, based on a slightly more specific discussion about terminology, as it pertains to discard rates and mean body weights. See the paragraphs below, which were copied over from a google doc. I will leave it as "on hold", as I don't think anyone is actively working on this issue.

This file contains notes from a discussion between @kelli.johnson, @chantel.wetzel, and @ian.taylor on checking the SS code for Discard Rates and Mean Body Weights, with a focus on the treatment of the uncertainty (discussion occurred ~ May 22, 2019).

Steps to take

- Figure out what's actually happening with SD vs CV
  - Based on review of TPL files, for mean body weight, the input really does seem to be a CV and the variance adjustment is also in the CV units
  - For discard rates, the treatment of the uncertainty input depends on the discard_errtype setting, with options
    #_discard_errtype: >0 for DF of T-dist(read CV below); 0 for normal with CV; -1 for normal with se; -2 for lognormal; -3 for trunc normal with CV
  - TPL has note “variance adjustment is to the sd, not the CV” which indeed seems correct
- Document these things in the User Manual
- Fix any internal inconsistencies in SS, such as in comments of Report or .ss_new files.
  - In “Input_Variance_Adjustment” of Report.sso, “Discard_extra_CV” should be “Discard_extra_SD” (note that this is correct in control.ss_new)
  - DISCARD_OUTPUT has separate columns for Std_in and Std_use, where the latter includes the variance adjustment. However, the MEAN_BODY_WT_OUTPUT has only the one “CV” column so it’s unclear whether this already includes the variance adjustment or not. It appears from model results that this CV column is CV_input + added CV. CV could be replaced with CV_use and a new column of CV_in could be added to match the DISCARD_OUTPUT.
- In the future, try to augment SS to have internally estimated parameters for both discard rates and mean body weight observations just like the Q_extraSD parameter for indices
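
The discard_errtype options quoted in the notes above can be summarized as a small lookup. The Python sketch below only restates that control-file comment; the function name and return format are hypothetical, it is not SS code, and the unit for the lognormal option is left as None because the comment does not state it.

```python
from typing import Optional, Tuple


def describe_discard_errtype(errtype: int) -> Tuple[str, Optional[str]]:
    """Return (distribution, unit of the uncertainty input) for a discard_errtype.

    Based solely on the "#_discard_errtype" comment quoted in the notes.
    """
    if errtype > 0:
        # Positive values give the degrees of freedom of a t-distribution.
        return (f"t-distribution with {errtype} degrees of freedom", "CV")
    options = {
        0: ("normal", "CV"),
        -1: ("normal", "SE"),
        -2: ("lognormal", None),  # unit not stated in the quoted comment
        -3: ("truncated normal", "CV"),
    }
    if errtype not in options:
        raise ValueError(f"unrecognized discard_errtype: {errtype}")
    return options[errtype]


for code in (30, 0, -1, -2, -3):
    print(code, describe_discard_errtype(code))
```

A table like this in the User Manual would make it explicit which options read a CV and which read an SE.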