Refactor geometric mean computation
The current implementation is `exp(mean(log(x)))`, so negative values give `NA`s. Should we also handle the case of negative values, e.g. with `prod(sign(x))`?
See the wiki for an example.
I was wondering what your thoughts are on this. @qiliu1013 @feljam @yangx61 @insightsengineering/nest-sme Thanks!
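A minimal sketch of the two ideas, assuming plain numeric input with no `NA`s. `gmean_log()` mirrors the current formula. `gmean_signed()` is a hypothetical name for the `prod(sign(x))` idea and is not existing tern code.

```r
# Current formula: geometric mean computed on the log scale.
gmean_log <- function(x) {
  exp(mean(log(x)))
}

# Hypothetical sign-aware variant: geometric mean of |x|, multiplied by
# the sign of the product, prod(sign(x)).
gmean_signed <- function(x) {
  prod(sign(x)) * exp(mean(log(abs(x))))
}

x <- c(-2, 4, 8)
suppressWarnings(gmean_log(x))  # NaN, because log(-2) is NaN
gmean_signed(x)                 # approximately -4, the real cube root of -64
```

One caveat: when the product is negative and `length(x)` is even (e.g. `c(-1, 4)`), no real n-th root exists, so the signed value would be a convention rather than a geometric mean in the usual sense.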
Hi @shajoezhu, I think this is a duplicate of issue #284. Do you want me to close that issue or this one?
Hi @yli110-stat697, yes, it is related. In fact, it is also related to this issue: https://github.com/insightsengineering/tern/issues/13.
The current implementation focuses on positive values. The open question is the sign of the geometric mean; I am not too sure what to do with it.
I wonder whether we could offer both a positive geometric mean and an absolute geometric mean, and what the impact would be on real use cases. @yangx61 @danielinteractive @waddella, just wondering what you think as well? Thanks!
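To make the two candidates concrete, here is a sketch assuming "positive" means "drop values <= 0" and "absolute" means "use |x|". Both function names are hypothetical, and `NA` handling is left out.

```r
# Hypothetical "positive" geometric mean: keep only values > 0.
gmean_positive <- function(x) {
  x_pos <- x[x > 0]
  if (length(x_pos) == 0) return(NA_real_)
  exp(mean(log(x_pos)))
}

# Hypothetical "absolute" geometric mean: geometric mean of |x|,
# so the sign information is discarded.
gmean_absolute <- function(x) {
  exp(mean(log(abs(x))))
}

x <- c(-1, 4, 9, -16)
gmean_positive(x)  # 6, computed from 4 and 9 only
gmean_absolute(x)  # about 4.899, i.e. (1 * 4 * 9 * 16)^(1/4)
```

The two summaries can differ noticeably on the same data, which is one way to gauge the impact on real use cases.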
@shajoezhu The geometric mean is not defined when we have negative values, right? (At least as long as we don't want to go into complex numbers.)
To me this is a question about the semantics of the outer function where this statistic is provided. E.g. do you want to assert somewhere that all values are positive? Or do you explicitly say that this can always be calculated, and then document what you do with negative values?
There is also an interesting discussion, with some potential refinements, here: https://stackoverflow.com/questions/2602583/geometric-mean-is-there-a-built-in
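A minimal sketch of the "assert" option, where the function refuses to compute unless every value is strictly positive. `gmean_strict()` is a hypothetical name, not an existing tern function.

```r
# Hypothetical strict variant: error out on any value <= 0.
gmean_strict <- function(x) {
  if (any(x <= 0, na.rm = TRUE)) {
    stop("Geometric mean requires all values to be strictly positive.")
  }
  exp(mean(log(x)))
}

gmean_strict(c(1, 10, 100))  # about 10

# Negative input is rejected; capture the message instead of failing.
res <- tryCatch(gmean_strict(c(1, -10)), error = conditionMessage)
res
```

The alternative ("always computable, documented behavior") would instead drop or transform the offending values and describe that choice in the function documentation.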
Thanks @danielinteractive! I agree! Mathematically, the geometric mean is not well defined for negative values. From this wiki section, I think there may also be a workaround.
I wonder whether we have negative values in practice. I guess they could arise when the statistic is applied to change from baseline? @qiliu1013 @feljam, I was wondering what your experience is on this?
I am thinking of the following steps:
- check with stream, and see what the current template does. Looking for
- check the implementation, test it with non-negative values, and check that the numbers match
- check the data to see whether negative values exist
- If negative values exist, see how stream handles them; we will match that for now
- If there are no negative values and the results match between R and stream, close the issue
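For the "test with non-negative values" step, a sketch of one self-consistency check: on positive input, the log-scale formula should agree with the textbook definition (the n-th root of the product). This does not replace the comparison against stream output, which needs stream's own numbers; the helper names are illustrative only.

```r
# Log-scale formula used in the current implementation.
gmean_log <- function(x) exp(mean(log(x)))

# Textbook definition: n-th root of the product.
gmean_def <- function(x) prod(x)^(1 / length(x))

x <- c(0.5, 2, 3.2, 10)
gmean_log(x)                           # about 2.378, i.e. 32^(1/4)
all.equal(gmean_log(x), gmean_def(x))  # TRUE
```

The log form is usually preferred because `prod()` can overflow to `Inf` or underflow to 0 on long vectors, while the mean of the logs stays in a safe range.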
We don't derive change from baseline for PK data. PK data are concentrations at a given time, so they can't be negative: they are either 0 or positive.
It is a good idea to check how stream behaves when deriving GeomMean with values equal to 0. My suggestion would be to simply ignore 0 and keep only positive values before deriving GeomMean. @qiliu1013, what is your experience with this when producing PK TLGs?
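A short sketch of how the two treatments of 0 differ in base R, using made-up concentration values for illustration. With the current formula, `log(0)` is `-Inf`, so a single 0 pulls the whole result to 0 rather than to `NA`.

```r
# Made-up concentration-like values, including zeros.
conc <- c(0, 0, 1.2, 3.5, 8.1)

# Current formula: log(0) is -Inf, so the result is exactly 0.
exp(mean(log(conc)))

# Suggested alternative: drop the zeros and use positive values only.
conc_pos <- conc[conc > 0]
exp(mean(log(conc_pos)))
```

Dropping zeros also changes the number of observations the statistic is based on (here from 5 to 3), which may be worth reporting alongside the value.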
Hi,
PK data is a specific case, but we have a general `s_summary.numeric()` function that can be used with any type of numeric values (e.g. negative values).
For example, the user could mistakenly call `s_summary.numeric()` and ask for a geom mean using change from baseline as the analysis variable (which contains negative values).
- Currently, we ignore/remove all values x_i <= 0 and then calculate the geom mean. Maybe we could also add a warning message saying "Ignoring all values <= 0 to calculate the geom mean".
- Do not let the user compute a geom mean when there are negative values, and throw an explanatory error message. In this case, 0 values would still be ignored, since we want the geom mean to be computable for PK concentration data (which contains 0 values).
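Both options could be sketched in one helper, assuming `NA`s are dropped up front. The function name `gmean_checked()` and its `negative` argument are hypothetical, for discussion only, and not part of tern's API.

```r
# Hypothetical helper covering the two options above:
#   "warn"  -> drop all values <= 0 with a warning, then compute
#   "error" -> stop on any negative value; zeros are still dropped
#              silently, as for PK concentration data
gmean_checked <- function(x, negative = c("warn", "error")) {
  negative <- match.arg(negative)
  x <- x[!is.na(x)]  # assumption: NAs are removed before any checks
  if (negative == "error" && any(x < 0)) {
    stop("Geometric mean is not defined for negative values.")
  }
  if (negative == "warn" && any(x <= 0)) {
    warning("Ignoring all values <= 0 to calculate the geom mean.")
  }
  x <- x[x > 0]
  if (length(x) == 0) return(NA_real_)
  exp(mean(log(x)))
}

chg <- c(-3, 2, 8)                  # e.g. change from baseline
gmean_checked(chg, "warn")          # warns, then returns about 4 (from 2 and 8)
gmean_checked(c(0, 2, 8), "error")  # 0 dropped without a message; about 4
```

With `"error"`, calling `gmean_checked(chg, "error")` stops with the explanatory message instead of returning a value.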
@shajoezhu @danielinteractive @feljam @qiliu1013