brms.mmrm Informative prior archetypes

This pull request lays the groundwork for informative prior archetypes. It fixes #96 and #99, and adds a citation for https://opensource.nibr.com/bamdd/src/02h_mmrm.html.

@andrew-bean, @chstock, and @yonicd, would you please have a look? The new vignettes/archetypes.Rmd vignette has the key features.

May 02 '24 19:05 wlandau

Odd, the pull_request workflows are using test code that no longer exists: https://github.com/openpharma/brms.mmrm/actions/runs/8929506656/job/24527447569#step:6:164. I don't believe those failures.

May 02 '24 20:05 wlandau

Odd, the pull_request workflows are using test code that no longer exists: https://github.com/openpharma/brms.mmrm/actions/runs/8929506656/job/24527447569#step:6:164. I don't believe those failures.

Ah, it's because of Git merging shenanigans. I merged main locally and was able to reproduce and fix the testing errors.

May 02 '24 20:05 wlandau

Just added brm_archetype_cells() for cell means, brm_archetype_effects() for treatment effects, and brm_archetype_successive_effects() for the treatment effect version of successive differences. So this PR now has all archetypes we talked about except for the average across time.

May 03 '24 20:05 wlandau

Not sure why GitHub Actions is having trouble building the package.

May 03 '24 20:05 wlandau

By the way, I am going on vacation for a week and a day. I will begin catching up on feedback etc. on Tuesday May 14.

May 03 '24 20:05 wlandau

In the absence of informative priors, the models agree with each other. And when nuisance variables don't explain much variation in the data, those models agree with data summaries.

library(brms.mmrm)
library(dplyr)

set.seed(0L)
d1 <- brm_simulate_outline(
  n_group = 2,
  n_patient = 100,
  n_time = 4,
  rate_dropout = 0,
  rate_lapse = 0
) |>
  dplyr::mutate(response = rnorm(n = dplyr::n())) |>
  brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
  brm_simulate_categorical(
    names = c("status1", "status2"),
    levels = c("present", "absent")
  )
d2 <- brm_archetype_cells(d1)
d3 <- brm_archetype_effects(d1)
d4 <- brm_archetype_successive_cells(d1)
d5 <- brm_archetype_successive_effects(d1)

f1 <- brm_formula(d1)
f2 <- brm_formula(d2)

m1 <- brm_model(data = d1, formula = f1)
m2 <- brm_model(data = d2, formula = f2)
m3 <- brm_model(data = d3, formula = f2)
m4 <- brm_model(data = d4, formula = f2)
m5 <- brm_model(data = d5, formula = f2)

draws1 <- brm_marginal_draws(d1, f1, m1)
draws2 <- brm_marginal_draws(d2, f2, m2)
draws3 <- brm_marginal_draws(d3, f2, m3)
draws4 <- brm_marginal_draws(d4, f2, m4)
draws5 <- brm_marginal_draws(d5, f2, m5)

s0 <- brm_marginal_data(d1)
s1 <- brm_marginal_summaries(draws1)
s2 <- brm_marginal_summaries(draws2)
s3 <- brm_marginal_summaries(draws3)
s4 <- brm_marginal_summaries(draws4)
s5 <- brm_marginal_summaries(draws5)

brm_plot_compare(
  s0 = s0,
  s1 = s1,
  s2 = s2,
  s3 = s3,
  s4 = s4,
  s5 = s5
)

compare

May 04 '24 02:05 wlandau

And the models do use the priors:

library(brms.mmrm)
library(dplyr)

set.seed(0L)
data <- brm_simulate_outline(
  n_group = 2,
  n_patient = 100,
  n_time = 4,
  rate_dropout = 0,
  rate_lapse = 0
) |>
  dplyr::mutate(response = rnorm(n = dplyr::n())) |>
  brm_data_change() |>
  brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
  brm_simulate_categorical(
    names = c("status1", "status2"),
    levels = c("present", "absent")
  )
archetype <- brm_archetype_successive_cells(data)

prior <- brm_prior_label(
  code = "normal(10, 0.2)",
  group = "group_1",
  time = "time_2"
) |>
  brm_prior_label("normal(10, 0.1)", group = "group_1", time = "time_3") |>
  brm_prior_label("normal(10, 0.4)", group = "group_1", time = "time_4") |>
  brm_prior_label("normal(20, 0.2)", group = "group_2", time = "time_2") |>
  brm_prior_label("normal(20, 0.3)", group = "group_2", time = "time_3") |>
  brm_prior_label("normal(20, 0.4)", group = "group_2", time = "time_4") |>
  brm_prior_archetype(archetype = archetype)

formula <- brm_formula(archetype)

model_informative <- brm_model(
  data = archetype,
  formula = formula,
  prior = prior
)
model_noninformative <- brm_model(
  data = archetype,
  formula = formula
)

draws_informative <- brm_marginal_draws(
  data = archetype,
  formula = formula,
  model = model_informative
)
draws_noninformative <- brm_marginal_draws(
  data = archetype,
  formula = formula,
  model = model_noninformative
)

summaries_data <- brm_marginal_data(data)
summaries_informative <- brm_marginal_summaries(draws_informative)
summaries_noninformative <- brm_marginal_summaries(draws_noninformative)

brm_plot_compare(
  data = summaries_data,
  informative = summaries_informative,
  noninformative = summaries_noninformative
)

priors

May 04 '24 03:05 wlandau

@yonicd, I added archetypes to parameterize the average across time within each treatment group. Using the FEV data:

library(brms.mmrm)
library(dplyr)
data("fev_data", package = "mmrm")
data <- brm_data(
  data = fev_data,
  outcome = "FEV1",
  role = "response",
  group = "ARMCD",
  time = "AVISIT",
  reference_group = "PBO",
  reference_time = "VIS1"
)

brm_archetype_average_cells() is the cell means version. Below, x_PBO_VIS1 is actually the average across time for PBO, and x_TRT_VIS1 is the same for TRT. The labels still include "VIS1" so the labeling scheme remains consistent from archetype to archetype and brm_prior_label()/brm_prior_archetype() can still set informative priors.

cells <- brm_archetype_average_cells(data)
summary(cells)
# This object is an informative prior archetype in brms.mmrm.
# The fixed effect parameters of interest express the
# marginal means as follows (on the link scale):
# 
#    PBO:VIS1 = 4*x_PBO_VIS1 - x_PBO_VIS2 - x_PBO_VIS3 - x_PBO_VIS4
#    PBO:VIS2 = x_PBO_VIS2
#    PBO:VIS3 = x_PBO_VIS3
#    PBO:VIS4 = x_PBO_VIS4
#    TRT:VIS1 = 4*x_TRT_VIS1 - x_TRT_VIS2 - x_TRT_VIS3 - x_TRT_VIS4
#    TRT:VIS2 = x_TRT_VIS2
#    TRT:VIS3 = x_TRT_VIS3
#    TRT:VIS4 = x_TRT_VIS4

Above, (PBO:VIS1 + PBO:VIS2 + PBO:VIS3 + PBO:VIS4) / 4 simplifies to x_PBO_VIS1, and (TRT:VIS1 + TRT:VIS2 + TRT:VIS3 + TRT:VIS4) / 4 simplifies to x_TRT_VIS1.

I also threw in a treatment effect version:

effects <- brm_archetype_average_effects(data)
summary(effects)
# This object is an informative prior archetype in brms.mmrm.
# The fixed effect parameters of interest express the
# marginal means as follows (on the link scale):
# 
#    PBO:VIS1 = 4*x_PBO_VIS1 - x_PBO_VIS2 - x_PBO_VIS3 - x_PBO_VIS4
#    PBO:VIS2 = x_PBO_VIS2
#    PBO:VIS3 = x_PBO_VIS3
#    PBO:VIS4 = x_PBO_VIS4
#    TRT:VIS1 = 4*x_PBO_VIS1 - x_PBO_VIS2 - x_PBO_VIS3 - x_PBO_VIS4 + 4*x_TRT_VIS1 - x_TRT_VIS2 - x_TRT_VIS3 - x_TRT_VIS4
#    TRT:VIS2 = x_PBO_VIS2 + x_TRT_VIS2
#    TRT:VIS3 = x_PBO_VIS3 + x_TRT_VIS3
#    TRT:VIS4 = x_PBO_VIS4 + x_TRT_VIS4

Above, x_PBO_VIS1 is still the average placebo response over all time points, but now x_TRT_VIS1 is the average treatment effect. In other words, x_TRT_VIS1 is the result of (TRT:VIS1 + TRT:VIS2 + TRT:VIS3 + TRT:VIS4) / 4 - (PBO:VIS1 + PBO:VIS2 + PBO:VIS3 + PBO:VIS4) / 4.

This brings us up to 6 different archetypes:

brm_archetype_average_cells()
brm_archetype_average_effects()
brm_archetype_cells()
brm_archetype_effects()
brm_archetype_successive_cells()
brm_archetype_successive_effects()

I believe this completes this PR, unless others have comments (please let me know). Archetypes are cheap, and it is easy for me to add more.

May 16 '24 15:05 wlandau

I also added a section to vignettes/archetypes.Rmd to explain the other archetypes.

May 16 '24 15:05 wlandau

One thing I kept thinking about: we regularly use MMRMs with baseline:time and treatment:time interactions. Elicitation seems to be more challenging then (due to the second interaction). Which archetype would I use, or how would a I construct a suitable one? Apologies if this is already covered and I missed it.

Is there usually a reason to set informative priors on baseline:time interactions? I would think these are nuisance parameters and solely used to explain noise. The current approach in the archetype interface is to manually compute the model matrix columns for these interactions and center them at their means. (Centering protects the interpretation of the parameters of interest against subtle reference level issues like the one from https://github.com/openpharma/brms.mmrm/discussions/24#discussioncomment-5928853.)

As for treatment:time, the three current *_cells() archetypes implicitly cover treatment:time interactions in different ways, but only in a simplified context where the precise meaning of each parameter is clear. There is currently no archetype for intercept + treatment + time + treatment:time, and I think it may be hard for users because the precise meaning of treatment:time depends on how the contrasts drop model matrix columns and which combination of levels is chosen as the reference level. For example, it's a bit hard to think of what an informative prior should be for treatment2:time2 when this parameter is defined relative to intercept + time2 + treatment2, and even this needs to take into account whether time1/treatment1 or time_N/treatment_N is dropped from the model matrix.

May 20 '24 13:05 wlandau

One thing I kept thinking about: we regularly use MMRMs with baseline:time and treatment:time interactions. Elicitation seems to be more challenging then (due to the second interaction). Which archetype would I use, or how would a I construct a suitable one? Apologies if this is already covered and I missed it.

Is there usually a reason to set informative priors on baseline:time interactions? I would think these are nuisance parameters and solely used to explain noise. The current approach in the archetype interface is to manually compute the model matrix columns for these interactions and center them at their means. (Centering protects the interpretation of the parameters of interest against subtle reference level issues like the one from #24 (comment).)

As for treatment:time, the three current *_cells() archetypes implicitly cover treatment:time interactions in different ways, but only in a simplified context where the precise meaning of each parameter is clear. There is currently no archetype for intercept + treatment + time + treatment:time, and I think it may be hard for users because the precise meaning of treatment:time depends on how the contrasts drop model matrix columns and which combination of levels is chosen as the reference level. For example, it's a bit hard to think of what an informative prior should be for treatment2:time2 when this parameter is defined relative to intercept + time2 + treatment2, and even this needs to take into account whether time1/treatment1 or time_N/treatment_N is dropped from the model matrix.

Thanks, I can see how this would complicate things. In some situations, in my experience, baseline has been essentially the only known or the strongest prognostic variable (it often seems to correlate strongly with other (un)observered prognostic variables). So it often has a special role among the nuisance variable, I believe. Evidence might exist from natural history studies or (more frequently and increasingly) from historical trial data to which MMRMs with baseline x time interactions had been fit. Fully fine if we do not address this at the moment.

May 21 '24 05:05 chstock

You do make a good case for borrowing information on baseline:time interactions in some circumstances. And since these parameters are time-specific slopes, the centering I described above should not interfere with interpretation or priors. (brms.mmrm does not scale these columns of the model matrix.)

Using brms::set_prior(), it is possible to assign informative priors on the other parameters in the model, including baseline:time interactions. The only extra challenge at the user level is identifying the names of those parameters, possibly using a manual call to brms::get_prior(). This should already work smoothly for highly customized/specialized analyses. My question is whether it is worth extending brms.mmrm::brm_prior_label() etc. to make it extra convenient to set priors. This would require a challenging redesign at both the interface and infrastructure levels.

May 22 '24 18:05 wlandau

My question is whether it is worth extending brms.mmrm::brm_prior_label() etc. to make it extra convenient to set priors. This would require a challenging redesign at both the interface and infrastructure levels.

I'd be fine if we wait and see whether substantial demand for such custom priors comes up. Maybe the case with the baseline:time interaction could be handled in an additional vignette. I could take this on, if that is ok for you. We can also discuss this in one of the next meetings.

May 23 '24 07:05 chstock

Maybe the case with the baseline:time interaction could be handled in an additional vignette.

https://github.com/openpharma/brms.mmrm/pull/100/commits/a7b912671e003ef136ff2e547dd1d4b807446b66 adds this to the "Informative priors" section of the existing archetypes.Rmd vignette.

I could take this on, if that is ok for you. We can also discuss this in one of the next meetings.

Thanks, it would be great to discuss.

archetypes.Rmd is very software-focused. It accomplishes its goals as a software usage tutorial, but it uses simulated data, and the priors have no justification. It may help to complement this with a realistic case study in a different vignette which uses an archetype with a clinically meaningful set of informative priors.

May 24 '24 15:05 wlandau