Inquiry about Bootphreg with stratification: extracting stratum-specific cumulative baseline hazard
Dear Dr. Holst,
I am reaching out to you to inquire about the Bootphreg function when used with stratification.
When using strata, e.g.:
b <- Bootphreg(Surv(time, status==1) ~ trt + strata(celltype), data=veteran, B=1000),
I see that the output b contains, for each of the B bootstraps, a wild bootstrapped estimate of the baseline cumulative hazard function.
My understanding is that the latter is calculated in this section of the Bootphreg01 function:
cumhaz <- cbind(jumptimes,cumsumstrata(1/val$S0,strata,nstrata)).
I was wondering whether it would be possible (and whether you deem it mathematically sensible) to extract, for each of the B bootstraps, the stratum-specific estimate of the baseline cumulative hazard function. To achieve this, after carefully studying the code, I tried to modify the Bootphreg01 function as follows:
Snippet of the original Bootphreg01 function:
{ ... {....
    cumhaz <- cbind(jumptimes, cumsumstrata(1/val$S0, strata, nstrata))
    colnames(cumhaz) <- c("time", "cumhaz")
    res[[i]] <- list(coef = cc, cumhaz = cumhaz[, 2])
  }
  names(res) <- 1:B
  return(res)
}
Snippet of the modified Bootphreg01 function:
{ ... {....
    cumhaz <- cbind(jumptimes, cumsumstrata(1/val$S0, strata, nstrata))
    colnames(cumhaz) <- c("time", "cumhaz")
    # one (time, cumhaz) matrix per stratum; strata are coded 0, ..., nstrata - 1
    cumhaz_strata <- list()
    for (k in seq_len(nstrata)) {
      strata_idx <- strata == k - 1
      cumhaz_strata[[k]] <- cbind(jumptimes[strata_idx], cumsum(1/val$S0[strata_idx]))
    }
    res[[i]] <- list(coef = cc, cumhaz = cumhaz[, 2], cumhaz_strata = cumhaz_strata)
  }
  names(res) <- 1:B
  return(res)
}
With this modification, the output b <- Bootphreg(Surv(time, status==1) ~ trt + strata(celltype), data=veteran, B=1000) contains, for each of the B bootstraps, in addition to coef and cumhaz, a list cumhaz_strata whose length equals the number of strata (i.e., length 4 in this example), with (ideally) the stratum-specific baseline cumulative hazard estimates.
The output seems in line with what I wanted to achieve, but I am wondering, since I am not an expert in the field and I do not know how the functions cumsum and cumsumstrata behave in the background, if you deem it correct. And if you think it is correct, I was also wondering whether it would be possible to add this possibility, i.e., to extract the stratum-specific cumulative hazard estimates, to the original Bootphreg function in the package.
I remain at your disposal for any questions/clarifications.
I hope to hear back from you and many thanks in advance.
Best regards,
Alessandra
Dear Alessandra,
You are absolutely correct: the baseline is calculated as you describe, and it can be extracted and split into stratum-specific objects as you do. There is, however, no need to recompute the stratum-specific objects. cumsumstrata computes cumulative sums within strata, so you could also just select each stratum's component from the combined cumhaz, which contains all strata.
best regards
Thomas
Thomas Scheike (in reply to agaiasaracini's message of Thu, 21 Sep 2023)
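A minimal sketch of the suggestion in the reply above, using base R only. It assumes that cumsumstrata behaves like a running sum that restarts within each stratum and returns values in the original row order, as the reply describes; `ave(..., FUN = cumsum)` is used here as a stand-in and is not the mets implementation. The vectors `jumptimes`, `strata`, and `S0` are made-up toy values, not output from Bootphreg.

```r
# Toy stand-ins for the objects inside Bootphreg01 (all values are made up).
# strata is coded 0-based, matching `strata == k - 1` in the modified code.
jumptimes <- c(1, 2, 3, 4, 5, 6)
strata    <- c(0, 1, 0, 1, 1, 0)
S0        <- c(4, 5, 3, 4, 2, 1)
nstrata   <- 2

# Base-R stand-in (assumption) for cumsumstrata(1/S0, strata, nstrata):
# a cumulative sum within each stratum, kept in the original row order.
cumhaz_all <- ave(1 / S0, strata, FUN = cumsum)

# Option A: recompute per stratum, as in the modified Bootphreg01.
recomputed <- lapply(seq_len(nstrata), function(k) {
  idx <- strata == k - 1
  cbind(time = jumptimes[idx], cumhaz = cumsum(1 / S0[idx]))
})

# Option B: select each stratum's rows from the combined vector.
selected <- lapply(seq_len(nstrata), function(k) {
  idx <- strata == k - 1
  cbind(time = jumptimes[idx], cumhaz = cumhaz_all[idx])
})

# Both routes give the same stratum-specific cumulative hazards.
stopifnot(isTRUE(all.equal(recomputed, selected)))
```

Option B needs the stratum indicator in the same row order as the combined cumulative hazard. Whether that vector is available alongside the Bootphreg output is not confirmed here.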
Dear Professor Scheike,
Thank you so much for your reply and clarification!
Kind regards,
Alessandra
Dear Prof Scheike,
I would have a follow-up inquiry regarding the above.
I see how I could extract the stratum-specific cumulative baseline hazards from the combined cumulative hazard output of Bootphreg function - e.g., by ordering the original data (which contains the strata information) by event times, and "matching" it with the values of the hazard from the output of Bootphreg.
However, I am not sure how to uniquely identify the cases in which there are ties for the event times across different strata, e.g. in this case (where status=1 is an event):
Subject  Time  Status  Stratum
1        10    1       1
2        10    1       1
3        10    1       2
It is my understanding that the output of Bootphreg includes 3 cumulative hazard values for the above event times - is there a way to distinguish which stratum these belong to?
I am inquiring about this specifically as I would like to use the stratum-specific cumulative hazards for a certain project I am working on.
Many thanks for your patience!
Kind regards,
Alessandra
Dear Alessandra,
Good point! The current implementation assumes that there are no ties among the event times.
best regards
Thomas
(in reply to agaiasaracini's message of Tue, 17 Oct 2023)
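To illustrate the tie problem raised in the follow-up, here is a small base-R sketch built on the three-subject example. It only shows why matching on event time alone is ambiguous and how a stratum indicator kept in the same row order avoids the matching step. The `increments` values are hypothetical and not produced by Bootphreg, and the sketch does not address how the estimator itself should treat tied event times.

```r
# Toy data from the example: three events at time 10, in two strata.
d <- data.frame(subject = 1:3,
                time    = c(10, 10, 10),
                status  = c(1, 1, 1),
                stratum = c(1, 1, 2))

# Matching on time alone is ambiguous: each event time hits all three rows.
matches_by_time <- lapply(d$time, function(t) which(d$time == t))

# A (stratum, time) key separates the strata, although subjects 1 and 2
# (tied within stratum 1) still share one key.
key <- paste(d$stratum, d$time, sep = ":")
rows_by_key <- split(d$subject, key)

# If the stratum indicator is kept in the same row order as the hazard
# increments, no matching on time is needed (increments are hypothetical).
increments <- c(0.2, 0.3, 0.5)
cumhaz_all <- ave(increments, d$stratum, FUN = cumsum)
by_stratum <- split(data.frame(time = d$time, cumhaz = cumhaz_all), d$stratum)
```

Carrying the stratum indicator with each row sidesteps the lookup by time, which is the step that breaks down under ties.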