`n.risk` at `time == 0` is not correct when `nlevels(strata) > 1` in `fortify.survfit(*, surv.connect = TRUE)`
System information
- OS Platform and Distribution: Red Hat Enterprise Linux Server 7.8 (Maipo)
- ggfortify installed from: CRAN
- ggfortify version: ggfortify_0.4.14
- Exact command to reproduce: see below
Describe the problem
n.risk at time == 0 is not correct when nlevels(strata) > 1 in fortify.survfit(*, surv.connect = TRUE).
Source code / logs / plots
library(survival)
fit <- survfit(Surv(time, status) ~ x, data = aml)
summary(fit)
# Call: survfit(formula = Surv(time, status) ~ x, data = aml)
#
# x=Maintained
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
# 9 11 1 0.909 0.0867 0.7541 1.000
# 13 10 1 0.818 0.1163 0.6192 1.000
# 18 8 1 0.716 0.1397 0.4884 1.000
# 23 7 1 0.614 0.1526 0.3769 0.999
# 31 5 1 0.491 0.1642 0.2549 0.946
# 34 4 1 0.368 0.1627 0.1549 0.875
# 48 2 1 0.184 0.1535 0.0359 0.944
#
# x=Nonmaintained
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
# 5 12 2 0.8333 0.1076 0.6470 1.000
# 8 10 2 0.6667 0.1361 0.4468 0.995
# 12 8 1 0.5833 0.1423 0.3616 0.941
# 23 6 1 0.4861 0.1481 0.2675 0.883
# 27 5 1 0.3889 0.1470 0.1854 0.816
# 30 4 1 0.2917 0.1387 0.1148 0.741
# 33 3 1 0.1944 0.1219 0.0569 0.664
# 43 2 1 0.0972 0.0919 0.0153 0.620
# 45 1 1 0.0000 NaN NA NA
ggfortify:::fortify.survfit(fit, surv.connect = TRUE) |> head()
# time n.risk n.event n.censor surv std.err upper lower strata
# 1 0 11 0 0 1.0000000 0.00000000 1.0000000 1.0000000 Maintained
# 2 0 11 0 0 1.0000000 0.00000000 1.0000000 1.0000000 Nonmaintained # <- n.risk should be 12
# 3 9 11 1 0 0.9090909 0.09534626 1.0000000 0.7541338 Maintained
# 4 13 10 1 1 0.8181818 0.14213381 1.0000000 0.6192490 Maintained
# 5 18 8 1 0 0.7159091 0.19508758 1.0000000 0.4884263 Maintained
# 6 23 7 1 0 0.6136364 0.24873417 0.9991576 0.3768671 Maintained
The issue is from: https://github.com/sinhrks/ggfortify/blob/5ed95240075a7322bd910402d0a1d259f0e98f02/R/fortify_surv.R#L65
Maybe it can be changed to d[d$time == ave(d$time, d$strata, FUN = min), ].
And then this will need to be simplified:
https://github.com/sinhrks/ggfortify/blob/5ed95240075a7322bd910402d0a1d259f0e98f02/R/fortify_surv.R#L73-L79
Hi @terrytangyuan, Do you confirm the issue? Would you consider a patch based on what I proposed in https://github.com/sinhrks/ggfortify/issues/229#issue-2176062346? Thanks!
Please submit a PR with additional test cases. Thanks!
FYI, I've just realized that a much better fix would be to use summary(times = 0):
library(survival)
fit <- survfit(Surv(time, status) ~ x, aml)
summary(fit, times = 0)
# Call: survfit(formula = Surv(time, status) ~ x, data = aml)
#
# x=Maintained
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
# 0 11 0 1 0 1 1
#
# x=Nonmaintained
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
# 0 12 0 1 0 1 1
#
Edit: survival:::summary.survfit() has recently gained a data.frame argument (since https://github.com/therneau/survival/commit/4a5c24e85f9e560a391d1145fcf0e37b40bf7b14 available in {survival} 3.7-0 release on 2024-06-05). fortify.survfit() may probably use that and be simplified once {survival} 3.7-0 is considered old enough.
Also note that the standard errors return by survival::survfit() and survival:::summary.survfit() are different:
- from
survival::survfit():for a survival curve this contains standard error of the cumulative hazard or -log(survival), for a multi-state curve it contains the standard error of prev.
- from
survival:::summary.survfit():the standard error of the survival value.
# get the fix from https://github.com/sinhrks/ggfortify/pull/230
remotes::install_github("sinhrks/ggfortify@5de6565")
library(survival)
fit <- survfit(Surv(time, status) ~ x, aml)
x <- ggfortify:::fortify.survfit(fit, surv.connect = TRUE)
y <- rbind(
summary(fit, times = 0, data.frame = TRUE),
summary(fit, censored = TRUE, data.frame = TRUE)
)
levels(y$strata) <- sub("x=", "", levels(y$strata), fixed = TRUE)
# from https://github.com/cran/survival/blob/36b56a5d8014d3a5ed29fb111303955e3e28f13f/R/summary.survfit.R#L187-L190
y$std.err <- y$std.err / y$surv
y[is.na(y$std.err), "std.err"] <- Inf
all.equal(x, y[names(x)])
# [1] TRUE
For the record, an even better fix would be to use the dedicated survfit0() function, as in plot.survfit(). According to ?survfit0(), the function:
Add the point for a starting time ("time 0") to a survfit object's elements. This is useful for plotting.
# get the fix from https://github.com/sinhrks/ggfortify/pull/230
remotes::install_github("sinhrks/ggfortify@5de6565")
library(survival)
fit <- survfit(Surv(time, status) ~ x, aml)
x <- ggfortify:::fortify.survfit(fit, surv.connect = TRUE)
x <- x[order(x$strata, x$time), ]
y <- survfit0(fit)
y <- summary(y, censored = TRUE, data.frame = TRUE)
levels(y$strata) <- sub("x=", "", levels(y$strata), fixed = TRUE)
# from https://github.com/cran/survival/blob/36b56a5d8014d3a5ed29fb111303955e3e28f13f/R/summary.survfit.R#L187-L190
y$std.err <- y$std.err / y$surv
y[is.na(y$std.err), "std.err"] <- Inf
all.equal(x, y[names(x)], check.attributes = FALSE)
# [1] TRUE