ggfortify `n.risk` at `time == 0` is not correct when `nlevels(strata) > 1` in `fortify.survfit(*, surv.connect = TRUE)`

System information

OS Platform and Distribution: Red Hat Enterprise Linux Server 7.8 (Maipo)
ggfortify installed from: CRAN
ggfortify version: ggfortify_0.4.14
Exact command to reproduce: see below

Describe the problem

n.risk at time == 0 is not correct when nlevels(strata) > 1 in fortify.survfit(*, surv.connect = TRUE).

Source code / logs / plots

library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)
summary(fit)
# Call: survfit(formula = Surv(time, status) ~ x, data = aml)
# 
#                 x=Maintained 
#  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#     9     11       1    0.909  0.0867       0.7541        1.000
#    13     10       1    0.818  0.1163       0.6192        1.000
#    18      8       1    0.716  0.1397       0.4884        1.000
#    23      7       1    0.614  0.1526       0.3769        0.999
#    31      5       1    0.491  0.1642       0.2549        0.946
#    34      4       1    0.368  0.1627       0.1549        0.875
#    48      2       1    0.184  0.1535       0.0359        0.944
# 
#                 x=Nonmaintained 
#  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#     5     12       2   0.8333  0.1076       0.6470        1.000
#     8     10       2   0.6667  0.1361       0.4468        0.995
#    12      8       1   0.5833  0.1423       0.3616        0.941
#    23      6       1   0.4861  0.1481       0.2675        0.883
#    27      5       1   0.3889  0.1470       0.1854        0.816
#    30      4       1   0.2917  0.1387       0.1148        0.741
#    33      3       1   0.1944  0.1219       0.0569        0.664
#    43      2       1   0.0972  0.0919       0.0153        0.620
#    45      1       1   0.0000     NaN           NA           NA

ggfortify:::fortify.survfit(fit, surv.connect = TRUE) |> head()
#   time n.risk n.event n.censor      surv    std.err     upper     lower        strata
# 1    0     11       0        0 1.0000000 0.00000000 1.0000000 1.0000000    Maintained
# 2    0     11       0        0 1.0000000 0.00000000 1.0000000 1.0000000 Nonmaintained  # <- n.risk should be 12
# 3    9     11       1        0 0.9090909 0.09534626 1.0000000 0.7541338    Maintained
# 4   13     10       1        1 0.8181818 0.14213381 1.0000000 0.6192490    Maintained
# 5   18      8       1        0 0.7159091 0.19508758 1.0000000 0.4884263    Maintained
# 6   23      7       1        0 0.6136364 0.24873417 0.9991576 0.3768671    Maintained

The issue is from: https://github.com/sinhrks/ggfortify/blob/5ed95240075a7322bd910402d0a1d259f0e98f02/R/fortify_surv.R#L65

Maybe it can be changed to d[d$time == ave(d$time, d$strata, FUN = min), ].

And then this will need to be simplified:

https://github.com/sinhrks/ggfortify/blob/5ed95240075a7322bd910402d0a1d259f0e98f02/R/fortify_surv.R#L73-L79

Mar 08 '24 13:03 ThomasSoeiro

Hi @terrytangyuan, Do you confirm the issue? Would you consider a patch based on what I proposed in https://github.com/sinhrks/ggfortify/issues/229#issue-2176062346? Thanks!

Jun 07 '24 21:06 ThomasSoeiro

Please submit a PR with additional test cases. Thanks!

Jun 11 '24 00:06 terrytangyuan

FYI, I've just realized that a much better fix would be to use summary(times = 0):

library(survival)
fit <- survfit(Surv(time, status) ~ x, aml)
summary(fit, times = 0)
# Call: survfit(formula = Surv(time, status) ~ x, data = aml)
# 
#                 x=Maintained 
#         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
#            0           11            0            1            0            1            1 
# 
#                 x=Nonmaintained 
#         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
#            0           12            0            1            0            1            1 
#

Edit: survival:::summary.survfit() has recently gained a data.frame argument (since https://github.com/therneau/survival/commit/4a5c24e85f9e560a391d1145fcf0e37b40bf7b14 available in {survival} 3.7-0 release on 2024-06-05). fortify.survfit() may probably use that and be simplified once {survival} 3.7-0 is considered old enough.

Also note that the standard errors return by survival::survfit() and survival:::summary.survfit() are different:

from survival::survfit():

for a survival curve this contains standard error of the cumulative hazard or -log(survival), for a multi-state curve it contains the standard error of prev.
from survival:::summary.survfit():

the standard error of the survival value.

# get the fix from https://github.com/sinhrks/ggfortify/pull/230
remotes::install_github("sinhrks/ggfortify@5de6565")

library(survival)
fit <- survfit(Surv(time, status) ~ x, aml)

x <- ggfortify:::fortify.survfit(fit, surv.connect = TRUE)

y <- rbind(
  summary(fit, times = 0, data.frame = TRUE),
  summary(fit, censored = TRUE, data.frame = TRUE)
)
levels(y$strata) <- sub("x=", "", levels(y$strata), fixed = TRUE)

# from https://github.com/cran/survival/blob/36b56a5d8014d3a5ed29fb111303955e3e28f13f/R/summary.survfit.R#L187-L190
y$std.err <- y$std.err / y$surv
y[is.na(y$std.err), "std.err"] <- Inf

all.equal(x, y[names(x)])
# [1] TRUE

Jan 24 '25 23:01 ThomasSoeiro

For the record, an even better fix would be to use the dedicated survfit0() function, as in plot.survfit(). According to ?survfit0(), the function:

Add the point for a starting time ("time 0") to a survfit object's elements. This is useful for plotting.

# get the fix from https://github.com/sinhrks/ggfortify/pull/230
remotes::install_github("sinhrks/ggfortify@5de6565")

library(survival)
fit <- survfit(Surv(time, status) ~ x, aml)

x <- ggfortify:::fortify.survfit(fit, surv.connect = TRUE)
x <- x[order(x$strata, x$time), ]

y <- survfit0(fit)
y <- summary(y, censored = TRUE, data.frame = TRUE)
levels(y$strata) <- sub("x=", "", levels(y$strata), fixed = TRUE)

# from https://github.com/cran/survival/blob/36b56a5d8014d3a5ed29fb111303955e3e28f13f/R/summary.survfit.R#L187-L190
y$std.err <- y$std.err / y$surv
y[is.na(y$std.err), "std.err"] <- Inf

all.equal(x, y[names(x)], check.attributes = FALSE)
# [1] TRUE

Mar 12 '25 13:03 ThomasSoeiro