Skip to main content
Statistics LibreTexts

20.9: Survival analysis

  • Page ID
    45286
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

    \( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

    ( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

    \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

    \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

    \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

    \( \newcommand{\Span}{\mathrm{span}}\)

    \( \newcommand{\id}{\mathrm{id}}\)

    \( \newcommand{\Span}{\mathrm{span}}\)

    \( \newcommand{\kernel}{\mathrm{null}\,}\)

    \( \newcommand{\range}{\mathrm{range}\,}\)

    \( \newcommand{\RealPart}{\mathrm{Re}}\)

    \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

    \( \newcommand{\Argument}{\mathrm{Arg}}\)

    \( \newcommand{\norm}[1]{\| #1 \|}\)

    \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

    \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    \( \newcommand{\vectorA}[1]{\vec{#1}}      % arrow\)

    \( \newcommand{\vectorAt}[1]{\vec{\text{#1}}}      % arrow\)

    \( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vectorC}[1]{\textbf{#1}} \)

    \( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

    \( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

    \( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

    \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

    \(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)

    [Page rough draft]

    Introduction

    As the name suggests, survival analysis is a branch of statistics used to account for death of organisms, or more generally, failure in a system. In general, this kind of analysis models time to event, where event would be death or failure. The basics of the method is defined by the survival function

    \[S_{t} = Pr (T < t) \nonumber\]

    where \(t\) is time, \(T\) is a variable that represents time of death or other end point, and \(Pr\) is probability of an event occurring later than at time \(t\).

    Excellent resources available, series of articles in volume 89 of British Journal of Cancer: Clark et al (2003a), Bradburn et al (2003a), Bradburn et al (2003b), and Clark et al (2003b).

    Hazard function

    Defined as the event rate at time \(t\) based on survival for time times equal to or greater than \(t\).

    Censoring

    Censoring is a a missing data problem typical of survival analysis. Distinguish right-censored and left-censored.

    Kaplan-Meier plot

    Kaplan-Meier (KM) estimator of survival function. Other survival function estimators Fleming-Harrington. The KM estimator, \(\hat{S}_{t}\) is \[\hat{S}(t) = \prod_{i: t_{i} \leq t} \left(1 - \frac{d_{i}}{n_{i}}\right) \nonumber\]

    where \(d_{i}\) is the number of events that occurred at time \(t_{i}\), \(n_{i}\) is the number of individuals known to have survived or not been censored. Because it’s an estimate, a statistic, we need an estimate of the error variance. Several options, the default in R is the Greenwood estimator. \[var \left(\hat{S}(t)\right) = \hat{S}(t)^{2} \sum_{i: t_{i} \leq t} \frac{d_{i}}{n_{i} \left(n_{i} - d_{i}\right)} \nonumber\]

    The KM plot, censoring times noted with plus.

    R code

    Download and install the RcmdrPlugin.survival package.

    Example

    data(heart, package="survival")
    attach(heart)
    #Get help with the data set
    help("heart", package="survival")
    head(heart)
     start stop event        age      year surgery transplant id
    1    0   50     1 -17.155373 0.1232033       0          0  1
    2    0    6     1   3.835729 0.2546201       0          0  2
    3    0    1     0   6.297057 0.2655715       0          0  3
    4    1   16     1   6.297057 0.2655715       0          1  3
    5    0   36     0  -7.737166 0.4900753       0          0  4
    6   36   39     1  -7.737166 0.4900753       0          1  4
    

    Run basic survival analysis. After installing the RcmdrPlugin.survival, from Rcmdr select estimate survival function.

    Statistics dropdown menu in R Commander, with the Survival analysis submenu opened to show the option for Estimate survival function.
    Figure \(\PageIndex{1}\): Screenshot of menu call for survival analysis in Rcmdr.

    Get survival estimator and KM plot (Figure \(\PageIndex{2}\))

    R output:

    .Survfit <- survfit(Surv(start, event) ~ 1, conf.type="log", conf.int=0.95, 
    Rcmdr+ type="kaplan-meier", error="greenwood", data=heart)
    
     .Survfit
    Call: survfit(formula = Surv(start, event) ~ 1, data = heart, error = "greenwood", 
    conf.type = "log", conf.int = 0.95, type = "kaplan-meier")
    
    n events median 0.95LCL 0.95UCL 
    172 75 26 17 37
    
    plot(.Survfit, mark.time=TRUE) 
    
    quantile(.Survfit, quantiles=c(.25,.5,.75))
    $quantile
    25 50 75 
    3 26 67
    
    $lower
    25 50 75 
    1 17 46
    
    $upper
    25 50 75 
    12 37 NA
    
    #by default, Rcmdr removes the object
    remove(.Survfit)
    Kaplan-Meier plot of heart data, showing survival probability vs days. Survival function shown as solid line, with upper and lower confidence intervals shown as dashed lines.
    Figure \(\PageIndex{2}\): Kaplan-Meier plot of heart data. Dashed lines are upper and lower confidence intervals about the survival function.
    Note:

    I modified the plot() code with these additions

    ylim = c(0,1), ylab="Survival probability", xlab="Days", lwd=2, col="blue" 
    

    the data set includes age and whether or not subjects had heart surgery before transplant. Compare.

    Variable surgery is recorded 0,1, so need to create a factor

    fSurgery <- as.factor(surgery)

    Now, to compare

    Rcmdr: Statistics → Survival analysis → Compare survival functions…

    R output

    Rcmdr> survdiff(Surv(start,event) ~ fSurgery, rho=0, data=heart)
    Call:
    survdiff(formula = Surv(start, event) ~ fSurgery, data = heart, 
    rho = 0)
    
                  N Observed Expected (O-E)^2/E (O-E)^2/V
    fSurgery=No 143       66     58.7     0.902      4.56
    fSurgery=Yes 29        9     16.3     3.255      4.56
    
    Chisq= 4.6 on 1 degrees of freedom, p= 0.03

    Get the KM estimator and make a KM plot

    mySurvfit <- survfit(Surv(start, event) ~ surgery, conf.type="log", conf.int=0.95,
    type="kaplan-meier", error="greenwood", data=heart)
    
    plot(mySurvfit, mark.time=TRUE, ylim=c(0,1),lwd=2, col=c("blue","red"), xlab="Number of days", ylab="Survival probability")
    legend("topright", legend = paste(c("Surgery - No", "Surgery - Yes")), col = c("blue", "red"), pch = 19, bty = "n")
    Kaplan-Meier plot of survival functions for heart patients with and without surgery.
    Figure \(\PageIndex{3}\): Kaplan-Meier plot of heart patient survival functions with and without surgery.

    The comparison plot can be made in Rcmdr by selecting our Surgery factor in Strata setting (Fig. \(\PageIndex{4}\)). Recall that strata refers to subgroups of a population.

    Survival Function popup in R Commander, showing the Data tab. The Time or start/end times option has "start" selected, the event indicator option has "event" selected, the strata option has "fSurgery" selected, and the type of censoring selected is Default.
    Figure \(\PageIndex{4}\): Screenshot of Survival estimator menu in Rcmdr.

    Questions

    [pending]


    This page titled 20.9: Survival analysis is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Michael R Dohm via source content that was edited to the style and standards of the LibreTexts platform.