Content - 838c3682e2e34875e79592e6ac56fbd0a27f0106 - 7a73b4d/vignettes/Introduction.Rnw

Introduction.Rnw
%\VignetteIndexEntry{Introduction to the rstpm2 Package}
%\VignetteDepends{rstpm2}
%\VignetteKeyword{survival, spline}
%\VignettePackage{rstpm2}
%!\SweaveUTF8

\documentclass[nojss]{jss}

\usepackage{amsmath,amsfonts,enumitem}
\usepackage[utf8]{inputenc}

\title{Introduction to the \pkg{rstpm2} package}

\author{Mark~Clements\\Karolinska Institutet}

\Plainauthor{Mark~Clements}

\Plaintitle{Introduction to the rstpm2 package}

\Abstract{
  
  This vignette outlines the methods and provides some examples for
  link-based survival models as implemented in the \proglang{R}
  \pkg{rstpm2} package.

}

\Keywords{survival, splines}

\Plainkeywords{survival, splines}

\Address{Mark~Clements\\
  Department of Medical Epidemiology and Biostatistics\\
  Karolinska Institutet\\
  Email: \email{mark.clements@ki.se}
  }

\begin{document}

\section{Background and theory}

\emph{Link-based survival models} provide a flexible and general approach to modelling survival or time-to-event data. The survival function $S(t|x)$ to time $t$ for covariates $x$ is defined in terms of a link function $G$ and a linear prediction $\eta(t,x)$, such that

\begin{align*}
  S(t|x) &= G(\eta(t,x))
\end{align*}

\noindent where $\eta$ is a function of both time $t$ and covariates $x$. The linear predictor can be constructed in a flexible manner. Royston and Parmar (2003) focused on time being modelled using natural splines for log-time, including left truncation and relative survival. We have implemented the Royston-Parmar model class and extended it in several ways, allowing for: (i) general parametric models for $\eta(t,x)$, including B-splines and natural splines for different transformations of time; (ii) general semi-parametric models for $\eta(t,x)$ including penalised smoothers together with unpenalised parametric functions; (iii) interval censoring; and (iv) frailties using Gamma and log-Normal distributions. Fully parametric models are estimated using maximum likelihood, while the semi-parametric models are estimated using maximum penalised likelihood with smoothing parameters selected using 
A more detailed theoretical development is available from the paper by Liu, Pawitan and Clements (available on request). 

Why would you want to use these models? 


\section{Mean survival}

This has a useful interpretation for causal inference.

$E_Z(S(t|Z,X=1))-E_Z(S(t|Z,X=0))$

\begin{verbatim}
fit <- stpm2(...)
predict(fit,type="meansurv",newdata=data)
\end{verbatim}

\section{Cure models}

For cure, we use the melanoma dataset used by Andersson and colleagues
for cure models with Stata's stpm2 (see
\url{http://www.pauldickman.com/survival/}).

Initially, we merge the patient data with the all cause mortality rates.

<<echo=FALSE,results=hide>>=
options(width=80,useFancyQuotes="UTF-8")
require(rstpm2)
@ 
<<>>=
popmort2 <- transform(rstpm2::popmort,exitage=age,exityear=year,age=NULL,year=NULL)
colon2 <- within(rstpm2::colon, {
  status <- ifelse(surv_mm>120.5,1,status)
  tm <- pmin(surv_mm,120.5)/12
  exit <- dx+tm*365.25
  sex <- as.numeric(sex)
  exitage <- pmin(floor(age+tm),99)
  exityear <- floor(yydx+tm)
  ##year8594 <- (year8594=="Diagnosed 85-94")
})
colon2 <- merge(colon2,popmort2)
@ 
For comparisons, we fit the relative survival model without and with cure. 
%% <<results=hide>>=
<<>>=
fit0 <- stpm2(Surv(tm,status %in% 2:3)~I(year8594=="Diagnosed 85-94"),
              data=colon2,
              bhazard=colon2$rate, df=5)
@ 
<<>>=
summary(fit <- stpm2(Surv(tm,status %in% 2:3)~I(year8594=="Diagnosed 85-94"),
                     data=colon2,
                     bhazard=colon2$rate,
                     df=5,cure=TRUE))
predict(fit,head(colon2),se.fit=TRUE)
@ 
The estimate for the year parameter from the model without cure is within three significant
figures with that in Stata. For the predictions, the Stata model gives:
\begin{verbatim}
     +---------------------------------+
     |      surv   surv_lci   surv_uci |
     |---------------------------------|
  1. | .86108264   .8542898   .8675839 |
  2. | .79346526   .7850106   .8016309 |
  3. | .69674037   .6863196   .7068927 |
  4. | .86108264   .8542898   .8675839 |
  5. | .82212425   .8143227   .8296332 |
     |---------------------------------|
  6. | .86108264   .8542898   .8675839 |
     +---------------------------------+
\end{verbatim}
We can estimate the proportion of failures prior to the last event time:
<<>>=
newdata.eof <- data.frame(year8594 = unique(colon2$year8594),
                          tm=10)
1-predict(fit0, newdata.eof, type="surv", se.fit=TRUE)
1-predict(fit, newdata.eof, type="surv", se.fit=TRUE)
predict(fit, newdata.eof, type="haz", se.fit=TRUE)
@ 
We can plot the predicted survival estimates:
\begin{center}
<<fig=TRUE>>=
tms=seq(0,10,length=301)[-1]
plot(fit0,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), ylim=0:1,
     xlab="Time since diagnosis (years)", ylab="Relative survival")
plot(fit0,newdata=data.frame(year8594 = "Diagnosed 75-84",tm=tms),
     add=TRUE,line.col="red",rug=FALSE)
## warnings: Predicted hazards less than zero for cure
plot(fit,newdata=data.frame(year8594 = "Diagnosed 85-94",tm=tms),
     add=TRUE,ci=FALSE,lty=2,rug=FALSE)
plot(fit,newdata=data.frame(year8594="Diagnosed 75-84",tm=tms),
     add=TRUE,rug=FALSE,line.col="red",ci=FALSE,lty=2)
legend("topright",c("85-94 without cure","75-84 without cure",
                    "85-94 with cure","75-84 with cure"),
       col=c(1,2,1,2), lty=c(1,1,2,2), bty="n")
@
\end{center}

And the hazard curves:

\begin{center}
<<fig=TRUE>>=
plot(fit0,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), 
     ylim=c(0,0.5), type="hazard",
     xlab="Time since diagnosis (years)",ylab="Excess hazard")
plot(fit0,newdata=data.frame(year8594 = "Diagnosed 75-84", tm=tms),
     type="hazard",
     add=TRUE,line.col="red",rug=FALSE)
plot(fit,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms),
     type="hazard",
     add=TRUE,ci=FALSE,lty=2,rug=FALSE)
plot(fit,newdata=data.frame(year8594="Diagnosed 75-84", tm=tms),
     type="hazard",
     add=TRUE,rug=FALSE,line.col="red",ci=FALSE,lty=2)
legend("topright",c("85-94 without cure","75-84 without cure",
                    "85-94 with cure","75-84 with cure"),
       col=c(1,2,1,2), lty=c(1,1,2,2), bty="n")
@
\end{center}

\end{document}