%\VignetteIndexEntry{Introduction to the rstpm2 Package} %\VignetteDepends{rstpm2} %\VignetteKeyword{survival, spline} %\VignettePackage{rstpm2} %!\SweaveUTF8 \documentclass[nojss]{jss} \usepackage{amsmath,amsfonts,enumitem} \usepackage[utf8]{inputenc} \title{Introduction to the \pkg{rstpm2} package} \author{Mark~Clements\\Karolinska Institutet} \Plainauthor{Mark~Clements} \Plaintitle{Introduction to the rstpm2 package} \Abstract{ This vignette outlines the methods and provides some examples for link-based survival models as implemented in the \proglang{R} \pkg{rstpm2} package. } \Keywords{survival, splines} \Plainkeywords{survival, splines} \Address{Mark~Clements\\ Department of Medical Epidemiology and Biostatistics\\ Karolinska Institutet\\ Email: \email{mark.clements@ki.se} } \begin{document} \section{Background and theory} \emph{Link-based survival models} provide a flexible and general approach to modelling survival or time-to-event data. The survival function $S(t|x)$ to time $t$ for covariates $x$ is defined in terms of a link function $G$ and a linear prediction $\eta(t,x)$, such that \begin{align*} S(t|x) &= G(\eta(t,x)) \end{align*} \noindent where $\eta$ is a function of both time $t$ and covariates $x$. The linear predictor can be constructed in a flexible manner. Royston and Parmar (2003) focused on time being modelled using natural splines for log-time, including left truncation and relative survival. We have implemented the Royston-Parmar model class and extended it in several ways, allowing for: (i) general parametric models for $\eta(t,x)$, including B-splines and natural splines for different transformations of time; (ii) general semi-parametric models for $\eta(t,x)$ including penalised smoothers together with unpenalised parametric functions; (iii) interval censoring; and (iv) frailties using Gamma and log-Normal distributions. Fully parametric models are estimated using maximum likelihood, while the semi-parametric models are estimated using maximum penalised likelihood with smoothing parameters selected using A more detailed theoretical development is available from the paper by Liu, Pawitan and Clements (available on request). Why would you want to use these models? \section{Mean survival} This has a useful interpretation for causal inference. $E_Z(S(t|Z,X=1))-E_Z(S(t|Z,X=0))$ \begin{verbatim} fit <- stpm2(...) predict(fit,type="meansurv",newdata=data) \end{verbatim} \section{Cure models} For cure, we use the melanoma dataset used by Andersson and colleagues for cure models with Stata's stpm2 (see \url{http://www.pauldickman.com/survival/}). Initially, we merge the patient data with the all cause mortality rates. <>= options(width=80,useFancyQuotes="UTF-8") require(rstpm2) @ <<>>= popmort2 <- transform(rstpm2::popmort,exitage=age,exityear=year,age=NULL,year=NULL) colon2 <- within(rstpm2::colon, { status <- ifelse(surv_mm>120.5,1,status) tm <- pmin(surv_mm,120.5)/12 exit <- dx+tm*365.25 sex <- as.numeric(sex) exitage <- pmin(floor(age+tm),99) exityear <- floor(yydx+tm) ##year8594 <- (year8594=="Diagnosed 85-94") }) colon2 <- merge(colon2,popmort2) @ For comparisons, we fit the relative survival model without and with cure. %% <>= <<>>= fit0 <- stpm2(Surv(tm,status %in% 2:3)~I(year8594=="Diagnosed 85-94"), data=colon2, bhazard=colon2$rate, df=5) @ <<>>= summary(fit <- stpm2(Surv(tm,status %in% 2:3)~I(year8594=="Diagnosed 85-94"), data=colon2, bhazard=colon2$rate, df=5,cure=TRUE)) predict(fit,head(colon2),se.fit=TRUE) @ The estimate for the year parameter from the model without cure is within three significant figures with that in Stata. For the predictions, the Stata model gives: \begin{verbatim} +---------------------------------+ | surv surv_lci surv_uci | |---------------------------------| 1. | .86108264 .8542898 .8675839 | 2. | .79346526 .7850106 .8016309 | 3. | .69674037 .6863196 .7068927 | 4. | .86108264 .8542898 .8675839 | 5. | .82212425 .8143227 .8296332 | |---------------------------------| 6. | .86108264 .8542898 .8675839 | +---------------------------------+ \end{verbatim} We can estimate the proportion of failures prior to the last event time: <<>>= newdata.eof <- data.frame(year8594 = unique(colon2$year8594), tm=10) 1-predict(fit0, newdata.eof, type="surv", se.fit=TRUE) 1-predict(fit, newdata.eof, type="surv", se.fit=TRUE) predict(fit, newdata.eof, type="haz", se.fit=TRUE) @ We can plot the predicted survival estimates: \begin{center} <>= tms=seq(0,10,length=301)[-1] plot(fit0,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), ylim=0:1, xlab="Time since diagnosis (years)", ylab="Relative survival") plot(fit0,newdata=data.frame(year8594 = "Diagnosed 75-84",tm=tms), add=TRUE,line.col="red",rug=FALSE) ## warnings: Predicted hazards less than zero for cure plot(fit,newdata=data.frame(year8594 = "Diagnosed 85-94",tm=tms), add=TRUE,ci=FALSE,lty=2,rug=FALSE) plot(fit,newdata=data.frame(year8594="Diagnosed 75-84",tm=tms), add=TRUE,rug=FALSE,line.col="red",ci=FALSE,lty=2) legend("topright",c("85-94 without cure","75-84 without cure", "85-94 with cure","75-84 with cure"), col=c(1,2,1,2), lty=c(1,1,2,2), bty="n") @ \end{center} And the hazard curves: \begin{center} <>= plot(fit0,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), ylim=c(0,0.5), type="hazard", xlab="Time since diagnosis (years)",ylab="Excess hazard") plot(fit0,newdata=data.frame(year8594 = "Diagnosed 75-84", tm=tms), type="hazard", add=TRUE,line.col="red",rug=FALSE) plot(fit,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), type="hazard", add=TRUE,ci=FALSE,lty=2,rug=FALSE) plot(fit,newdata=data.frame(year8594="Diagnosed 75-84", tm=tms), type="hazard", add=TRUE,rug=FALSE,line.col="red",ci=FALSE,lty=2) legend("topright",c("85-94 without cure","75-84 without cure", "85-94 with cure","75-84 with cure"), col=c(1,2,1,2), lty=c(1,1,2,2), bty="n") @ \end{center} \end{document}