%\VignetteIndexEntry{Introduction to the rstpm2 Package}
%\VignetteDepends{rstpm2}
%\VignetteKeyword{survival, spline}
%\VignettePackage{rstpm2}
%!\SweaveUTF8

\documentclass[nojss]{jss}

\usepackage{amsmath,amsfonts,enumitem}
\usepackage[utf8]{inputenc}

\title{Introduction to the \pkg{rstpm2} package}

\author{Mark~Clements\\Karolinska Institutet}

\Plainauthor{Mark~Clements}

\Plaintitle{Introduction to the rstpm2 package}

\Abstract{

This vignette outlines the methods and provides some examples for
link-based survival models as implemented in the \proglang{R}
\pkg{rstpm2} package.

}

\Keywords{survival, splines}

\Plainkeywords{survival, splines}

Department of Medical Epidemiology and Biostatistics\\
Karolinska Institutet\\
Email: \email{mark.clements@ki.se}
}

\begin{document}

\section{Background and theory}

\emph{Link-based survival models} provide a flexible and general approach to modelling survival or time-to-event data. The survival function $S(t|x)$ to time $t$ for covariates $x$ is defined in terms of a link function $G$ and a linear prediction $\eta(t,x)$, such that

\begin{align*}
S(t|x) &= G(\eta(t,x))
\end{align*}

\noindent where $\eta$ is a function of both time $t$ and covariates $x$. The linear predictor can be constructed in a flexible manner. Royston and Parmar (2003) focused on time being modelled using natural splines for log-time, including left truncation and relative survival. We have implemented the Royston-Parmar model class and extended it in several ways, allowing for: (i) general parametric models for $\eta(t,x)$, including B-splines and natural splines for different transformations of time; (ii) general semi-parametric models for $\eta(t,x)$ including penalised smoothers together with unpenalised parametric functions; (iii) interval censoring; and (iv) frailties using Gamma and log-Normal distributions. Fully parametric models are estimated using maximum likelihood, while the semi-parametric models are estimated using maximum penalised likelihood with smoothing parameters selected using
A more detailed theoretical development is available from the paper by Liu, Pawitan and Clements (available on request).

Why would you want to use these models?

\section{Mean survival}

This has a useful interpretation for causal inference.

$E_Z(S(t|Z,X=1))-E_Z(S(t|Z,X=0))$

\begin{verbatim}
fit <- stpm(...)
predict(fit,type="meansurv",newdata=data)
\end{verbatim}

\section{Cure models}

For cure, we use the melanoma dataset used by Andersson and colleagues
for cure models with Stata's stpm2 (see
\url{http://www.pauldickman.com/survival/}).

Initially, we merge the patient data with the all cause mortality rates.

<<echo=FALSE,results=hide>>=
options(width=80,useFancyQuotes="UTF-8")
require(rstpm2)
@
<<>>=
popmort2 <- transform(rstpm2::popmort,exitage=age,exityear=year,age=NULL,year=NULL)
colon2 <- within(rstpm2::colon, {
status <- ifelse(surv_mm>120.5,1,status)
tm <- pmin(surv_mm,120.5)/12
exit <- dx+tm*365.25
sex <- as.numeric(sex)
exitage <- pmin(floor(age+tm),99)
exityear <- floor(yydx+tm)
##year8594 <- (year8594=="Diagnosed 85-94")
})
colon2 <- merge(colon2,popmort2)
@
For comparisons, we fit the relative survival model without and with cure.
%% <<results=hide>>=
<<>>=
fit0 <- stpm2(Surv(tm,status %in% 2:3)~I(year8594=="Diagnosed 85-94"),
data=colon2,
bhazard=colon2$rate, df=5) @ <<>>= summary(fit <- stpm2(Surv(tm,status %in% 2:3)~I(year8594=="Diagnosed 85-94"), data=colon2, bhazard=colon2$rate,
df=5,cure=TRUE))
@
The estimate for the year parameter from the model without cure is within three significant
figures with that in Stata. For the predictions, the Stata model gives:
\begin{verbatim}
+---------------------------------+
|      surv   surv_lci   surv_uci |
|---------------------------------|
1. | .86108264   .8542898   .8675839 |
2. | .79346526   .7850106   .8016309 |
3. | .69674037   .6863196   .7068927 |
4. | .86108264   .8542898   .8675839 |
5. | .82212425   .8143227   .8296332 |
|---------------------------------|
6. | .86108264   .8542898   .8675839 |
+---------------------------------+
\end{verbatim}
We can estimate the proportion of failures prior to the last event time:
<<>>=
newdata.eof <- data.frame(year8594 = unique(colon2\$year8594),
tm=10)
1-predict(fit0, newdata.eof, type="surv", se.fit=TRUE)
1-predict(fit, newdata.eof, type="surv", se.fit=TRUE)
predict(fit, newdata.eof, type="haz", se.fit=TRUE)
@
We can plot the predicted survival estimates:
\begin{center}
<<fig=TRUE>>=
tms=seq(0,10,length=301)[-1]
plot(fit0,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), ylim=0:1,
xlab="Time since diagnosis (years)", ylab="Relative survival")
plot(fit0,newdata=data.frame(year8594 = "Diagnosed 75-84",tm=tms),
## warnings: Predicted hazards less than zero for cure
plot(fit,newdata=data.frame(year8594 = "Diagnosed 85-94",tm=tms),
plot(fit,newdata=data.frame(year8594="Diagnosed 75-84",tm=tms),
legend("topright",c("85-94 without cure","75-84 without cure",
"85-94 with cure","75-84 with cure"),
col=c(1,2,1,2), lty=c(1,1,2,2), bty="n")
@
\end{center}

And the hazard curves:

\begin{center}
<<fig=TRUE>>=
plot(fit0,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms),
ylim=c(0,0.5), type="hazard",
xlab="Time since diagnosis (years)",ylab="Excess hazard")
plot(fit0,newdata=data.frame(year8594 = "Diagnosed 75-84", tm=tms),
type="hazard",
plot(fit,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms),
type="hazard",