\documentclass[a4paper]{article}
\usepackage{a4wide,graphicx,color}
\usepackage[authoryear,round,longnamesfirst]{natbib}
\usepackage{hyperref}

\definecolor{Red}{rgb}{0.7,0,0}
\definecolor{Blue}{rgb}{0,0,0.8}


\begin{document}

\SweaveOpts{engine=R,eps=FALSE}
%\VignetteIndexEntry{Diagnostic Checking in Regression Relationships}
%\VignetteDepends{lmtest, strucchange}
%\VignetteKeywords{diagnostic checking, structural change, autocorrelation, heteroskedasticity}
%\VignettePackage{lmtest}

<<preliminaries,echo=FALSE,results=hide>>=
library(lmtest)
options(SweaveHooks=list(twofig=function() {par(mfrow=c(1,2))},
                         twofig2=function() {par(mfrow=c(2,1))},
                         onefig=function() {par(mfrow=c(1,1))}))
@

\title{Diagnostic Checking in Regression Relationships}
\author{\hfill Achim Zeileis$^\dag$ \hfill Torsten Hothorn$^\ddag$ \hfill \hfill \\
\dag {\it \small Institut f\"ur Statistik \& Wahrscheinlichkeitstheorie,
 Technische Universit\"at Wien, Austria}\\
\ddag {\it \small Institut f\"ur Medizininformatik,
Biometrie und Epidemiologie, Universit\"at Erlangen-N\"urnberg, Germany}
}
\date{}
\maketitle

\section{Introduction}

The classical linear regression model
\begin{equation} \label{eq:model}
y_i \quad = \quad x_i^\top \beta + u_i \qquad (i = 1, \dots, n)
\end{equation}
is still one of the most popular tools for data analysis
despite (or due to) its simple structure. Although it is appropriate in
many situations, there are many pitfalls that might affect the quality
of conclusions drawn from fitted models or might even lead to
uninterpretable results. Some of these pitfalls that are considered especially
important in applied econometrics are heteroskedasticity
or serial correlation of the error terms, structural changes in the
regression coefficients, nonlinearities, functional misspecification
or omitted variables. Therefore,
a rich variety of diagnostic tests for these situations have been developed
in the econometrics community, a collection of which has been
implemented in the packages \texttt{lmtest} and \texttt{strucchange} covering
the problems mentioned above.

These diagnostic tests are not only useful in econometrics but also in many other fields
where linear regression is used, which we will demonstrate with an
application from biostatistics. As \cite{lmtest:Breiman:2001} argues it is important to assess the
goodness-of-fit of data models, in particular not only using omnibus tests
but tests designed for a certain direction of the alternative. These diagnostic
checks do not have to be seen as pure significance procedures but also as an
explorative tool to extract information about the structure of the data, especially in
connection with residual plots or other diagnostic plots. As
\cite{lmtest:Brown+Durbin+Evans:1975} argue for the recursive CUSUM test, these
procedures can ``be regarded as yardsticks for the interpretation of data
rather than leading to hard and fast decisions.'' Moreover, we will always
be able to reject the null-hypothesis provided we have enough data at hand.
The question is not whether the model is wrong (it always is!) but if the
irregularities are serious.

The package \texttt{strucchange} implements a variety of procedures related
to structural change of the regression coefficients and was already
introduced in \textsf{R} news by \cite{lmtest:Zeileis:2001} and described in more
detail in \cite{lmtest:Zeileis+Leisch+Hornik:2002}. Therefore, we will focus on
the package \texttt{lmtest} in the following. Most of the tests and the datasets contained in the package
are taken from the book of \cite{lmtest:Kraemer+Sonnberger:1986}, which originally
inspired us to write the package. Compared to the book, we implemented later
versions of some tests and modern flexible interfaces for the procedures.
Most of the tests are based on the OLS residuals of a linear model, which is
specified by a formula argument. Instead of a formula a fitted model of class
\verb/"lm"/ can also be supplied, which should work if the data are either
contained in the object or still present in the workspace---however this is
not encouraged. The full references for the tests can be found on the help pages
of the respective function.

We present applications of the tests contained in \texttt{lmtest} to two different
data sets: the first is a macroeconomic time series from the U.S.
analysed by \cite{lmtest:Stock+Watson:1996} and the second is data from a
study on measurments of fetal mandible length discussed by \cite{lmtest:Royston+Altman:1994}.

\section{U.S. macroeconomic data} \label{sec:macro}

\cite{lmtest:Stock+Watson:1996}
investigate the stability of 76 monthly macroeconomic time series
from 1959 to 1993, of which we choose
the department of commerce commodity price index time series \texttt{jocci}
to illustrate the tests for heteroskedasticity and serial correlation.
The data is treated with the same methodology as all other series considered
by \cite{lmtest:Stock+Watson:1996}: they were transformed suitably (here by
log first differences) and then an AR(6) model was fitted and analysed.
The transformed series is denoted \texttt{dy} and is depicted
together with a residual plot of the AR(6) model in Figure~\ref{macro-jocci}.

\setkeys{Gin}{width=0.8\textwidth}
\begin{figure}[htbp]
\begin{center}
<<macro-jocci,fig=TRUE,twofig=TRUE,echo=FALSE,height=4,width=8>>=
data(jocci)
plot(jocci[,"dy"], ylab = "jocci (log first differences)")
ar6.model <- dy ~ dy1 + dy2 + dy3 + dy4 + dy5 +dy6
jocci.fm <- lm(ar6.model, data = jocci)
plot(time(jocci), residuals(jocci.fm), xlab = "Time", ylab = "AR(6) residuals")
@
\caption{\label{macro-jocci} The jocci series and AR(6) residual plot}
\end{center}
\end{figure}

Not surprisingly, an autoregressive model is necessary as the series itself
contains serial correlation, which can be shown by the Durbin-Watson test
<<macro-corr1>>=
data(jocci)
dwtest(dy ~ 1, data = jocci)
@
or the Breusch-Godfrey test
which also leads to a highly significant result. In the AR(6) model
given by
<<macro-model>>=
ar6.model <- dy ~ dy1 + dy2 + dy3 + dy4 + dy5 +dy6
@
where the variables on the right hand side denote the lagged variables,
there is no remaining serial correlation in the residuals:
<<macro-corr2>>=
bgtest(ar6.model, data = jocci)
@
The Durbin-Watson test is biased in dynamic models and should therefore not be applied.


The residual plot suggests that the variance of the error component increases
over time, which is emphasized by all three tests for heteroskedasticity implemented
in \texttt{lmtest}: the Breusch-Pagan test fits a linear regression model
to the residuals and rejects if too much of the variance is explained by the
auxiliary explanatory variables, which are here
the squared lagged values:
<<macro-hetsked1>>=
var.model <- ~ I(dy1^2) + I(dy2^2) + I(dy3^2) + I(dy4^2) + I(dy5^2) + I(dy6^2)
bptest(ar6.model, var.model, data = jocci)
@
The Goldfeld-Quandt test \verb/gqtest()/ and the Harrison-McCabe test \verb/hmctest()/
also give highly significant $p$ values.
Whereas the Breusch-Pagan test and the Harrison-McCabe test do not assume a
particular timing of the change of variance, the Goldfeld-Quandt test suffers
from the same problem as the Chow test for a change of the
regression coefficients: the breakpoint has to be known in advance. By default
it is taken to be after 50\% of the observations, which leads to a significant
result for the present series.

\section{The mandible data} \label{sec:mandible}

\cite{lmtest:Royston+Altman:1994}
discuss a linear regression model for data taken from
a study of fetal mandible length by \cite{lmtest:Chitty+Campbell+Altman:1993}.
The data comprises measurements of mandible \texttt{length} (in mm)
and gestational \texttt{age} (in weeks) in 158 fetuses.
The data (after log transformation) is depicted in Figure~\ref{mandible-data}
together with the fitted values of a linear model \verb/length ~ age/
and a quadratic model \verb/length ~ age + I(age^2)/.

%% just to remember: the Royston & Altman model is
%% fm.ra <- lm(log(length) ~ I(1/age), data = mandible)

\setkeys{Gin}{width=0.6\textwidth}
\begin{figure}[htbp]
\begin{center}
<<mandible-data,fig=TRUE,onefig=TRUE,echo=FALSE,height=4,width=6>>=
data(Mandible)
mandible <- log(Mandible)
attach(mandible)
plot(mandible)
fm <- lm(length ~ age)
fm2 <- lm(length ~ age + I(age^2))
lines(age, fitted(fm), col = 2)
lines(age, fitted(fm2), col = 4)
@
\caption{\label{mandible-data} The mandible data}
\end{center}
\end{figure}

Although by merely visually inspecting the raw data
or the residual plots in Figure~\ref{mandible-res} a quadratic model
seems to be more appropriate, we will first fit a linear model for illustrating
some tests for nonlinearity and misspecified functional form.

\setkeys{Gin}{width=0.8\textwidth}
\begin{figure}[htbp]
\begin{center}
<<mandible-res,fig=TRUE,twofig=TRUE,echo=FALSE,height=4,width=8>>=
plot(age, residuals(fm), ylab = "residuals (linear model)")
plot(age, residuals(fm2), ylab = "residuals (quadratic model)")
detach(mandible)
@
\caption{\label{mandible-res} Residual plots for mandible models}
\end{center}
\end{figure}

The suitable tests in \texttt{lmtest} are the Harvey-Collier test,
which is essentially a $t$ test
of the recursive residuals (standardized one step prediction errors),
and the Rainbow test. Both try to detect
nonlinearities when the data is ordered with respect to a specific
variable.
<<mandible-tests1,eval=FALSE>>=
data(Mandible)
mandible <- log(Mandible)
harvtest(length ~ age, order.by = ~ age, data = mandible)
raintest(length ~ age, order.by = ~ age, data = mandible)
@
Both lead to highly significant results, suggesting that the model
is not linear in \texttt{age}. Another appropriate procedure is
the RESET test, which tests whether some
auxiliary variables improve the fit significantly. By default
the second and third powers of the fitted values are chosen:
<<mandible-tests2>>=
resettest(length ~ age, data = mandible)
@
In our situation it would also be natural to consider powers of
the regressor \texttt{age} as auxiliary variables
<<mandible-tests3>>=
resettest(length ~ age, power = 2, type = "regressor", data = mandible)
@
which also gives a highly significant $p$ value (higher powers do not
have a significant influence). These results correspond to the better
fit of the quadratic model which can both be seen in Figure~\ref{mandible-data}
and \ref{mandible-res}. Although its residual plot does not look too
suspicious several tests are able to reveal irregularities in this model
as well. The Breusch-Pagan tests gives a $p$ value of
\Sexpr{round(bptest(length ~ age + I(age^2), data = mandible)$p.value, digits = 3)}
and the Rainbow test gives
<<mandible-tests4>>=
raintest(length ~ age + I(age^2), order.by = ~ age, data = mandible)
@
<<mandible-supF,echo=FALSE,results=hide>>=
if(require(strucchange)) {
  supF.pval <- round(sctest(length ~ age + I(age^2), data = mandible, to = 0.9, type = "supF")$p.value, digits = 3)
} else {
#  warning("`strucchange' not available: p value set to NA")
  supF.pval <- NA
}
@
and finally an sup$F$ test from the \texttt{strucchange} package would also reject the
null hypothesis of stability at 10\% level ($p = \Sexpr{supF.pval}$)
in favour of a breakpoint after about 90\% of the
observations. All three
tests probably reflect that there is more variability in the edges (especially the
right one) than in the middle which the model does not describe sufficiently.

\section{Conclusions} \label{sec:conclusions}

We illustrated the usefulness of a collection of diagnostic tests for various
situations of deviations from the assumptions of the classical linear regression
model. We chose two fairly simple data sets---an econometric and a biometric
application---to demonstrate how the tests work, but they are also particularly
helpful to detect irregularities in regressions with a larger number of regressors.

\bibliography{lmtest}
\bibliographystyle{abbrvnat}

\end{document}