\documentclass[a4paper]{article}
\usepackage{a4wide,graphicx,color}
\usepackage[authoryear,round,longnamesfirst]{natbib}
\usepackage{hyperref}

\definecolor{Red}{rgb}{0.7,0,0}
\definecolor{Blue}{rgb}{0,0,0.8}

\begin{document}

\SweaveOpts{engine=R,eps=FALSE}
%\VignetteIndexEntry{Diagnostic Checking in Regression Relationships}
%\VignetteDepends{lmtest, strucchange}
%\VignetteKeywords{diagnostic checking, structural change, autocorrelation, heteroskedasticity}
%\VignettePackage{lmtest}

<<>>=
library(lmtest)
options(SweaveHooks=list(twofig=function() {par(mfrow=c(1,2))},
                         twofig2=function() {par(mfrow=c(2,1))},
                         onefig=function() {par(mfrow=c(1,1))}))
@

\title{Diagnostic Checking in Regression Relationships}
\author{\hfill Achim Zeileis$^\dag$ \hfill Torsten Hothorn$^\ddag$ \hfill \hfill \\
  \dag {\it \small Institut f\"ur Statistik \& Wahrscheinlichkeitstheorie, Technische Universit\"at Wien, Austria}\\
  \ddag {\it \small Institut f\"ur Medizininformatik, Biometrie und Epidemiologie, Universit\"at Erlangen-N\"urnberg, Germany}
}
\date{}
\maketitle

\section{Introduction}

The classical linear regression model
\begin{equation} \label{eq:model}
y_i \quad = \quad x_i^\top \beta + u_i \qquad (i = 1, \dots, n)
\end{equation}
is still one of the most popular tools for data analysis despite (or due to) its simple structure. Although it is appropriate in many situations, there are many pitfalls that might affect the quality of conclusions drawn from fitted models or might even lead to uninterpretable results. Some of these pitfalls that are considered especially important in applied econometrics are heteroskedasticity or serial correlation of the error terms, structural changes in the regression coefficients, nonlinearities, functional misspecification or omitted variables.
Therefore, a rich variety of diagnostic tests for these situations has been developed in the econometrics community, a collection of which has been implemented in the packages \texttt{lmtest} and \texttt{strucchange}, covering the problems mentioned above. These diagnostic tests are not only useful in econometrics but also in many other fields where linear regression is used, which we will demonstrate with an application from biostatistics. As \cite{lmtest:Breiman:2001} argues, it is important to assess the goodness-of-fit of data models, in particular not only using omnibus tests but also tests designed for a certain direction of the alternative.

These diagnostic checks do not have to be seen as pure significance procedures but also as an explorative tool to extract information about the structure of the data, especially in connection with residual plots or other diagnostic plots. As \cite{lmtest:Brown+Durbin+Evans:1975} argue for the recursive CUSUM test, these procedures can ``be regarded as yardsticks for the interpretation of data rather than leading to hard and fast decisions.'' Moreover, we will always be able to reject the null hypothesis provided we have enough data at hand. The question is not whether the model is wrong (it always is!) but whether the irregularities are serious.

The package \texttt{strucchange} implements a variety of procedures related to structural change of the regression coefficients and was already introduced in \textsf{R} News by \cite{lmtest:Zeileis:2001} and described in more detail in \cite{lmtest:Zeileis+Leisch+Hornik:2002}. Therefore, we will focus on the package \texttt{lmtest} in the following. Most of the tests and the datasets contained in the package are taken from the book by \cite{lmtest:Kraemer+Sonnberger:1986}, which originally inspired us to write the package. Compared to the book, we implemented later versions of some tests and modern flexible interfaces for the procedures.
Most of the tests are based on the OLS residuals of a linear model, which is specified by a formula argument. Instead of a formula a fitted model of class \verb/"lm"/ can also be supplied, which should work if the data are either contained in the object or still present in the workspace; however, this is not encouraged. The full references for the tests can be found on the help pages of the respective functions.

We present applications of the tests contained in \texttt{lmtest} to two different data sets: the first is a macroeconomic time series from the U.S. analysed by \cite{lmtest:Stock+Watson:1996} and the second is data from a study on measurements of fetal mandible length discussed by \cite{lmtest:Royston+Altman:1994}.

\section{U.S. macroeconomic data} \label{sec:macro}

\cite{lmtest:Stock+Watson:1996} investigate the stability of 76 monthly macroeconomic time series from 1959 to 1993, of which we choose the Department of Commerce commodity price index time series \texttt{jocci} to illustrate the tests for heteroskedasticity and serial correlation. The series is treated with the same methodology as all other series considered by \cite{lmtest:Stock+Watson:1996}: it was transformed suitably (here by log first differences) and then an AR(6) model was fitted and analysed. The transformed series is denoted \texttt{dy} and is depicted together with a residual plot of the AR(6) model in Figure~\ref{macro-jocci}.
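
The \texttt{jocci} data already contain the lagged regressors needed for the AR(6) model. For a series where they are not available, the transformation and the lags could be constructed along the following lines. This is only a sketch on simulated data: the series \texttt{x.sim} and all other object names in the chunk are hypothetical and are not taken from \cite{lmtest:Stock+Watson:1996}.
<<>>=
## simulated monthly index series (hypothetical, not the jocci data)
set.seed(1)
x.sim <- ts(100 * exp(cumsum(rnorm(200, sd = 0.01))),
  start = c(1959, 1), frequency = 12)
## log first differences
dy.sim <- diff(log(x.sim))
## embed() puts the current value in column 1 and lags 1 to 6 in columns 2 to 7
lagged <- as.data.frame(embed(as.numeric(dy.sim), 7))
names(lagged) <- c("dy", paste("dy", 1:6, sep = ""))
## AR(6) model fitted by OLS
ar6.sim <- lm(dy ~ dy1 + dy2 + dy3 + dy4 + dy5 + dy6, data = lagged)
coef(ar6.sim)
@
Building the lags with \verb/embed()/ drops the first six observations, so that every row of \texttt{lagged} is complete and the model can be passed to the tests by formula and \texttt{data} argument.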
\setkeys{Gin}{width=0.8\textwidth}
\begin{figure}[htbp]
\begin{center}
<<>>=
data(jocci)
plot(jocci[,"dy"], ylab = "jocci (log first differences)")
ar6.model <- dy ~ dy1 + dy2 + dy3 + dy4 + dy5 + dy6
jocci.fm <- lm(ar6.model, data = jocci)
plot(time(jocci), residuals(jocci.fm), xlab = "Time", ylab = "AR(6) residuals")
@
\caption{\label{macro-jocci} The jocci series and AR(6) residual plot}
\end{center}
\end{figure}

Not surprisingly, an autoregressive model is necessary as the series itself contains serial correlation, which can be shown by the Durbin-Watson test
<<>>=
data(jocci)
dwtest(dy ~ 1, data = jocci)
@
or the Breusch-Godfrey test, which also leads to a highly significant result. In the AR(6) model given by
<<>>=
ar6.model <- dy ~ dy1 + dy2 + dy3 + dy4 + dy5 + dy6
@
where the variables on the right-hand side denote the lagged variables, there is no remaining serial correlation in the residuals:
<<>>=
bgtest(ar6.model, data = jocci)
@
The Durbin-Watson test is biased in dynamic models and should therefore not be applied to the AR(6) model.

The residual plot suggests that the variance of the error component increases over time, which is confirmed by all three tests for heteroskedasticity implemented in \texttt{lmtest}: the Breusch-Pagan test fits a linear regression model to the residuals and rejects if too much of the variance is explained by the auxiliary explanatory variables, which are here the squared lagged values:
<<>>=
var.model <- ~ I(dy1^2) + I(dy2^2) + I(dy3^2) + I(dy4^2) + I(dy5^2) + I(dy6^2)
bptest(ar6.model, var.model, data = jocci)
@
The Goldfeld-Quandt test \verb/gqtest()/ and the Harrison-McCabe test \verb/hmctest()/ also give highly significant $p$ values. Whereas the Breusch-Pagan test and the Harrison-McCabe test do not assume a particular timing of the change of variance, the Goldfeld-Quandt test suffers from the same problem as the Chow test for a change of the regression coefficients: the breakpoint has to be known in advance.
By default it is taken to be after 50\% of the observations, which leads to a significant result for the present series.

\section{The mandible data} \label{sec:mandible}

\cite{lmtest:Royston+Altman:1994} discuss a linear regression model for data taken from a study of fetal mandible length by \cite{lmtest:Chitty+Campbell+Altman:1993}. The data comprises measurements of mandible \texttt{length} (in mm) and gestational \texttt{age} (in weeks) in 158 fetuses. The data (after log transformation) is depicted in Figure~\ref{mandible-data} together with the fitted values of a linear model \verb/length ~ age/ and a quadratic model \verb/length ~ age + I(age^2)/.

%% just to remember: the Royston & Altman model is
%% fm.ra <- lm(log(length) ~ I(1/age), data = mandible)

\setkeys{Gin}{width=0.6\textwidth}
\begin{figure}[htbp]
\begin{center}
<<>>=
data(Mandible)
mandible <- log(Mandible)
attach(mandible)
plot(mandible)
fm <- lm(length ~ age)
fm2 <- lm(length ~ age + I(age^2))
lines(age, fitted(fm), col = 2)
lines(age, fitted(fm2), col = 4)
@
\caption{\label{mandible-data} The mandible data}
\end{center}
\end{figure}

Although a visual inspection of the raw data or of the residual plots in Figure~\ref{mandible-res} already suggests that a quadratic model is more appropriate, we will first fit a linear model to illustrate some tests for nonlinearity and misspecified functional form.

\setkeys{Gin}{width=0.8\textwidth}
\begin{figure}[htbp]
\begin{center}
<<>>=
plot(age, residuals(fm), ylab = "residuals (linear model)")
plot(age, residuals(fm2), ylab = "residuals (quadratic model)")
detach(mandible)
@
\caption{\label{mandible-res} Residual plots for mandible models}
\end{center}
\end{figure}

The suitable tests in \texttt{lmtest} are the Harvey-Collier test, which is essentially a $t$ test of the recursive residuals (standardized one-step prediction errors), and the Rainbow test. Both try to detect nonlinearities when the data is ordered with respect to a specific variable.
<<>>=
data(Mandible)
mandible <- log(Mandible)
harvtest(length ~ age, order.by = ~ age, data = mandible)
raintest(length ~ age, order.by = ~ age, data = mandible)
@
Both lead to highly significant results, suggesting that the model is not linear in \texttt{age}. Another appropriate procedure is the RESET test, which tests whether some auxiliary variables improve the fit significantly. By default the second and third powers of the fitted values are chosen:
<<>>=
resettest(length ~ age, data = mandible)
@
In our situation it would also be natural to consider powers of the regressor \texttt{age} as auxiliary variables
<<>>=
resettest(length ~ age, power = 2, type = "regressor", data = mandible)
@
which also gives a highly significant $p$ value (higher powers do not have a significant influence). These results correspond to the better fit of the quadratic model, which can be seen in both Figure~\ref{mandible-data} and Figure~\ref{mandible-res}.

Although its residual plot does not look too suspicious, several tests are able to reveal irregularities in this model as well. The Breusch-Pagan test gives a $p$ value of \Sexpr{round(bptest(length ~ age + I(age^2), data = mandible)$p.value, digits = 3)} and the Rainbow test gives
<<>>=
raintest(length ~ age + I(age^2), order.by = ~ age, data = mandible)
@
<<>>=
if(require(strucchange)) {
  supF.pval <- round(sctest(length ~ age + I(age^2), data = mandible,
    to = 0.9, type = "supF")$p.value, digits = 3)
} else {
#  warning("`strucchange' not available: p value set to NA")
  supF.pval <- NA
}
@
and finally a sup$F$ test from the \texttt{strucchange} package would also reject the null hypothesis of stability at the 10\% level ($p = \Sexpr{supF.pval}$) in favour of a breakpoint after about 90\% of the observations. All three tests probably reflect that there is more variability in the edges (especially the right one) than in the middle, which the model does not describe sufficiently.
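
The RESET test with \verb/type = "regressor"/ and \verb/power = 2/ used earlier in this section adds the squared regressor as an auxiliary variable, so it should be equivalent to an ordinary $F$ test that compares the linear with the quadratic model. The following chunk is a minimal sketch of this cross-check. The object names \texttt{fm.lin}, \texttt{fm.quad}, \texttt{rt} and \texttt{av} are introduced here for illustration only, and the equivalence is our reading of how the test is constructed rather than a documented property.
<<>>=
library(lmtest)
data(Mandible)
mandible <- log(Mandible)
## restricted (linear) and unrestricted (quadratic) model
fm.lin <- lm(length ~ age, data = mandible)
fm.quad <- lm(length ~ age + I(age^2), data = mandible)
## RESET test with the squared regressor as auxiliary variable
rt <- resettest(length ~ age, power = 2, type = "regressor", data = mandible)
## F test for the nested models; row 2 holds the comparison
av <- anova(fm.lin, fm.quad)
## the two statistics and the two p values are expected to agree
c(RESET = unname(rt$statistic), anova = av$F[2])
c(RESET = unname(rt$p.value), anova = av[["Pr(>F)"]][2])
@
If the two rows of output coincide, the regressor-based RESET test can be read as a test of the quadratic term, which is why its rejection fits the better visual fit of the quadratic model.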

\section{Conclusions} \label{sec:conclusions}

We illustrated the usefulness of a collection of diagnostic tests for various situations of deviations from the assumptions of the classical linear regression model. We chose two fairly simple data sets, an econometric and a biometric application, to demonstrate how the tests work, but they are also particularly helpful for detecting irregularities in regressions with a larger number of regressors.

\bibliography{lmtest}
\bibliographystyle{abbrvnat}

\end{document}