https://github.com/cran/gss
Tip revision: 9f0152d0fb61ff50420926206dd516f6589e7a23 authored by Chong Gu on 08 August 1977, 00:00:00 UTC
version 0.8-3
ssanova.Rd
\name{ssanova}
\alias{ssanova}
\title{Fitting Smoothing Spline ANOVA Models}
\description{
Fit smoothing spline ANOVA models with cubic spline, linear spline,
or thin-plate spline marginals for numerical variables. Factors are
also accepted. The symbolic model specification via \code{formula}
follows the same rules as in \code{\link{lm}}.
}
\usage{
ssanova(formula, type="cubic", data=list(), weights, subset,
offset, na.action=na.omit, partial=NULL, method="v",
varht=1, prec=1e-7, maxiter=30, ext=.05, order=2)
}
\arguments{
\item{formula}{Symbolic description of the model to be fit.}
\item{type}{Type of numerical marginals to be used. Supported are
\code{type="cubic"} for cubic spline marginals,
\code{type="linear"} for linear spline marginals, and
\code{type="tp"} for thin-plate spline marginals.}
\item{data}{Optional data frame containing the variables in the
model.}
\item{weights}{Optional vector of weights to be used in the
fitting process.}
\item{subset}{Optional vector specifying a subset of observations
to be used in the fitting process.}
\item{offset}{Optional offset term with known parameter 1.}
\item{na.action}{Function which indicates what should happen when
the data contain NAs.}
\item{partial}{Optional extra fixed effect terms in partial spline
models.}
\item{method}{Method for smoothing parameter selection. Supported
are \code{method="v"} for GCV, \code{method="m"} for GML (REML),
and \code{method="u"} for Mallows' CL.}
\item{varht}{External variance estimate needed for
\code{method="u"}. Ignored when \code{method="v"} or
\code{method="m"} is specified.}
\item{prec}{Precision requirement in the iteration for multiple
smoothing parameter selection. Ignored when only one smoothing
parameter is involved.}
\item{maxiter}{Maximum number of iterations allowed for multiple
smoothing parameter selection. Ignored when only one smoothing
parameter is involved.}
\item{ext}{For cubic spline and linear spline marginals, this option
specifies how far to extend the domain beyond the minimum and
the maximum as a percentage of the range. The default
\code{ext=.05} specifies marginal domains of lengths 110 percent
of their respective ranges. Prediction outside of the domain
will result in an error. Ignored if \code{type="tp"} is
specified.}
\item{order}{For thin-plate spline marginals, this option specifies
the order of the marginal penalties. Ignored if
\code{type="cubic"} or \code{type="linear"} is specified.}
}
\details{
\code{ssanova} and the affiliated \code{\link{methods}} provide a
front end to RKPACK, a collection of RATFOR routines for structural
multivariate nonparametric regression via the penalized least
squares method. The algorithms implemented in RKPACK are of
order \eqn{O(n^3)} in execution time and \eqn{O(n^2)} in memory
requirement. The constants in front of the orders vary with the
complexity of the model to be fit.
The model specification via \code{formula} is intuitive. For
example, \code{y~x1*x2} yields a model of the form
\deqn{
y = c + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
}
with the terms denoted by \code{"1"}, \code{"x1"}, \code{"x2"}, and
\code{"x1:x2"}. Through the specifications of the side conditions,
these terms are uniquely defined. In the current implementation,
\eqn{f_{1}} and \eqn{f_{12}} integrate to \eqn{0} on the \code{x1}
domain for cubic spline and linear spline marginals, and add to
\eqn{0} over the \code{x1} (marginal) sampling points for thin-plate
spline marginals.
The penalized least squares problem is equivalent to a certain
empirical Bayes model or a mixed effect model, and the model terms
themselves are generally sums of finer terms of two types, the
unpenalized \emph{fixed effects} and the penalized
\emph{random effects}. Attached to every random effect there is a
smoothing parameter, and the model complexity is largely determined
by the number of smoothing parameters.
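As a hedged sketch, assuming the standard formulation in Gu (2002)
and using notation that is not defined elsewhere on this page, the
penalized least squares estimate minimizes a functional of the form
\deqn{
(1/n) \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda J(f)
}
where \eqn{f} is the sum of the model terms, \eqn{J(f)} is a
roughness penalty, and \eqn{\lambda} is a smoothing parameter. With
several penalized terms, \eqn{J(f)} is assumed here to be a weighted
sum of term-wise penalties, one smoothing parameter per random
effect.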
The method \code{predict} can be used to evaluate the sum of
selected or all model terms at arbitrary points within the domain,
along with standard errors derived from a certain Bayesian
calculation. The method \code{summary} has a flag to request
diagnostics for the practical identifiability and significance of
the model terms.
}
\note{
The independent variables appearing in \code{formula} can be
multivariate themselves. In particular,
\code{ssanova(y~x,"tp",order=order)} can be used to fit ordinary
thin-plate splines in any dimension, of any order permissible, and
with standard errors available for Bayesian confidence intervals.
Note that thin-plate splines reduce to polynomial splines in one
dimension.
For univariate marginals, the additive models using
\code{type="cubic"} and \code{type="tp"} yield identical fits through
different internal constructions. For example,
\code{ssanova(y~x1+x2,"cubic")} and \code{ssanova(y~x1+x2,"tp")}
yield the same fit. The same is not true for models with
interactions, however.
Mathematically, the domain (through \code{ext} for
\code{type="cubic"}) or the order (through \code{order} for
\code{type="tp"}) could be specified individually for each of the
variables. Such flexibility is not provided in our implementation,
however, as it would be more a source of confusion than of practical
utility.
}
\section{Factors}{
Factors are accepted as predictors. When a factor has 3 or more
levels, all terms involving it are treated as random effects, with
the "level means" being shrunk towards each other. The shrinking is
done differently for nominal and ordinal factors; see
\code{\link{mkrk.factor}} for details.
}
\value{
\code{ssanova} returns a list object of \code{\link{class} "ssanova"}.
The method \code{\link{summary}} is used to obtain summaries of the
fits. The method \code{\link{predict}} can be used to evaluate the
fits at arbitrary points, along with the standard errors to be used
in Bayesian confidence intervals. The methods
\code{\link{residuals}} and \code{\link{fitted.values}} extract the
respective traits from the fits.
}
\seealso{
Methods \code{\link{predict.ssanova}},
\code{\link{summary.ssanova}}, and \code{\link{fitted.ssanova}}.
}
\author{Chong Gu, \email{chong@stat.purdue.edu}}
\references{
Gu, C. (2002), \emph{Smoothing Spline ANOVA Models}. New York:
Springer-Verlag.
Wahba, G. (1990), \emph{Spline Models for Observational Data}.
Philadelphia: SIAM.
}
\examples{
## Fit a cubic spline
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
cubic.fit <- ssanova(y~x,method="m")
## The same fit via a different internal construction
tp.fit <- ssanova(y~x,"tp",method="m")
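## A hedged sketch: select the smoothing parameter by Mallows' CL.
## method="u" needs an external variance estimate supplied via varht;
## varht=1 is assumed here only because the simulated noise above,
## rnorm(x), has unit variance. The name cl.fit is illustrative.
cl.fit <- ssanova(y~x,method="u",varht=1)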
## Obtain estimates and standard errors on a grid
new <- data.frame(x=seq(min(x),max(x),len=50))
est <- predict(cubic.fit,new,se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y,col=1); lines(new$x,est$fit,col=2)
lines(new$x,est$fit+1.96*est$se,col=3)
lines(new$x,est$fit-1.96*est$se,col=3)
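## A hedged sketch of the ext option: ext=.1 should give a marginal
## domain of 120 percent of the range of x, so points within 10 percent
## of the range beyond min(x) and max(x) can be predicted. The names
## wide.fit, r, new.wide, and est.wide and the 8 percent margin are
## illustrative choices.
wide.fit <- ssanova(y~x,ext=.1)
r <- diff(range(x))
new.wide <- data.frame(x=seq(min(x)-.08*r,max(x)+.08*r,len=50))
est.wide <- predict(wide.fit,new.wide,se=TRUE)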
## Clean up
\dontrun{rm(x,y,cubic.fit,tp.fit,new,est)
dev.off()}
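## A hedged sketch of an ordinary thin-plate spline in two dimensions,
## following the form ssanova(y~x,"tp",order=order) given in the Note,
## with x a two-column matrix. The names x2, y2, and tp2.fit and the
## test function are illustrative, not part of the package.
x2 <- matrix(runif(200),100,2)
y2 <- 5 + 3*sin(2*pi*x2[,1]) + 2*x2[,2] + rnorm(100)
tp2.fit <- ssanova(y2~x2,"tp",order=2)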
## Fit a tensor product cubic spline
data(nox)
nox.fit <- ssanova(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and nominal marginals
nox$comp<-as.factor(nox$comp)
nox.fit.n <- ssanova(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and ordinal marginals
nox$comp<-as.ordered(nox$comp)
nox.fit.o <- ssanova(log10(nox)~comp*equi,data=nox)
## Clean up
\dontrun{rm(nox,nox.fit,nox.fit.n,nox.fit.o)}
}
\keyword{smooth}
\keyword{models}
\keyword{regression}