https://github.com/cran/caret
Tip revision: 0079ccba7054a0e7d596488a6c9ab2bb924e2531 authored by Max Kuhn on 26 May 2018, 21:01:28 UTC
version 6.0-80
dummyVars.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dummyVar.R
\name{dummyVars}
\alias{dummyVars}
\alias{dummyVars.default}
\alias{predict.dummyVars}
\alias{contr.dummy}
\alias{contr.ltfr}
\alias{class2ind}
\alias{print.dummyVars}
\title{Create A Full Set of Dummy Variables}
\usage{
dummyVars(formula, ...)
\method{dummyVars}{default}(formula, data, sep = ".", levelsOnly = FALSE,
fullRank = FALSE, ...)
\method{print}{dummyVars}(x, ...)
\method{predict}{dummyVars}(object, newdata, na.action = na.pass, ...)
contr.ltfr(n, contrasts = TRUE, sparse = FALSE)
class2ind(x, drop2nd = FALSE)
}
\arguments{
\item{formula}{An appropriate R model formula, see References}
\item{...}{additional arguments to be passed to other methods}
\item{data}{A data frame with the predictors of interest}
\item{sep}{An optional separator between factor variable names and their
levels. Use \code{sep = NULL} for no separator (i.e. normal behavior of
\code{\link[stats]{model.matrix}} as shown in the Details section)}
\item{levelsOnly}{A logical; \code{TRUE} means to completely remove the
variable names from the column names}
\item{fullRank}{A logical; should a full rank or less than full rank
parameterization be used? If \code{TRUE}, factors are encoded to be
consistent with \code{\link[stats]{model.matrix}} and the resulting
columns have no linear dependencies induced between them.}
\item{x}{A factor vector.}
\item{object}{An object of class \code{dummyVars}}
\item{newdata}{A data frame with the required columns}
\item{na.action}{A function determining what should be done with missing
values in \code{newdata}. The default is to predict \code{NA}.}
\item{n}{A vector of levels for a factor, or the number of levels.}
\item{contrasts}{A logical indicating whether contrasts should be computed.}
\item{sparse}{A logical indicating if the result should be sparse.}
\item{drop2nd}{A logical: if the factor has two levels, should a single binary vector be returned?}
}
\value{
The output of \code{dummyVars} is a list of class 'dummyVars' with
elements
\item{call}{the function call}
\item{form}{the model formula}
\item{vars}{names of all the variables in the model}
\item{facVars}{names of all the factor variables in the model}
\item{lvls}{levels of any factor variables}
\item{sep}{\code{NULL} or a character separator}
\item{terms}{the \code{\link[stats]{terms.formula}} object}
\item{levelsOnly}{a logical}

The \code{predict} function produces a data frame.

\code{class2ind} returns a matrix (or a vector if \code{drop2nd = TRUE}).

\code{contr.ltfr} generates a design matrix.
}
\description{
\code{dummyVars} creates a full set of dummy variables (i.e. a less than
full rank parameterization).
}
\details{
Most of the \code{\link[stats]{contrasts}} functions in R produce full rank
parameterizations of the predictor data. For example,
\code{\link[stats]{contr.treatment}} creates a reference cell in the data
and defines dummy variables for all factor levels except those in the
reference cell. If a factor with 5 levels is used alone in a model formula,
\code{\link[stats]{contr.treatment}} creates columns for the intercept and
for every factor level except the first.
For the data in the Examples section below, this would produce:
\preformatted{  (Intercept) dayTue dayWed dayThu dayFri daySat daySun
            1      0      0      0      0      0      0
            1      0      0      0      0      0      0
            1      0      0      0      0      0      0
            1      0      1      0      0      0      0
            1      0      1      0      0      0      0
            1      0      0      0      1      0      0
            1      0      0      0      0      1      0
            1      0      0      0      0      1      0
            1      0      0      0      1      0      0}
In some situations, there may be a need for dummy variables for all the
levels of the factor. For the same example:
\preformatted{  dayMon dayTue dayWed dayThu dayFri daySat daySun
       1      0      0      0      0      0      0
       1      0      0      0      0      0      0
       1      0      0      0      0      0      0
       0      0      1      0      0      0      0
       0      0      1      0      0      0      0
       0      0      0      0      1      0      0
       0      0      0      0      0      1      0
       0      0      0      0      0      1      0
       0      0      0      0      1      0      0}
Given a formula and initial data set, the class \code{dummyVars} gathers all
the information needed to produce a full set of dummy variables for any data
set. It uses \code{contr.ltfr} as the base function to do this.
\code{class2ind} is most useful for converting a factor outcome vector to a
matrix (or vector) of dummy variables.
}
\examples{
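## stringsAsFactors = TRUE keeps both columns as factors on R >= 4.0.0,
## where data.frame() no longer converts strings by default
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Mon", "Mon", "Mon",
                           "Wed", "Wed", "Fri",
                           "Sat", "Sat", "Fri"),
                   stringsAsFactors = TRUE)

## Order the levels and add the days that do not occur in the data
levels(when$time) <- list(morning="morning",
                          afternoon="afternoon",
                          night="night")
levels(when$day) <- list(Mon="Mon", Tue="Tue", Wed="Wed", Thu="Thu",
                         Fri="Fri", Sat="Sat", Sun="Sun")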

## Default behavior:
model.matrix(~day, when)

## Dummy variables for every level of both factors
mainEffects <- dummyVars(~ day + time, data = when)
mainEffects
predict(mainEffects, when[1:3,])

## Missing values: the default na.action = na.pass predicts NA
when2 <- when
when2[1, 1] <- NA
predict(mainEffects, when2[1:3,])
predict(mainEffects, when2[1:3,], na.action = na.omit)

## Main effects plus the day:time interaction
interactionModel <- dummyVars(~ day + time + day:time,
                              data = when,
                              sep = ".")
predict(interactionModel, when[1:3,])

## Column names without the variable names
noNames <- dummyVars(~ day + time + day:time,
                     data = when,
                     levelsOnly = TRUE)
predict(noNames, when)
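
## A hedged sketch, not one of the original examples: the fullRank
## argument described under Arguments. The built-in iris data is used here
## only to keep the snippet self-contained (an assumption of this sketch).
## With fullRank = TRUE the encoding should follow model.matrix, so
## Species is expected to get one column fewer than with the default.
lessThanFullRank <- dummyVars(~ Species, data = iris)
fullRankEncoding <- dummyVars(~ Species, data = iris, fullRank = TRUE)
colnames(predict(lessThanFullRank, iris))
colnames(predict(fullRankEncoding, iris))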

## Convert a factor outcome to indicator columns
head(class2ind(iris$Species))

two_levels <- factor(rep(letters[1:2], each = 5))
class2ind(two_levels)
class2ind(two_levels, drop2nd = TRUE)
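
## A hedged sketch comparing the two contrast functions named in Details,
## again using the levels of iris$Species as an assumed stand-in factor.
## contr.treatment drops the column for the reference (first) level, while
## contr.ltfr is expected to keep one column per level.
speciesLevels <- levels(iris$Species)
contr.treatment(speciesLevels)
contr.ltfr(speciesLevels)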
}
\references{
\url{https://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models}
}
\seealso{
\code{\link[stats]{model.matrix}}, \code{\link[stats]{contrasts}},
\code{\link[stats]{formula}}
}
\author{
\code{contr.ltfr} is a small modification of
\code{\link[stats]{contr.treatment}} by Max Kuhn
}
\keyword{models}