https://github.com/cran/caret
Tip revision: 4c1f6847dba8a57d069f6d933823ae47316c40d8 authored by Max Kuhn on 15 February 2014, 00:00:00 UTC
version 6.0-24
classDist.Rd
\name{classDist}
\Rdversion{1.1}
\alias{classDist.default}
\alias{classDist}
\alias{predict.classDist}
\title{
Compute and predict the distances to class centroids
}
\description{
This function computes the class centroids and covariance matrices for a training set. These are used to determine the Mahalanobis distance of samples to each class centroid.
}
\usage{
classDist(x, ...)

\method{classDist}{default}(x, y, groups = 5, pca = FALSE, keep = NULL, ...)

\method{predict}{classDist}(object, newdata, trans = log, ...)

}
\arguments{
  \item{x}{a matrix or data frame of predictor variables}
  \item{y}{a numeric or factor vector of class labels}
  \item{groups}{an integer for the number of bins for splitting
                a numeric outcome}
  \item{pca}{a logical: should principal components analysis be 
             applied to the dataset prior to splitting the data by
             class?}
  \item{keep}{an integer for the number of PCA components that should
              be used to predict new samples (\code{NULL} uses all
              within a tolerance of \code{sqrt(.Machine$double.eps)})}
  \item{object}{an object of class \code{classDist}}
  \item{newdata}{a matrix or data frame. If \code{vars} was 
                 previously specified, these columns should be in
                 \code{newdata}}
  \item{trans}{an optional function that can be applied to each class
               distance. \code{trans = NULL} will not apply a 
               function}
  \item{\dots}{optional arguments to pass (not currently used)}
}
\details{
For factor outcomes, the data are split into groups for each class 
and the mean and covariance matrix are calculated. These are then 
used to compute Mahalanobis distances to the class centers (using 
\code{predict.classDist}). The function checks that each class covariance matrix is non-singular.
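
As a sketch of the quantity involved, and assuming the distances follow the
convention of \code{\link[stats]{mahalanobis}} (which returns the squared
distance), the distance of a sample \eqn{x} to class \eqn{k} with centroid
\eqn{\mu_k}{mu_k} and covariance matrix \eqn{\Sigma_k}{Sigma_k} would be
\deqn{D_k^2(x) = (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)}{D_k^2(x) = (x - mu_k)' Sigma_k^(-1) (x - mu_k)}
With the default \code{trans = log}, the reported value would then be the
logarithm of this quantity. The symbols are notation introduced here for
illustration only.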

For numeric outcomes, the data are split into roughly equal sized 
bins based on \code{groups}. Percentiles are used to split the data. 

}
\value{
For \code{classDist}, an object of class \code{classDist} with 
elements:
  \item{values}{a list with elements for each class. Each element 
                 contains a mean vector for the class centroid and the
                 inverse of the class covariance matrix}
  \item{classes}{a character vector of class labels}
  \item{pca}{the results of \code{\link[stats]{prcomp}} when 
             \code{pca = TRUE}}
  \item{call}{the function call}
  \item{p}{the number of variables}
  \item{n}{a vector of sample sizes per class}

For \code{predict.classDist}, a matrix with one column per class. 
The column names are the class names with the prefix 
\code{dist.}. In the case of numeric \code{y}, the class labels are
the percentiles. For example, if \code{groups = 9}, the variable names
would be \code{dist.11.11}, \code{dist.22.22}, etc.
}

\author{
Max Kuhn
}
\references{Forina et al. (2009). CAIMAN brothers: A family of powerful classification and class modeling techniques. Chemometrics and Intelligent Laboratory Systems, 96(2), 239-245.}

\seealso{\code{\link[stats]{mahalanobis}}}
\examples{
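# Use 100 of the 150 iris samples for training; the rest are held out
trainSet <- sample(1:150, 100)

# Estimate the class centroids and covariance matrices on the training set
distData <- classDist(iris[trainSet, 1:4], 
                      iris$Species[trainSet])

# Distances of the held-out samples to each class (log scale by default)
newDist <- predict(distData,
                   iris[-trainSet, 1:4])

# Scatterplot matrix of the class distances, grouped by the true species
splom(newDist, groups = iris$Species[-trainSet])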
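
# A minimal sketch of the numeric-outcome case described in Details.
# Assumptions for illustration only: Sepal.Length is treated as the
# outcome, groups = 3 is an arbitrary choice, and the object names
# (numIdx, numDistData, numDist) are hypothetical.
numIdx <- sample(1:150, 100)

# The numeric outcome is split into percentile-based bins before the
# per-bin centroids and covariance matrices are computed
numDistData <- classDist(iris[numIdx, 2:4],
                         iris$Sepal.Length[numIdx],
                         groups = 3)

# The result has one distance column per bin
numDist <- predict(numDistData, iris[-numIdx, 2:4])
head(numDist)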
}
\keyword{ manip }
