% Tip revision: 185a2e309c050912d00c8fb39098d2dfb41861a4, authored by Max Kuhn on 11 December 2011, 10:41:21 UTC (version 5.09-012)
\name{rfeControl}
\alias{rfeControl}
\title{Controlling the Feature Selection Algorithms}
\description{
This function generates a control object that can be used to specify the details of the feature selection algorithms used in this package.
}
\usage{
rfeControl(functions = NULL,
           rerank = FALSE,
           method = "boot",
           saveDetails = FALSE,
           number = ifelse(method \%in\% c("cv", "repeatedcv"), 10, 25),
           repeats = ifelse(method \%in\% c("cv", "repeatedcv"), 1, number),
           verbose = FALSE,
           returnResamp = "all",
           p = .75,
           index = NULL,
           timingSamps = 0)
}
\arguments{
  \item{functions}{a list of functions for model fitting, prediction and variable importance (see Details below)}
  \item{rerank}{a logical: should variable importance be re-calculated each time features are removed? }
  \item{method}{The external resampling method: \code{boot}, \code{cv}, \code{repeatedcv},
    \code{LOOCV} or \code{LGOCV} (for repeated training/test splits)}
  \item{number}{Either the number of folds or number of resampling iterations}
  \item{repeats}{For repeated k-fold cross-validation only: the number of complete sets of folds to compute}
  \item{saveDetails}{a logical to save the predictions and variable importances from the selection process}
  \item{verbose}{a logical to print a log for each external resampling iteration}
  \item{returnResamp}{A character string indicating how much of the resampled summary metrics should be saved. Values can be ``final'', ``all'' or ``none''}
  \item{p}{For leave-group-out cross-validation: the proportion of samples used for training}
  \item{index}{a list with elements for each external resampling iteration. Each list element is the sample rows used for training at that iteration.}
  \item{timingSamps}{the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).}
}
\details{
Backwards selection requires functions to be specified for several operations.

The \code{fit} function builds the model based on the current data set. The arguments for the function must be:
\itemize{
      \item{\code{x}}{ the current training set of predictor data with
            the appropriate subset of variables}
      \item{\code{y}}{ the current outcome data (either a numeric or 
            factor vector)}
      \item{\code{first}}{ a single logical value for whether the
            current predictor set has all possible variables}
      \item{\code{last}}{ similar to \code{first}, but \code{TRUE} 
            when the last model is fit with the final subset size and
            predictors.}
      \item{\code{...}}{optional arguments to pass to the fit 
            function in the call to \code{rfe}}
}
The function should return a model object that can be used to generate predictions.
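
As a minimal sketch of the interface described above (not the definition used by \code{\link{lmFuncs}}), a \code{fit} function for a linear model might look like the following. The name \code{fitSketch} is hypothetical and is used only for illustration.

\preformatted{
## Hypothetical fit function: ordinary least squares on the current
## predictor subset. 'first' and 'last' are accepted but not used here.
fitSketch <- function(x, y, first, last, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  lm(.outcome ~ ., data = dat, ...)
}

## Illustrative call on a built-in data set
mod <- fitSketch(mtcars[, c("wt", "hp")], mtcars$mpg,
                 first = FALSE, last = FALSE)
}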

The \code{pred} function returns a vector of predictions (numeric or factors) from the current model. The arguments are:
\itemize{
      \item{\code{object}}{ the model generated by the \code{fit} 
            function}
      \item{\code{x}}{ the current predictor set for the 
            held-back samples}
}
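
A matching \code{pred} function could be sketched as below, assuming the model object supports \code{predict} with a \code{newdata} argument (as \code{lm} objects do). The name \code{predSketch} is hypothetical.

\preformatted{
## Hypothetical pred function: returns a plain vector of predictions
predSketch <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

## Illustrative call: fit on some rows, predict the held-back rows
mod <- lm(mpg ~ wt + hp, data = mtcars[1:24, ])
preds <- predSketch(mod, mtcars[25:32, c("wt", "hp")])
}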

The \code{rank} function is used to return the predictors in the order of the most important to the least important. Inputs are:
\itemize{
      \item{\code{object}}{ the model generated by the \code{fit} 
            function}
      \item{\code{x}}{ the current predictor set for the 
            training samples}
      \item{\code{y}}{ the current training outcomes}
}
The function should return a data frame with a column called \code{var} that contains the current variable names. Rows should be ordered from the most important predictor to the least important. Other columns can be included in the output and will be returned in the final \code{rfe} object.
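
One possible \code{rank} function, shown only as a sketch, orders the predictors of a linear model by the absolute value of their t-statistics. The name \code{rankSketch} and the choice of importance measure are assumptions made for illustration. The extra column is called \code{Overall} here by choice.

\preformatted{
## Hypothetical rank function for lm objects: importance is the
## absolute t-statistic of each non-intercept coefficient.
rankSketch <- function(object, x, y) {
  tab <- coef(summary(object))[-1, , drop = FALSE]
  out <- data.frame(var = rownames(tab),
                    Overall = abs(tab[, "t value"]),
                    stringsAsFactors = FALSE)
  out <- out[order(out$Overall, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

mod <- lm(mpg ~ wt + hp, data = mtcars)
ranks <- rankSketch(mod, mtcars[, c("wt", "hp")], mtcars$mpg)
}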

The \code{selectSize} function determines the optimal number of predictors based on the resampling output. Inputs for the function are:
\itemize{
      \item{\code{x}}{a matrix with columns for the performance 
            metrics and the number of variables, called 
            "\code{Variables}"}
      \item{\code{metric}}{a character string of the performance 
            measure to optimize (e.g. "RMSE", "Rsquared", "Accuracy"
            or "Kappa")}
      \item{\code{maximize}}{a single logical for whether the metric
            should be maximized}
}
This function should return an integer corresponding to the optimal subset size. \pkg{caret} comes with two example functions for this purpose: \code{\link{pickSizeBest}} and \code{\link{pickSizeTolerance}}.
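
The simplest rule, picking the subset size with the numerically best metric value, can be sketched as follows. This is an illustration of the interface, not the source of \code{\link{pickSizeBest}}. The name \code{sizeSketch} and the performance values are made up.

\preformatted{
## Hypothetical selectSize function: choose the row with the best
## value of 'metric' and return its number of variables.
sizeSketch <- function(x, metric, maximize) {
  best <- if (maximize) which.max(x[, metric]) else which.min(x[, metric])
  x[best, "Variables"]
}

## Made-up resampling summary with one row per subset size
perf <- cbind(Variables = c(1, 2, 4, 8),
              RMSE      = c(3.1, 2.6, 2.4, 2.7))
bestSize <- sizeSketch(perf, metric = "RMSE", maximize = FALSE)
}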

After the optimal subset size is determined, the \code{selectVar} function will be used to calculate the best rankings for each variable across all the resampling iterations. Inputs for the function are:
\itemize{
      \item{\code{y}}{ a list of variable importances for each 
            resampling iteration and each subset size (generated by 
            the user-defined \code{rank} function). In the example,
            for each of the cross-validation groups, the output of 
            the \code{rank} function is saved for each of the 
            subset sizes (including the original subset). If the 
            rankings are not recomputed at each iteration, the 
            values will be the same within each cross-validation 
            iteration.}
      \item{\code{size}}{ the integer returned by the  
            \code{selectSize} function}
}
This function should return a character vector of predictor names (of length \code{size}) in the order of most important to least important.
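
A \code{selectVar} function could, for example, average the importance of each predictor over the resampling iterations and keep the top \code{size} names. The sketch below assumes that \code{y} has been flattened to a plain list of data frames, each with \code{var} and \code{Overall} columns. The actual structure passed by \code{rfe} may be nested differently. The name \code{varSketch} and the importance values are made up.

\preformatted{
## Hypothetical selectVar function: mean importance across resamples
varSketch <- function(y, size) {
  all <- do.call(rbind, y)
  avg <- tapply(all$Overall, as.character(all$var), mean)
  names(avg)[order(avg, decreasing = TRUE)][seq_len(size)]
}

## Made-up importances from two resampling iterations
imps <- list(
  data.frame(var = c("wt", "hp", "qsec"), Overall = c(6.1, 3.5, 1.2)),
  data.frame(var = c("wt", "qsec", "hp"), Overall = c(5.4, 2.0, 1.9))
)
topVars <- varSketch(imps, size = 2)
}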

Examples of these functions are included in the package: \code{\link{lmFuncs}}, \code{\link{rfFuncs}}, \code{\link{treebagFuncs}} and \code{\link{nbFuncs}}.
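
For instance, one of these bundled lists can be passed directly as the \code{functions} argument. The sketch below assumes that \pkg{caret} is installed. The resampling settings are arbitrary.

\preformatted{
library(caret)

## 10-fold cross-validation using the bundled linear-model functions
ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 10)
}

The resulting object is then supplied to \code{\link{rfe}}.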

More details about these functions, including examples, are in the package vignette for feature selection.

}
\value{
A list containing the control settings specified by the arguments above.
}
\author{Max Kuhn }

\seealso{ \code{\link{rfe}},  \code{\link{lmFuncs}}, \code{\link{rfFuncs}}, \code{\link{treebagFuncs}}, \code{\link{nbFuncs}}, \code{\link{pickSizeBest}}, \code{\link{pickSizeTolerance}} }


\keyword{ utilities }