https://github.com/berndbischl/mlr
Tip revision: b251a1542cf77ae04eb0b76db947f58fc7aab3a9 authored by pat-s on 22 January 2021, 10:10:04 UTC
update update-tic
makeLearner.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/makeLearner.R
\name{makeLearner}
\alias{makeLearner}
\alias{Learner}
\title{Create learner object.}
\usage{
makeLearner(
  cl,
  id = cl,
  predict.type = "response",
  predict.threshold = NULL,
  fix.factors.prediction = FALSE,
  ...,
  par.vals = list(),
  config = list()
)
}
\arguments{
\item{cl}{(\code{character(1)})\cr
Class of learner. By convention, all classification learners
start with \dQuote{classif.}, all regression learners with
\dQuote{regr.}, all survival learners with \dQuote{surv.},
all clustering learners with \dQuote{cluster.}, and all multilabel
classification learners with \dQuote{multilabel.}.
A list of all integrated learners is available on the
\link{learners} help page.}
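The prefix convention for \code{cl} can be illustrated with a small base R sketch that recovers the learner type from the class name. The helper \code{learnerTypeFromClass} is hypothetical and not part of \code{mlr}. Within \code{mlr}, \link{getLearnerType} returns the type of an existing learner.

```r
# Hypothetical helper (not part of mlr): read the learner type from the
# prefix of a class name such as "classif.rpart".
learnerTypeFromClass = function(cl) {
  prefix = strsplit(cl, ".", fixed = TRUE)[[1]][1]
  known = c("classif", "regr", "surv", "cluster", "multilabel")
  if (is.na(match(prefix, known))) stop("unknown learner prefix: ", prefix)
  prefix
}

learnerTypeFromClass("classif.rpart")      # "classif"
learnerTypeFromClass("regr.randomForest")  # "regr"
```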

\item{id}{(\code{character(1)})\cr Id string for the object, used to display it.
Default is \code{cl}.}

\item{predict.type}{(\code{character(1)})\cr Classification: \dQuote{response} (=
labels) or \dQuote{prob} (= probabilities and labels obtained by selecting the class
with maximal probability). Regression: \dQuote{response} (= mean response)
or \dQuote{se} (= standard errors and mean response). Survival:
\dQuote{response} (= an orderable risk measure) or \dQuote{prob} (=
time-dependent probabilities). Clustering: \dQuote{response} (= cluster IDs) or
\dQuote{prob} (= fuzzy cluster membership probabilities). Multilabel:
\dQuote{response} (= logical matrix indicating the predicted class labels)
or \dQuote{prob} (= probabilities and the corresponding logical matrix
indicating class labels). Default is \dQuote{response}.}

\item{predict.threshold}{(\link{numeric})\cr
Threshold used to produce class labels. Must be a named vector whose names correspond to the class labels.
For binary classification only, it can also be a single numeric threshold for the positive class.
See \link{setThreshold} for details on how it is applied.
Default is \code{NULL}, which means 0.5 for binary classification and an equal threshold for each class otherwise.}

\item{fix.factors.prediction}{(\code{logical(1)})\cr In some cases, problems occur
in underlying learners for factor features during prediction. If the new
features have fewer factor levels than during training (a strict subset),
the learner might produce an error like \dQuote{type of predictors in new
data do not match that of the training data}. In this case, the problem can be
repaired by setting this option to \code{TRUE}: the factor levels that were present
during training but are missing from the test feature are then added to that
feature. Default is \code{FALSE}.}
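The level repair described for \code{fix.factors.prediction} can be sketched in base R, assuming the training levels of the feature are available as a character vector. The helper \code{addMissingLevels} is hypothetical and is not the \code{mlr} implementation.

```r
# Hypothetical helper (not mlr's code): extend the levels of a test-set factor
# with the levels seen during training; the observed values stay unchanged
# and the training level order is kept.
addMissingLevels = function(x, train.levels) {
  factor(x, levels = union(train.levels, levels(x)))
}

train.levels = c("a", "b", "c")
x.test = factor(c("a", "c"))                      # level "b" never occurs here
levels(x.test)                                    # "a" "c"
levels(addMissingLevels(x.test, train.levels))    # "a" "b" "c"
```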

\item{...}{(any)\cr Optional named (hyper)parameters. If you want to set
specific hyperparameters for a learner when it is created, pass them here.
You can get a list of available hyperparameters using
\verb{getParamSet(<learner>)}. Alternatively, hyperparameters can be given via
the \code{par.vals} argument, but \code{...} should be preferred.}

\item{par.vals}{(\link{list})\cr Optional list of named (hyper)parameters. The
arguments in \code{...} take precedence over values in this list. We strongly
encourage you to use \code{...} for passing hyperparameters.}
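The precedence of \code{...} over \code{par.vals} can be sketched with \code{utils::modifyList}, which overwrites entries of the first list with equally named entries of the second. The helper \code{mergeHyperPars} is hypothetical; \code{mlr} may implement the merge differently.

```r
# Hypothetical helper: values passed via `...` overwrite equally named
# entries of `par.vals` (note: modifyList merges nested lists recursively
# instead of replacing them).
mergeHyperPars = function(..., par.vals = list()) {
  utils::modifyList(par.vals, list(...))
}

hp = mergeHyperPars(minsplit = 5, par.vals = list(minsplit = 10, cp = 0.05))
hp$minsplit  # 5
hp$cp        # 0.05
```

Placing \code{...} before \code{par.vals} in the signature mirrors \code{makeLearner} and means hyperparameter names are never partially matched against \code{par.vals}.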

\item{config}{(named \link{list})\cr Named list of config options to overwrite
global settings set via \link{configureMlr} for this specific learner.}
}
\value{
(\link{Learner}).
}
\description{
For a classification learner, the \code{predict.type} can be set to
\dQuote{prob} to predict probabilities; the class with the maximal probability
then determines the predicted label. The threshold used to assign the label can
later be changed using the \link{setThreshold} function.
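The thresholding rule described in \link{setThreshold} for the multiclass case (probabilities are divided by the class-wise thresholds and the class with the maximal result is selected) can be sketched in base R. The helper \code{applyThreshold} is hypothetical, and breaking ties by taking the first maximal class is an assumption that may differ from \code{mlr}.

```r
# Hypothetical helper: turn a matrix of class probabilities (one row per
# observation, one named column per class) into labels by dividing each
# column by its threshold and picking the class with the largest ratio.
applyThreshold = function(probs, threshold) {
  threshold = threshold[colnames(probs)]         # align thresholds with columns
  scaled = sweep(probs, 2, threshold, "/")
  colnames(probs)[max.col(scaled, ties.method = "first")]
}

probs = matrix(c(0.7, 0.3,
                 0.4, 0.6), ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("a", "b")))
applyThreshold(probs, c(a = 0.5, b = 0.5))  # "a" "b"
applyThreshold(probs, c(a = 0.8, b = 0.2))  # "b" "b"
```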

For all possible properties of a learner, see \link{LearnerProperties}.
}
\section{\code{par.vals} vs. \code{...}}{


\code{par.vals} is intended for default hyperparameter settings in \code{mlr}
that differ from the actual defaults of the underlying learner. For
example, \code{respect.unordered.factors} is set to \code{"order"} in \code{mlr}, while the
default in \link[ranger:ranger]{ranger::ranger} depends on the argument \code{splitrule}.
\verb{getHyperPars(<learner>)} can be used to query hyperparameter defaults that
differ from the underlying learner. This function also shows all
hyperparameters set by the user during learner creation (if these differ
from the learner defaults).
}

\section{regr.randomForest}{


For this learner, \code{mlr} adds uncertainty estimation
(\code{predict.type = "se"}), which is not provided by the
underlying randomForest package.

Currently implemented methods are:

\itemize{
\item If \code{se.method = "jackknife"}, the standard error of a prediction is
estimated by the jackknife-after-bootstrap: the mean-squared difference between
the ensemble prediction and the prediction made using only the trees whose
bootstrap sample did not contain a given training observation.
\item If \code{se.method = "bootstrap"}, the standard error of a prediction is
estimated by bootstrapping the random forest and taking the standard deviation
of the bootstrap predictions. The number of bootstrap replicates is controlled
by \code{se.boot}, and the number of trees in each bootstrapped random forest
by \code{se.ntree}. The \dQuote{brute force} bootstrap is executed when
\code{se.ntree = ntree}. The \dQuote{noisy bootstrap} is executed when
\code{se.ntree < ntree}, which is less computationally expensive. A
Monte-Carlo bias correction may make the latter option preferable in many
cases. Defaults are \code{se.boot = 50} and \code{se.ntree = 100}.

\item If \code{se.method = "sd"} (the default), the standard deviation of the
predictions across trees is returned as the standard error estimate. This can be
computed quickly but is also a very naive estimator. }
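The \dQuote{sd} estimator and the uncorrected \dQuote{jackknife} estimator can be sketched for a single test point in base R, assuming the per-tree predictions and the in-bag counts of the forest are available (in the randomForest package these come from \code{predict(..., predict.all = TRUE)} and \code{randomForest(..., keep.inbag = TRUE)}). The helpers \code{seSd} and \code{seJackknife} are hypothetical, and the usual \eqn{(n-1)/n} jackknife scaling is an assumption about the exact constant.

```r
# Hypothetical helpers (not mlr's code), for a single test point.
# tree.preds: numeric vector with the prediction of each of the B trees.
# inbag:      n x B matrix; inbag[i, b] counts how often training observation i
#             was drawn into the bootstrap sample of tree b.

# "sd": standard deviation of the per-tree predictions.
seSd = function(tree.preds) sd(tree.preds)

# "jackknife": jackknife-after-bootstrap without bias correction. For each
# training observation, average the trees for which it was out-of-bag and
# compare that average with the ensemble prediction.
seJackknife = function(tree.preds, inbag) {
  oob = inbag == 0
  keep = rowSums(oob) > 0          # drop observations that are never out-of-bag
  if (!any(keep)) return(NA_real_)
  n = sum(keep)
  jack = apply(oob[keep, , drop = FALSE], 1, function(o) mean(tree.preds[o]))
  sqrt((n - 1) / n * sum((jack - mean(tree.preds))^2))
}

# Usage with simulated stand-ins for a fitted forest
# (Poisson(1) counts approximate bootstrap in-bag counts).
set.seed(1)
n = 20
B = 200
inbag = matrix(rpois(n * B, 1), nrow = n)
tree.preds = rnorm(B, mean = 3)
seSd(tree.preds)
seJackknife(tree.preds, inbag)
```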

For both \dQuote{jackknife} and \dQuote{bootstrap}, a Monte-Carlo bias
correction is applied; if this results in a negative variance estimate,
the value is truncated at 0.
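For the jackknife, one common form of this correction (Wager, Hastie and Efron, 2014) subtracts \eqn{(e - 1) n / B^2 \sum_b (t_b - \bar{t})^2} from the raw variance, where \eqn{t_b} are the \eqn{B} per-tree predictions and \eqn{n} is the number of training observations. The sketch below applies this correction followed by truncation at 0. That \code{mlr} uses exactly this constant is an assumption, the helper \code{biasCorrectJackknife} is hypothetical, and the correction for the bootstrap variant is not shown.

```r
# Hypothetical helper (not mlr's code): Monte-Carlo bias correction of a raw
# jackknife-after-bootstrap variance, truncated at 0.
# v.raw:      raw jackknife variance for one test point
# tree.preds: the B per-tree predictions for that test point
# n:          number of training observations
biasCorrectJackknife = function(v.raw, tree.preds, n) {
  B = length(tree.preds)
  bias = (exp(1) - 1) * n / B^2 * sum((tree.preds - mean(tree.preds))^2)
  max(v.raw - bias, 0)   # negative variance estimates are truncated at 0
}

biasCorrectJackknife(5, c(1, 3), n = 2)  # 5 - (e - 1), about 3.28
biasCorrectJackknife(1, c(1, 3), n = 2)  # truncated to 0
```

The standard error is the square root of the returned variance.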

Note that for the \dQuote{jackknife} procedure, a small number of trees can
leave some training observations that are never out-of-bag. The current
implementation ignores these observations; under the original definition,
the se estimate would be undefined in this case.

None of the \code{se.method} variants affect the computation of the
posterior mean \dQuote{response} value, which is always the same as that of
the underlying randomForest.
}

\section{regr.featureless}{


A very basic baseline method which is useful for model comparisons (if you
don't beat this, you very likely have a problem).
It does not consider any features of the task and only uses the target
variable of the training data to make predictions.
Using observation weights is currently not supported.

Methods \dQuote{mean} and \dQuote{median} always predict a constant value
for each new observation which corresponds to the observed mean or median of
the target feature in training data, respectively.

The default method is \dQuote{mean} which corresponds to the ZeroR algorithm
from WEKA, see \url{https://weka.wikispaces.com/ZeroR}.
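The two constant-prediction methods can be sketched in base R. The helper \code{featurelessRegr} is hypothetical and only mirrors the behavior described above; like the learner, it ignores observation weights.

```r
# Hypothetical helper (not mlr's code): predict one constant for every new
# observation, computed from the training target only.
featurelessRegr = function(y.train, n.new, method = c("mean", "median")) {
  method = match.arg(method)
  value = switch(method, mean = mean(y.train), median = median(y.train))
  rep(value, n.new)
}

y = c(1, 2, 3, 10)
featurelessRegr(y, 3)                     # 4 4 4
featurelessRegr(y, 3, method = "median")  # 2.5 2.5 2.5
```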
}

\section{classif.featureless}{


Method \dQuote{majority} always predicts the majority class for each new
observation. In case of ties, one of the tied classes is sampled at random and
predicted for all observations in the test set.
This method is used as the default. It is very similar to the ZeroR classifier
from WEKA (see \url{https://weka.wikispaces.com/ZeroR}). The only difference is
that ZeroR always predicts the first of the tied classes instead
of sampling one at random.
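The \dQuote{majority} method, including the random tie-breaking, can be sketched in base R. The helper \code{featurelessMajority} is hypothetical and is not the \code{mlr} implementation.

```r
# Hypothetical helper (not mlr's code): predict the majority class of the
# training target for all test observations.
featurelessMajority = function(y.train, n.new) {
  counts = table(y.train)
  tied = names(counts)[counts == max(counts)]
  # sample() is only called for real ties; one class is drawn once and reused
  pick = if (length(tied) == 1L) tied else sample(tied, 1L)
  factor(rep(pick, n.new), levels = names(counts))
}

featurelessMajority(c("a", "b", "b", "c"), 3)  # b b b
set.seed(1)
featurelessMajority(c("a", "a", "b", "b"), 3)  # the same tied class three times
```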

Method \dQuote{sample-prior} always samples a random class for each individual test
observation according to the prior probabilities observed in the training data.

If you opt to predict probabilities, the class probabilities always
correspond to the prior probabilities observed in the training data.
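The \dQuote{sample-prior} method and the probability prediction can be sketched in base R. Both use only the class frequencies of the training target. The helpers \code{featurelessProbs} and \code{featurelessSamplePrior} are hypothetical.

```r
# Hypothetical helpers (not mlr's code).
# Class probabilities: the relative class frequencies of the training target,
# identical for every test observation.
featurelessProbs = function(y.train, n.new) {
  prior = prop.table(table(y.train))
  matrix(rep(as.numeric(prior), each = n.new), nrow = n.new,
         dimnames = list(NULL, names(prior)))
}

# "sample-prior": draw an independent class for each test observation
# according to the training priors.
featurelessSamplePrior = function(y.train, n.new) {
  prior = prop.table(table(y.train))
  factor(sample(names(prior), n.new, replace = TRUE, prob = as.numeric(prior)),
         levels = names(prior))
}

y = c("a", "b", "b", "b")
featurelessProbs(y, 2)          # every row: a = 0.25, b = 0.75
set.seed(1)
featurelessSamplePrior(y, 5)
```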
}

\examples{
makeLearner("classif.rpart")
makeLearner("classif.lda", predict.type = "prob")
lrn = makeLearner("classif.lda", method = "t", nu = 10)
getHyperPars(lrn)
}
\seealso{
Other learner: 
\code{\link{LearnerProperties}},
\code{\link{getClassWeightParam}()},
\code{\link{getHyperPars}()},
\code{\link{getLearnerId}()},
\code{\link{getLearnerNote}()},
\code{\link{getLearnerPackages}()},
\code{\link{getLearnerParVals}()},
\code{\link{getLearnerParamSet}()},
\code{\link{getLearnerPredictType}()},
\code{\link{getLearnerShortName}()},
\code{\link{getLearnerType}()},
\code{\link{getParamSet}()},
\code{\link{helpLearnerParam}()},
\code{\link{helpLearner}()},
\code{\link{makeLearners}()},
\code{\link{removeHyperPars}()},
\code{\link{setHyperPars}()},
\code{\link{setId}()},
\code{\link{setLearnerId}()},
\code{\link{setPredictThreshold}()},
\code{\link{setPredictType}()}
}
\concept{learner}