Content - f82b6351a43251568a76e60f71643b8a98eec0f4 - b95227a/integrated_learners.Rmd

integrated_learners.Rmd
---
title: "Integrated Learners"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{mlr}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message=FALSE}
library("mlr")
library("BBmisc")
library("ParamHelpers")
urlContribPackages = urlBasePackages = "http://www.rdocumentation.org/packages/"

## show grouped code output instead of single lines
knitr::opts_chunk$set(collapse = TRUE)
```

This page lists the learning methods already integrated in `mlr`.

Columns **Num.**, **Fac.**, **Ord.**, **NAs**, and **Weights** indicate if a method can cope with numerical, factor, and ordered factor predictors, if it can deal with missing values in a meaningful way (other than simply removing observations with missing values) and if observation weights are supported.

Column **Props** shows further properties of the learning methods specific to the type of learning task.
See also `RLearner()` for details.

```{r include=FALSE}
library("mlr")
library("pander")
# baseR, urlBasePackages, urlContribPackages are defined in build
linkPkg = function(x) {
  x = strsplit(x, ",")[[1]]
  x = mlr:::cleanupPackageNames(x)    # remove exclamation marks
  if (urlBasePackages != urlContribPackages) {
    ind = x %in% baseR
    url = c(urlContribPackages, urlBasePackages)[ind + 1]
  } else {
    url = urlContribPackages
  }
  collapse(sprintf("[%1$s](%2$s%1$s/)", x, url), sep = "<br />")
}

getTab = function(type) {
  cn = function(x) if (is.null(x)) NA else {r = gsub("\\n", " ", x); gsub("(\\$)", "\\\\\\1", r)}
  lrns = listLearners(type)
  lrns$note[is.na(lrns$note)] = ""
  cols = c("class", "type", "package", "short.name", "name", "numerics", "factors", "ordered", "missings", "weights", "note", "installed")
  props = setdiff(colnames(lrns), cols)

  colNames = c("Class / Short Name / Name", "Packages", "Num.", "Fac.", "Ord.", "NAs", "Weights", "Props", "Note")
  df = makeDataFrame(nrow = nrow(lrns), ncol = length(colNames),
    col.types = c("character", "character", "logical", "logical", "logical", "logical", "logical", "character", "character"))
  names(df) = colNames

  df[1] = apply(lrns[c("class", "short.name", "name")], 1, function(x)
    paste0("**", x["class"], "** <br /> *", cn(x["short.name"]), "* <br /><br />", cn(x["name"])))
  df$Packages = sapply(lrns$package, linkPkg)
  df[3:7] = lrns[c("numerics", "factors", "ordered", "missings", "weights")]
  df$Props = apply(lrns[props], 1, function(x) collapse(props[x], sep = "<br />"))
  df$Note = sapply(lrns$note, cn)
  logicals = vlapply(df, is.logical)
  df[logicals] = lapply(df[logicals], function(x) ifelse(x, "X", ""))
  df
}

makeTab = function(df) {
  pandoc.table(df, style = "rmarkdown", split.tables = Inf, split.cells = Inf,
    justify = c("left", "left", "center", "center", "center", "center", "center", "left", "left"))
}

types = c("classif", "multilabel", "regr", "surv", "cluster")
tables = lapply(types, getTab)
names(tables) = types
numbers = sapply(tables, nrow)
```

# Classification (`r numbers["classif"]`)

For classification the following additional learner properties are relevant and shown in column **Props**:

* *prob*: The method can predict probabilities,
* *oneclass*, *twoclass*, *multiclass*: One-class, two-class (binary) or multi-class classification problems be handled,
* *class.weights*: Class weights can be handled.

```{r echo=FALSE,results="asis"}
makeTab(tables[["classif"]])
```

# Regression (`r numbers["regr"]`)

Additional learner properties:

* *se*: Standard errors can be predicted.

```{r echo=FALSE,results="asis"}
makeTab(tables[["regr"]])
```

# Survival analysis (`r numbers["surv"]`)

Additional learner properties:

* *prob*: Probabilities can be predicted,
* *rcens*, *lcens*, *icens*: The learner can handle right, left and/or interval censored data.

```{r echo=FALSE,results="asis"}
makeTab(tables[["surv"]])
```

# Cluster analysis (`r numbers["cluster"]`)

Additional learner properties:

* *prob*: Probabilities can be predicted.

```{r echo=FALSE,results="asis"}
makeTab(tables[["cluster"]])
```

# Cost-sensitive classification

For *ordinary misclassification costs* you can use all the standard classification methods listed above.

For *example-dependent costs* there are several ways to generate cost-sensitive learners from ordinary regression and classification learners.
See section [cost-sensitive classification](cost_sensitive_classif.html){target="_blank"} and the documentation of `makeCostSensClassifWrapper()`, `makeCostSensRegrWrapper()` and `makeCostSensWeightedPairsWrapper()` for details.

# Multilabel classification (`r numbers["multilabel"]`)

```{r echo=FALSE,results="asis"}
makeTab(tables[["multilabel"]])
```

Moreover, you can use the binary relevance method to apply ordinary classification learners to the multilabel problem. 
See the documentation of function `makeMultilabelBinaryRelevanceWrapper()` and the tutorial section on [multilabel classification](multilabel.html){target="_blank"} for details.