---
title: "Introduction to precrec"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to precrec}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The `precrec` package provides accurate computations of ROC and 
Precision-Recall curves.

## 1. ROC and Precision-Recall calculations

The `evalmod` function calculates ROC and Precision-Recall curves and 
returns an S3 object.
```{r}
library(precrec)

# Load a test dataset
data(P10N10)

# Calculate ROC and Precision-Recall curves
sscurves <- evalmod(scores = P10N10$scores, labels = P10N10$labels)
```

### S3 generics
The `precrec` package provides five S3 generics for the S3 object 
created by the `evalmod` function.

S3 generic  Package   Description 
----------  --------  ------------------------------------------------------------------
     print  base      Print the calculation results and the summary of the test data
      plot  graphics  Plot performance evaluation measures
  autoplot  ggplot2   Plot performance evaluation measures with ggplot2
   fortify  ggplot2   Prepare a data frame for ggplot2
       auc  precrec   Make a data frame with AUC scores
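
#### Examples of the `print` function
The `print` function shows the calculation results and the summary of the test data.
```{r}
# Show the calculation results and the summary of the test data
print(sscurves)
```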

#### Examples of the `plot` function
The `plot` function outputs ROC and Precision-Recall curves.
```{r, fig.width=7, fig.show='hold'}
# Show ROC and Precision-Recall plots
plot(sscurves)

# Show a Precision-Recall plot
plot(sscurves, "PRC")
```


#### Examples of the `autoplot` function
The `autoplot` function outputs ROC and Precision-Recall curves by using the 
`ggplot2` package.
```{r, fig.width=7, fig.show='hold'}
# The ggplot2 package is required 
library(ggplot2)

# Show ROC and Precision-Recall plots
autoplot(sscurves)

# Show a Precision-Recall plot
autoplot(sscurves, "PRC")
```
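

#### Examples of the `fortify` function
The `fortify` function prepares a data frame for custom plots with the
`ggplot2` package. The chunk below is a minimal sketch; the exact columns
depend on the curve type.
```{r}
# Prepare a data frame for custom ggplot2 plots
ssdf <- fortify(sscurves)
head(ssdf)
```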


#### Examples of the `auc` function
The `auc` function outputs a data frame with the AUC (Area Under the Curve)
scores.
```{r}
# Get a data frame with AUC scores
aucs <- auc(sscurves)

# Use knitr::kable to display the result in a table format
knitr::kable(aucs)

# Get AUCs of Precision-Recall
aucs_prc <- subset(aucs, curvetypes == "PRC")
knitr::kable(aucs_prc)

```

## 2. Data preparation
The `precrec` package provides four functions for data preparation.

Function           Description 
------------------ -----------------------------------------------------------
       join_scores Join scores of multiple models into a list
       join_labels Join observed labels of multiple test datasets into a list
            mmdata Reformat input data for performance evaluation calculation
create_sim_samples Create random samples for simulations

### Examples of the `join_scores` function
The `join_scores` function combines multiple score datasets.
```{r}
s1 <- c(1, 2, 3, 4)
s2 <- c(5, 6, 7, 8)
s3 <- matrix(1:8, 4, 2)

# Join two score vectors
scores1 <- join_scores(s1, s2)

# Join two vectors and a matrix
scores2 <- join_scores(s1, s2, s3)
```

### Examples of the `join_labels` function
The `join_labels` function combines multiple label datasets.
```{r}
l1 <- c(1, 0, 1, 1)
l2 <- c(1, 0, 1, 1)
l3 <- c(1, 0, 1, 0)

# Join two label vectors
labels1 <- join_labels(l1, l2)
labels2 <- join_labels(l1, l3)
```

### Examples of the `mmdata` function
The `mmdata` function makes an input dataset for the `evalmod` function.
```{r}
# Create an input dataset with two score vectors and two label vectors
msmdat <- mmdata(scores1, labels1)

# Specify dataset IDs
smmdat <- mmdata(scores1, labels2, dsids = c(1, 2))

# Specify model names and dataset IDs
mmmdat <- mmdata(scores1, labels2, modnames = c("mod1", "mod2"),
                 dsids = c(1, 2))
```
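
The objects created by `mmdata` serve as direct input to the `evalmod`
function, as the following sections show. A minimal sketch (the variable
name is illustrative):
```{r}
# Pass an mmdata object directly to evalmod
curves_msmdat <- evalmod(msmdat)
```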

### Examples of the `create_sim_samples` function
The `create_sim_samples` function is useful for making random sample datasets 
with different performance levels.

Level name  Description 
----------- ---------------------
     random Random
    poor_er Poor early retrieval
    good_er Good early retrieval
      excel Excellent
       perf Perfect
        all All of the above

```{r}
# A dataset with 10 positives and 10 negatives for the random performance level
samps1 <- create_sim_samples(1, 10, 10, "random")

# A dataset for five different performance levels
samps2 <- create_sim_samples(1, 10, 10, "all")

# A dataset with 20 samples for the good early retrieval performance level
samps3 <- create_sim_samples(20, 10, 10, "good_er")

# A dataset with 20 samples for five different performance levels
samps4 <- create_sim_samples(20, 10, 10, "all")
```
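
The return value is a list; the `scores`, `labels`, `modnames`, and `dsids`
elements are used in the following sections.
```{r}
# Show the element names of a simulation sample list
names(samps4)
```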

## 3. Performance evaluation of multiple models
The `evalmod` function calculates evaluation measures for multiple models 
when multiple model names are specified with either the `mmdata` or the 
`evalmod` function.

### Data preparation
There are several ways to create a dataset with the `mmdata` function 
for multiple models.
```{r}
# Use a list with two score vectors and a list with two label vectors
msmdat1 <- mmdata(scores1, labels1)

# Explicitly specify model names
msmdat2 <- mmdata(scores1, labels1, modnames = c("mod1", "mod2"))

# Use a sample dataset created by the create_sim_samples function
msmdat3 <- mmdata(samps2[["scores"]], samps2[["labels"]],
                  modnames = samps2[["modnames"]])
```

### ROC and Precision-Recall calculations
The `evalmod` function automatically detects multiple models.
```{r}
# Calculate ROC and Precision-Recall curves for multiple models
mscurves <- evalmod(msmdat3)
```

### S3 generics
All five S3 generics provided by the package also work with the S3 object created here.
```{r, fig.width=7, fig.show='hold'}
# Show ROC and Precision-Recall curves with the ggplot2 package
autoplot(mscurves)
```
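
The `auc` function works here as well; for instance, the AUC scores of all
the models can be shown in a table, following the pattern from Section 1.
```{r}
# Show a table with the AUC scores of the multiple models
knitr::kable(auc(mscurves))
```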


## 4. Performance evaluation of multiple test datasets
The `evalmod` function calculates evaluation measures for multiple test 
datasets when different test dataset IDs are specified with either the 
`mmdata` or the `evalmod` function.

### Data preparation
There are several ways to create a dataset with the `mmdata` function 
for multiple test datasets.
```{r}
# Specify test dataset IDs
smmdat1 <- mmdata(scores1, labels2, dsids = c(1, 2))

# Use a sample dataset created by the create_sim_samples function
smmdat2 <- mmdata(samps3[["scores"]], samps3[["labels"]],
                  dsids = samps3[["dsids"]])
```

### ROC and Precision-Recall calculations
The `evalmod` function automatically detects multiple test datasets.
```{r}
# Calculate curves for multiple test datasets and keep all the curves
smcurves <- evalmod(smmdat2, raw_curves = TRUE)
```

### S3 generics
All five S3 generics provided by the package also work with the S3 object created here.
```{r, fig.width=7, fig.show='hold'}
# Show an average Precision-Recall curve with the 95% confidence bounds
autoplot(smcurves, "PRC")

# Show raw Precision-Recall curves
autoplot(smcurves, "PRC", raw_curves = TRUE)
```


## 5. Evaluation of multiple models and multiple test datasets
The `evalmod` function calculates evaluation measures for multiple models and 
multiple test datasets when both model names and test dataset IDs are 
specified with either the `mmdata` or the `evalmod` function.

### Data preparation
There are several ways to create a dataset with the `mmdata` function 
for multiple models and multiple datasets.
```{r}
# Specify model names and test dataset IDs
mmmdat1 <- mmdata(scores1, labels2, modnames = c("mod1", "mod2"),
                  dsids = c(1, 2))

# Use a sample dataset created by the create_sim_samples function
mmmdat2 <- mmdata(samps4[["scores"]], samps4[["labels"]], 
                  modnames = samps4[["modnames"]], dsids = samps4[["dsids"]])
```

### ROC and Precision-Recall calculations
The `evalmod` function automatically detects multiple models and multiple test datasets.
```{r}
# Calculate curves for multiple models and multiple test datasets
mmcurves <- evalmod(mmmdat2)
```

### S3 generics
All five S3 generics provided by the package also work with the S3 object created here.
```{r, fig.width=7, fig.show='hold'}
# Show average Precision-Recall curves
autoplot(mmcurves, "PRC")

# Show average Precision-Recall curves with the 95% confidence bounds
autoplot(mmcurves, "PRC", show_cb = TRUE)
```
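
The base `plot` function should work here as well; a minimal sketch that
follows the pattern from Section 1:
```{r, fig.width=7, fig.show='hold'}
# Show Precision-Recall curves with the graphics package
plot(mmcurves, "PRC")
```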


## 6. Simulation with balanced and imbalanced datasets
It is easy to simulate various scenarios, such as balanced vs. imbalanced 
datasets, by using the `evalmod` and `create_sim_samples` functions.

### Data preparation
```{r}
# Balanced dataset
samps5 <- create_sim_samples(100, 100, 100, "all")
simmdat1 <- mmdata(samps5[["scores"]], samps5[["labels"]], 
                   modnames = samps5[["modnames"]], dsids = samps5[["dsids"]])

# Imbalanced dataset
samps6 <- create_sim_samples(100, 25, 100, "all")
simmdat2 <- mmdata(samps6[["scores"]], samps6[["labels"]], 
                   modnames = samps6[["modnames"]], dsids = samps6[["dsids"]])

```

### ROC and Precision-Recall calculations
The `evalmod` function automatically detects multiple models and multiple test datasets.
```{r}
# Balanced dataset
simcurves1 <- evalmod(simmdat1)

# Imbalanced dataset
simcurves2 <- evalmod(simmdat2)
```

### Balanced vs. imbalanced datasets
ROC plots are unchanged between balanced and imbalanced datasets, whereas 
Precision-Recall plots show a clear difference between them. See our 
[article](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118432) or [website](https://classeval.wordpress.com) for potential pitfalls of 
using ROC plots with imbalanced datasets.
```{r, fig.width=7, fig.show='hold'}
# Balanced dataset
autoplot(simcurves1)

# Imbalanced dataset
autoplot(simcurves2)
```
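
The difference can also be quantified with AUC scores. The sketch below
summarizes the mean Precision-Recall AUC per performance level, assuming the
`modnames`, `curvetypes`, and `aucs` columns of the data frame returned by
the `auc` function.
```{r}
# Mean Precision-Recall AUC per performance level
# (assumes the modnames, curvetypes, and aucs columns of the auc data frame)
prc_balanced <- subset(auc(simcurves1), curvetypes == "PRC")
prc_imbalanced <- subset(auc(simcurves2), curvetypes == "PRC")

tapply(prc_balanced$aucs, prc_balanced$modnames, mean)
tapply(prc_imbalanced$aucs, prc_imbalanced$modnames, mean)
```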


## 7. Basic performance evaluation measures
The `evalmod` function also calculates basic evaluation measures: error, 
accuracy, specificity, sensitivity, and precision.

Measure     Description 
----------- --------------------------
      error Error rate
   accuracy Accuracy
specificity Specificity, TNR, 1 - FPR
sensitivity Sensitivity, TPR, Recall
  precision Precision, PPV
        
### Basic measure calculations
The `mode = "basic"` option makes the `evalmod` function calculate the basic 
evaluation measures instead of performing ROC and Precision-Recall calculations.
```{r}
# Calculate basic evaluation measures
mmpoints <- evalmod(mmmdat2, mode = "basic")
```

### S3 generics
All the S3 generics except `auc` also work with the S3 object created here.
```{r, fig.width=7, fig.show='hold'}
# Show normalized threshold values vs. error rate and accuracy
autoplot(mmpoins, c("error", "accuracy"))

# Show normalized threshold values vs. specificity and sensitivity
autoplot(mmpoins, c("specificity", "sensitivity"))

# Show normalized threshold values vs. precision
autoplot(mmpoins, "precision")
```
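
For instance, `fortify` should also return a data frame of the basic measures
for custom plotting; a minimal sketch:
```{r}
# Prepare a data frame of the basic measures for custom ggplot2 plots
head(fortify(mmpoints))
```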


## 8. External links
See our website - [Classifier evaluation with imbalanced datasets](https://classeval.wordpress.com/) - for useful tips on performance evaluation of binary classifiers. We have also summarized potential pitfalls of ROC plots with imbalanced datasets; see our paper - [The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118432) - for more details.