Raw File
Tip revision: 601edcd8d1306ebea478657861c54fa9c69a52f3 authored by Dominique Makowski on 31 May 2021, 05:40 UTC
version 0.10.0
Tip revision: 601edcd
title: "Region of Practical Equivalence (ROPE)"
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, bayesian, posterior, test, rope, equivalence test]
vignette: >
  %\VignetteIndexEntry{Region of Practical Equivalence (ROPE)}
  chunk_output_type: console
bibliography: bibliography.bib
csl: apa.csl

This vignette can be referred to by citing the package:

- Makowski, D., Ben-Shachar, M. S., \& Lüdecke, D. (2019). *bayestestR: Describing Effects and their Uncertainty, Existence and Significance within the Bayesian Framework*. Journal of Open Source Software, 4(40), 1541.


```{r message=FALSE, warning=FALSE, include=FALSE}
if (!requireNamespace("rstanarm", quietly = TRUE) || !requireNamespace("see", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)

options(knitr.kable.NA = '')


# What is the *ROPE?*

Unlike a frequentist approach, Bayesian inference is not based on statistical
significance, where effects are tested against "zero". Indeed, the Bayesian
framework offers a probabilistic view of the parameters, allowing assessment of
the uncertainty related to them. Thus, rather than concluding that an effect is
present when it simply differs from zero, we would conclude that the probability
of being outside a specific range that can be considered as **"practically no
effect"** (*i.e.*, a negligible magnitude) is sufficient. This range is called
the **region of practical equivalence (ROPE)**.

Indeed, statistically, the probability of a posterior distribution being
different from 0 does not make much sense (the probability of it being different
from a single point being infinite). Therefore, the idea underlining ROPE is to
let the user define an area around the null value enclosing values that are
**equivalent to the null** value for practical purposes [@kruschke2010believe;
@kruschke2012time; @kruschke2014doing].

# Equivalence Test

The ROPE, being a region corresponding to a "null" hypothesis, is used for the
**equivalence test**, to test whether a parameter is **significant** (in the
sense of *important* enough to be cared about). This test is usually based on
the **"HDI+ROPE decision rule"** [@kruschke2014doing; @kruschke2018bayesian] to
check whether parameter values should be accepted or rejected against an
explicitly formulated "null hypothesis" (*i.e.*, a ROPE). In other words, it
checks the percentage of Credible Interval (CI) that is the null region (the
ROPE). If this percentage is sufficiently low, the null hypothesis is rejected.
If this percentage is sufficiently high, the null hypothesis is accepted.

# Credible interval in ROPE *vs* full posterior in ROPE

Using the ROPE and the HDI as Credible Interval, Kruschke (2018) suggests using
the percentage of the 95\% HDI that falls within the ROPE as a decision rule.
However, as the 89\% HDI [is considered a better
[@kruschke2014doing; @mcelreath2014rethinking; @mcelreath2018statistical],
`bayestestR` provides by default the percentage of the 89\% HDI that falls
within the ROPE.

However, [*simulation studies data*](
suggest that using the percentage of the full posterior distribution, instead of
a CI, might be more sensitive (especially do delineate highly significant
effects). Thus, we recommend that the user considers using the **full ROPE**
percentage (by setting `ci = 1`), which will return the portion of the entire
posterior distribution in the ROPE.

# What percentage in ROPE to accept or to reject?

If the HDI is completely outside the ROPE, the "null hypothesis" for this
parameter is "rejected". If the ROPE completely covers the HDI, *i.e.*, all most
credible values of a parameter are inside the region of practical equivalence,
the null hypothesis is accepted. Else, it’s unclear whether the null hypothesis
should be accepted or rejected.

If the **full ROPE** is used (*i.e.*, 100\% of the HDI), then the null
hypothesis is rejected or accepted if the percentage of the posterior within the
ROPE is smaller than to 2.5\% or greater than 97.5\%. Desirable results are low
proportions inside the ROPE  (the closer to zero the better).

# How to define the ROPE range?

Kruschke (2018) suggests that the ROPE could be set, by default, to a range from `-0.1` to `0.1` of a standardized parameter (negligible effect size according to Cohen, 1988).

- For **linear models (lm)**, this can be generalised to: $$[-0.1*SD_{y}, 0.1*SD_{y}]$$.
- For **logistic models**, the parameters expressed in log odds ratio can be converted to standardized difference through the formula: $$\pi/\sqrt{3}$$ (see [the **effectsize** package](, resulting in a range of `-0.18` to `-0.18`. For other models with binary outcome, it is strongly recommended to manually specify the rope argument. Currently, the same default is applied that for logistic models.
- For **t-tests**, the standard deviation of the response is used, similarly to
linear models (see above).
- For **correlations**, `-0.05, 0.05` is used, *i.e.*, half the value of a
negligible correlation as suggested by Cohen's (1988) rules of thumb.
- For all other models, `-0.1, 0.1` is used to determine the ROPE limits, but it
is strongly advised to specify it manually.

# Sensitivity to parameter's scale

It is important to consider **the unit (*i.e.*, the scale) of the predictors**
when using an index based on the ROPE, as the correct interpretation of the ROPE
as representing a region of practical equivalence to zero is dependent on the
scale of the predictors. Indeed, unlike other indices (such as the
the percentage in **ROPE** depend on the unit of its parameter. In other words,
as the ROPE represents a fixed portion of the response's scale, its proximity
with a coefficient depends on the scale of the coefficient itself.

For instance, if we consider a simple regression `growth ~ time`, modelling the
development of **Wookies babies**, a negligible change (the ROPE) is less than
**54 cm**. If our `time` variable is **expressed in days**, we will find that
the coefficient (representing the growth **by day**) is of about **10 cm** (*the
median of the posterior of the coefficient is 10*). Which we would consider as
**negligible**. However, if we decide to express the `time` variable **in
years**, the coefficient will be scaled by this transformation (as it will now
represent the growth **by year**). The coefficient will now be around **3550**
cm (`10 * 355`), which we would now consider as **significant**.

```{r message=FALSE, warning=FALSE, eval=FALSE}

data <- iris  # Use the iris data
model <- stan_glm(Sepal.Length ~ Sepal.Width, data=data)  # Fit model

```{r echo=FALSE, message=FALSE, warning=FALSE, comment=">"}

data <- iris  # Use the iris data
model <- stan_glm(Sepal.Length ~ Sepal.Width, data=data, refresh = 0)

```{r echo=TRUE, message=FALSE, warning=FALSE, comment=">"}
# Compute indices
pd <- p_direction(model)
percentage_in_rope <- rope(model, ci=1)

# Visualise the pd

# Visualise the percentage in ROPE

We can see that the *pd* and the percentage in ROPE of the linear relationship
between **Sepal.Length** and **Sepal.Width** are respectively of about `92.95%`
and `15.95%`, corresponding to an **uncertain** and **not significant** effect.
What happen if we scale our predictor?

```{r message=FALSE, warning=FALSE, eval=FALSE}
data$Sepal.Width_scaled <- data$Sepal.Width / 100  # Divide predictor by 100
model <- stan_glm(Sepal.Length ~ Sepal.Width_scaled, data=data)  # Fit model
```{r echo=FALSE, message=FALSE, warning=FALSE, comment=">"}
data$Sepal.Width_scaled <- data$Sepal.Width / 100
model <- stan_glm(Sepal.Length ~ Sepal.Width_scaled, data=data, refresh = 0)

```{r echo=TRUE, message=FALSE, warning=FALSE, comment=">"}
# Compute indices
pd <- p_direction(model)
percentage_in_rope <- rope(model, ci=1)

# Visualise the pd

# Visualise the percentage in ROPE

As you can see, by simply dividing the predictor by 100, we **drastically**
changed the conclusion related to the **percentage in ROPE** (which became very
close to `0`): the effect could now be **interpreted as being significant**.
Thus, we recommend paying close attention to the unit of the predictors when
selecting the ROPE range (*e.g.*, what coefficient would correspond to a small
effect?), and when reporting or reading ROPE results.

# Multicollinearity: Non-independent covariates

When **parameters show strong correlations**, *i.e.*, when covariates are not
independent, the joint parameter distributions may shift towards or away from
the ROPE. Collinearity invalidates ROPE and hypothesis testing based on
univariate marginals, as the probabilities are conditional on independence. Most
problematic are parameters that only have partial overlap with the ROPE region.
In case of collinearity, the (joint) distributions  of these parameters may
either get an increased or decreased ROPE, which means that inferences based on
ROPE are inappropriate [@kruschke2014doing].

The `equivalence_test()` and `rope()` functions perform a simple check for
pairwise correlations between parameters, but as there can be collinearity
between more than two variables, a first step to check the assumptions of this
hypothesis testing is to look at different pair plots. An even more
sophisticated check is the projection predictive variable selection

back to top