Revision 23ea3229abe72a5f23dcf3a4cfcd3478d744b536 authored by Dominique Makowski on 20 June 2019, 11:50:03 UTC, committed by cran-robot on 20 June 2019, 11:50:03 UTC
1 parent 9985109
Raw File
title: "Bayes Factors"
    toc: true
    fig_width: 10.08
    fig_height: 6
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, bayesian, bayes factors]
vignette: >
  %\VignetteIndexEntry{Bayes Factors}
  chunk_output_type: console
bibliography: bibliography.bib

This vignette can be referred to by citing the package:

- Makowski, D., Ben-Shachar M. S. \& L├╝decke, D. (2019). *Understand and Describe Bayesian Models and Posterior Distributions using bayestestR*. Available from DOI: [10.5281/zenodo.2556486](


```{r setup, include=FALSE}
options(knitr.kable.NA = '')
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = ">")
options(digits = 2)

The adoption of the Bayesian framework for applied statistics, especially in social or psychological sciences, seems to be developing in two distinct directions. One of the key topics marking their separation is their opinion about the **Bayes factor**. In short, some authors (e.g., the "Amsterdam school", led by [Wagenmakers]( advocate its use and emphasize its qualities as a statistical index, while others point to its limits and prefer, instead, the precise description of posterior distributions (using [CIs](, [ROPEs](, etc.). 

**bayestestR** does not take a side in this debate, rather offering tools to help you in whatever analysis you want to achieve. Instead, it strongly supports the notion of an *informed choice:* **discover the methods, try them, understand them, learn about them, and decide for yourself**.

Having said that, here's an introduction to Bayes factors :)

# Bayes Factors

**Bayes factors (BFs) are indices of *relative* evidence of one "model" over another**, which are used in Bayesian inference as alternatives to classical (frequentist) hypothesis testing indices (such as $p-values$). In the Bayesian framework, a Bayes factor can also be thought of as the quantity by which some prior beliefs about the relative odds of two models are updated in light of the observed data.

According to Bayes' theorem:

P(M|D) = \frac{P(D|M)P(M)}{P(D)}

Then by comparing two models, we get:

\frac{P(M_1|D)}{P(M_2|D)} = \frac{P(D|M_1)}{P(D|M_2)} \times \frac{P(M_1)}{P(M_2)}
Where the middle term is the Bayes factor:
These are usually computed as the ratio of marginal likelihoods of two competing hypotheses / models, but as we can see from the equation above, they can also be computed by dividing the posterior odds by the prior odds. Importantly, **Bayes factors cover a wide range of indices and applications**, and come in different flavors.

## Comparing models

Generally speaking, Bayes factors are used to compare models and answer the question:

> **Under which model is the the observed data more probable?**

In other words, which model is more likely to have produced the observed data? This is usually done by computing the marginal likelihoods of two models. In such a case, the Bayes factor is a measure of the relative evidence of one of the compared models over the other.

### Bayesian models (`brms` and `rstanarm`)

**Note: In order to compute Bayes factors for models, non-default arguments must be added upon fitting:**
  - `brmsfit` models **must** have been fitted with `save_all_pars = TRUE`
  - `stanreg` models **must** have been fitted with a defined `diagnostic_file`.
Let's first fit 5 Bayesian regressions with `brms` to predict `Sepal.Length`:

```{r brms_disp, eval=FALSE, message=FALSE, warning=FALSE}

m0 <- brm(Sepal.Length ~ 1, data = iris, save_all_pars = TRUE)
m1 <- brm(Sepal.Length ~ Petal.Length, data = iris, save_all_pars = TRUE)
m2 <- brm(Sepal.Length ~ Species, data = iris, save_all_pars = TRUE)
m3 <- brm(Sepal.Length ~ Species + Petal.Length, data = iris, save_all_pars = TRUE)
m4 <- brm(Sepal.Length ~ Species * Petal.Length, data = iris, save_all_pars = TRUE)

We can now compare these models with the `bayesfactor_models()` function, using the `denominator` argument to specify which model all models will be compared against (in this case, the intercept-only model):

```{r brms_models_disp, eval=FALSE}
comparison <- bayesfactor_models(m1, m2, m3, m4, denominator = m0)

```{r brms_models_print, echo=FALSE, message=FALSE, warning=FALSE}
# dput(comparison)
comparison <- structure(
    Model = c("Petal.Length", "Species", "Species + Petal.Length", "Species * Petal.Length", "1"),
    BF = c(3.44736e+44, 5.628679e+29, 7.121386e+55, 9.149948e+55, 1)
  class = c("bayesfactor_models", "see_bayesfactor_models","data.frame"),
  row.names = c(NA, -5L),
  denominator = 5L,
  BF_method = "marginal likelihoods (bridgesampling)"

We can see that the full model is the best model - with $BF_{\text{m0}}=9\times 10^{55}$ compared to the null (intercept only).

We can also change the reference model to the main effect model:

```{r update_models1, message=FALSE, warning=FALSE}
update(comparison, reference = 3)

As we can see, though the full model is the best, there is hardly any evidence that it is preferable to the main effects model.

We can also change the reference model to the `Species` model:

```{r update_models2, message=FALSE, warning=FALSE}
update(comparison, reference = 2)

Notice that in the Bayesian framework the compared models *do not* need to be nested models, as happened here when we compared the `Petal.Length`-only model to the `Species`-only model (something that cannot be done in the frequentists framework, where compared models must be nested in one another).

> **NOTE:** In order to correctly and precisely estimate Bayes Factors, you always need the 4 P's: **P**roper **P**riors <sup>([1](, [2](, [3](</sup>, and a **P**lentiful **P**osterior <sup>([4](</sup>.

### The BIC approximation for Frequentist Models

It is also possible to compute Bayes factors for frequentist models. This is done by comparing BIC measures, allowing a Bayesian comparison of non-nested frequentist models [@wagenmakers2007practical]. Let's try it out on some **linear mixed models**:

```{r lme4_models, message=FALSE, warning=FALSE}

m0 <- lmer(Sepal.Length ~ (1 | Species), data = iris)
m1 <- lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
m2 <- lmer(Sepal.Length ~ Petal.Length + (Petal.Length | Species), data = iris)
m3 <- lmer(Sepal.Length ~ Petal.Length + Petal.Width + (Petal.Length | Species), data = iris)
m4 <- lmer(Sepal.Length ~ Petal.Length * Petal.Width + (Petal.Length | Species), data = iris)

bayesfactor_models(m1, m2, m3, m4, denominator = m0)

## Inclusion Bayes factors via Bayesian model averaging

Inclusion Bayes factors answer the question:

> **Are the observed data more probable under models with a particular effect, than they are under models without that particular effect?**

In other words, on average - are models with effect $X$ more likely to have produce the observed data than models without effect $X$?

Lets use the `brms` example from above:

```{r inclusion_brms, message=FALSE, warning=FALSE, eval=TRUE}

If we examine the interaction term's inclusion Bayes factor, we can see that across all 5 models, a model with the interaction term (`Species:Petal.Length`) is 5 times more likely than a model without the interaction term.

We can also compare only matched models - i.e., a model without effect $A$ will only be compared to models *with* effect $A$, but not with models with higher-level interaction (see explanation for why you might want to do this [here](

```{r inclusion_brms2, message=FALSE, warning=FALSE, eval=TRUE}
bayesfactor_inclusion(comparison, match_models = TRUE)

### Comparison with JASP

`bayesfactor_inclusion()` is meant to provide Bayes Factors across model averages, similar to JASP's *Effects* option. Let's compare the two:

#### Compared across all models

```{r JASP_all, message=FALSE, warning=FALSE, eval=TRUE}
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
BF_ToothGrowth <- anovaBF(len ~ dose*supp, ToothGrowth)


```{r JASP_all_fig, echo=FALSE, message=FALSE, warning=FALSE}

#### Compared across matched models

```{r JASP_matched, message=FALSE, warning=FALSE, eval=TRUE}
bayesfactor_inclusion(BF_ToothGrowth, match_models = TRUE)

```{r JASP_matched_fig, echo=FALSE, message=FALSE, warning=FALSE}

#### With Nuisance Effects

We'll add `dose` to the null model in JASP, and do the same in `R`:

```{r JASP_Nuisance, message=FALSE, warning=FALSE, eval=TRUE}
BF_ToothGrowth_against_dose <- BF_ToothGrowth[3:4]/BF_ToothGrowth[2] # OR: 
# update(bayesfactor_models(BF_ToothGrowth), subset = c(4,5), reference = 3)


```{r JASP_Nuisance_fig, echo=FALSE, message=FALSE, warning=FALSE}

## Savage-Dickey density ratio Bayes factor

The ***Savage-Dickey density ratio*** can be used to answer the question:

> **Given the observed data, is the null more, or less likely?**

This is done by comparing the density of the null value between the prior and posterior distributions, and is an approximation of a Bayes factor comparing the provided against the (point) null model:

> "[...] the Bayes factor for $H_0$ versus $H_1$ could be obtained by analytically integrating out the model parameter $\theta$. However, the Bayes factor may likewise be obtained by only considering $H_1$, and dividing the height of the posterior for $\theta$ by the height of the prior for $\theta$, at the point of interest." [@wagenmakers2010bayesian]

Let's use the Students' Sleep data, and try and answer the question: ***given the observed data, is it more or less likely that the drug (the variable `group`) has no effect on the numbers of hours of extra sleep (variable `extra`)?***

```{r sleep_boxplot, echo=FALSE, message=FALSE, warning=FALSE}

ggplot(sleep, aes(x = group, y = extra, fill= group)) +
  geom_boxplot() +

The **bloxplot** suggests that the 2nd group has a higher number of hours of extra sleep. By how much? Let's fit a simple [Bayesian linear model](

```{r rstanarm_disp, eval=FALSE, message=FALSE, warning=FALSE}
model <- stan_glm(extra ~ group, data = sleep)

```{r rstanarm_fit, echo=FALSE, message=FALSE, warning=FALSE}
junk <- capture.output(model <- stan_glm(extra ~ group, data = sleep))

We can use `` on this model to extract the posterior distribution related to the effect of `group2`, and `get_priors()` from the [**insight** package]( to see what prior distribution was used:

```{r prior_n_post, message=FALSE, warning=FALSE, results='hide'}
posterior <-$group2

```{r prior_table, echo=FALSE, message=FALSE, warning=FALSE}

For the `group2` parameter, the prior that was used was a **normal distribution** of **mean** (location) `0` and **SD** (scale) `5.044799`. We can simulate this prior distribution as follows:

```{r message=FALSE, warning=FALSE}
prior <- distribution_normal(length(posterior), mean = 0, sd = 5.044799)

We can now plot both the prior and posterior distribution:

```{r prior_n_post_plot, echo=FALSE, message=FALSE, warning=FALSE}

# Using "see"
bfsd <- bayesfactor_savagedickey(
  data.frame(group2 = posterior),
  data.frame(group2 = prior)

plot(bfsd) +
  theme(legend.position = c(0.2, 0.8))


Looking at the distributions, we can see that the posterior is centred at `r round(median(posterior), 2)`. But this does not mean that an effect of 0 is necessarily less likely. To test that, we will use `bayesfactor_savagedickey()`.

```{r savagedickey, message=FALSE, warning=FALSE}
test_group2 <- bayesfactor_savagedickey(posterior = posterior, prior = prior)

This BF indicates that **an effect of 0 (the point-null effect model) is `r round(test_group2$BF[1],2)` less likely given the data**. Or, when interpreting as a Bayes factor for $H_0$ versus $H_1$, we can say that the observed data is 1/`r round(test_group2$BF[1],2)` = `r round(1/test_group2$BF[1],2)` times more probable under the null than under the model with the effect! Thus, although the center of distribution has shifted away from the null, it is still quite dense around the null.

Note that **interpretation guides** for Bayes factors can be found [**here**]( 

### Directional test

We can also conduct a directional test (a "one sided" or "one tailed" test) if we have a prior hypotheses about the direction of the effect. This is done by setting an order restriction on the prior and posterior distributions [@morey2014simple; @morey_2015_blog].

```{r prior_n_post_plot_one_sided, echo=FALSE, message=FALSE, warning=FALSE}

# Using "see"
bfsd <- bayesfactor_savagedickey(
  data.frame(group2 = posterior),
  data.frame(group2 = prior),
  direction = ">"

plot(bfsd) +
  theme(legend.position = c(0.8,0.8))


```{r savagedickey_one_sided, message=FALSE, warning=FALSE}
test_group2_right <- bayesfactor_savagedickey(posterior = posterior, prior = prior, direction = ">")

As we can see, given that we have an *a priori* assumption about the direction of the effect (*that the effect is positive*), **the presence of an effect has become `r round(test_group2_right$BF[1],2)` times more likely than the absence of an effect**, given the observed data (or that the data is `r round(test_group2_right$BF[1],2)` time more probable under $H_1$ than $H_0$). This indicates that, given the observed data, the posterior mass has shifted away from the null value, giving some evidence against the null (note that a Bayes factor of `r round(test_group2_right$BF[1],2)` is still considered quite [weak evidence](

### Testing all model parameters

Alternatively, we could also pass our model directly as-is to `bayesfactor_savagedickey()` to simultaneously test all of the model's parameters:


### Testing Contrasts (with `emmeans`)

We can also use `bayesfactor_savagedickey()` together with [**emmeans**](, allowing us to [test Bayesian contrasts](

group_diff <- pairs(emmeans(model, ~ group))

# pass the original model via prior
bayesfactor_savagedickey(group_diff, prior = model)

# References
back to top