Skip to main content
  • Home
  • Development
  • Documentation
  • Donate
  • Operational login
  • Browse the archive

swh logo
SoftwareHeritage
Software
Heritage
Archive
Features
  • Search

  • Downloads

  • Save code now

  • Add forge now

  • Help

swh:1:snp:2c68a6c5a8af2f06ac2c0225927f25b54fd1f9d0
  • Code
  • Branches (24)
  • Releases (0)
    • Branches
    • Releases
    • HEAD
    • refs/heads/master
    • refs/tags/0.1.0
    • refs/tags/0.10.0
    • refs/tags/0.10.5
    • refs/tags/0.11.0
    • refs/tags/0.11.5
    • refs/tags/0.12.1
    • refs/tags/0.13.0
    • refs/tags/0.2.0
    • refs/tags/0.2.2
    • refs/tags/0.2.5
    • refs/tags/0.3.0
    • refs/tags/0.4.0
    • refs/tags/0.5.0
    • refs/tags/0.5.1
    • refs/tags/0.5.2
    • refs/tags/0.5.3
    • refs/tags/0.6.0
    • refs/tags/0.7.0
    • refs/tags/0.7.2
    • refs/tags/0.7.5
    • refs/tags/0.8.0
    • refs/tags/0.8.2
    • refs/tags/0.9.0
    No releases to show
  • 1f0a22c
  • /
  • inst
  • /
  • doc
  • /
  • example2.Rmd
Raw File Download
Permalinks

To reference or cite the objects present in the Software Heritage archive, permalinks based on SoftWare Hash IDentifiers (SWHIDs) must be used.
Select below a type of object currently browsed in order to display its associated SWHID and permalink.

  • content
  • directory
  • revision
  • snapshot
content badge Iframe embedding
swh:1:cnt:ed7a60673c968395aaaf7bd4f3b06077d893e4e1
directory badge Iframe embedding
swh:1:dir:cfa272b2679423ca4a494b7b41d15be0683d72d7
revision badge
swh:1:rev:3da49db3cf0eea4d2c5eba241ddb5470cd7dd929
snapshot badge
swh:1:snp:2c68a6c5a8af2f06ac2c0225927f25b54fd1f9d0
Citations

This interface enables to generate software citations, provided that the root directory of browsed objects contains a citation.cff or codemeta.json file.
Select below a type of object currently browsed in order to generate citations for them.

  • content
  • directory
  • revision
  • snapshot
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Tip revision: 3da49db3cf0eea4d2c5eba241ddb5470cd7dd929 authored by Dominique Makowski on 26 January 2021, 16:40:03 UTC
version 0.8.2
Tip revision: 3da49db
example2.Rmd
---
title: "2. Confirmation of Bayesian skills"
output: 
  github_document:
    toc: true
    fig_width: 10.08
    fig_height: 6
  rmarkdown::html_vignette:
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, bayesian, posterior, test]
vignette: >
  \usepackage[utf8]{inputenc}
  %\VignetteIndexEntry{Example 2: Confirmation of Bayesian skills}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
bibliography: bibliography.bib
csl: apa.csl
---

This vignette can be referred to by citing the package:

- Makowski, D., Ben-Shachar, M. S., \& Lüdecke, D. (2019). *bayestestR: Describing Effects and their Uncertainty, Existence and Significance within the Bayesian Framework*. Journal of Open Source Software, 4(40), 1541. https://doi.org/10.21105/joss.01541

---

```{r message=FALSE, warning=FALSE, include=FALSE}
if (!requireNamespace("see", quietly = TRUE) ||
      !requireNamespace("dplyr", quietly = TRUE) ||
      !requireNamespace("ggplot2", quietly = TRUE) ||
      !requireNamespace("performance", quietly = TRUE) ||
      !requireNamespace("BayesFactor", quietly = TRUE) ||
      !requireNamespace("rstanarm", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
}

data(iris)
library(knitr)
library(bayestestR)
options(knitr.kable.NA = '')
knitr::opts_chunk$set(comment=">")
knitr::opts_chunk$set(dpi=150)
options(digits=2)

set.seed(333)
```


Now that [**describing and understanding posterior distributions**](https://easystats.github.io/bayestestR/articles/example1.html) of linear regressions has no secrets for you, we will take one step back and study some simpler models: **correlations** and ***t*-tests**.

But before we do that, let us take a moment to remind ourselves and appreciate the fact that **all basic statistical pocedures** such as correlations, *t*-tests, ANOVAs or Chisquare tests ***are* linear regressions** (we strongly recommend  [this excellent demonstration](https://lindeloev.github.io/tests-as-linear/)). Nevertheless, these simple models will be the occasion to introduce a more complex index, such as the **Bayes factor**.

## Correlations

### Frequentist version 

Let us start, again, with a **frequentist correlation** between two continuous variables, the **width** and the **length** of the sepals of some flowers. The data is available in R as the `iris`  dataset (the same that was used in the [previous tutorial](https://easystats.github.io/bayestestR/articles/example1.html)). 

We will compute a Pearson's correlation test, store the results in an object called `result`, then display it:

```{r message=FALSE, warning=FALSE}
result <- cor.test(iris$Sepal.Width, iris$Sepal.Length)
result
```

As you can see in the output, the test that we did actually compared two hypotheses: the **null hypothesis** (*h0*; no correlation) with the **alternative hypothesis** (*h1*; a non-null correlation). Based on the *p*-value, the null hypothesis cannot be rejected: the correlation between the two variables is **negative but not significant** (r = -.12, p > .05).

### Bayesian correlation 

To compute a Bayesian correlation test, we will need the [`BayesFactor`](https://richarddmorey.github.io/BayesFactor/) package (you can install it by running `install.packages("BayesFactor")`). We can then load this package, compute the correlation using the `correlationBF()` function and store the results in a similar fashion. 
 
```{r message=FALSE, warning=FALSE, results='hide'}
library(BayesFactor)
result <- correlationBF(iris$Sepal.Width, iris$Sepal.Length)
```

Now, let us run our `describe_posterior()` function on that:

```{r message=FALSE, warning=FALSE, eval=FALSE}
describe_posterior(result)
```
```{r echo=FALSE}
structure(list(Parameter = "rho", Median = -0.114149129692488, 
    CI = 89, CI_low = -0.240766308855643, CI_high = 0.00794997655649642, 
    pd = 91.6, ROPE_CI = 89, ROPE_low = -0.1, ROPE_high = 0.1, 
    ROPE_Percentage = 42.0949171581017, BF = 0.509017511647702, 
    Prior_Distribution = "cauchy", Prior_Location = 0, Prior_Scale = 0.333333333333333), row.names = 1L, class = "data.frame")
```


We see again many things here, but the important indices for now are the **median** of the posterior distribution, `-.11`. This is (again) quite close to the frequentist correlation. We could, as previously, describe the [**credible interval**](https://easystats.github.io/bayestestR/articles/credible_interval.html), the [**pd**](https://easystats.github.io/bayestestR/articles/probability_of_direction.html) or the [**ROPE percentage**](https://easystats.github.io/bayestestR/articles/region_of_practical_equivalence.html), but we will focus here on another index provided by the Bayesian framework, the **Bayes factor (BF)**.

### Bayes factor (BF)

We said previously that a correlation test actually compares two hypotheses, a null (absence of effect) with an altnernative one (presence of an effect). The [**Bayes factor (BF)**](https://easystats.github.io/bayestestR/articles/bayes_factors.html) allows the same comparison and determines **under which of two models the observed data are more probable**: a model with the effect of interest, and a null model without the effect of interest. We can use `bayesfactor()` to specifically compute the Bayes factor comparing those models:

```{r message=FALSE, warning=FALSE}
bayesfactor(result)
```

We got a *BF* of `0.51`. What does it mean?

Bayes factors are **continuous measures of relative evidence**, with a Bayes factor greater than 1 giving evidence in favour of one of the models (often referred to as *the numerator*), and a Bayes factor smaller than 1 giving evidence in favour of the other model (*the denominator*). 

> **Yes, you heard things right, evidence in favour of the null!**

That's one of the reason why the Bayesian framework is sometimes considered as superior to the frequentist framework. Remember from your stats lessons, that the ***p*-value can only be used to reject *h0***, but not *accept* it. With the **Bayes factor**, you can measure **evidence against - and in favour of - the null**. 

BFs representing evidence for the alternative against the null can be reversed using $BF_{01}=1/BF_{10}$ (the *01* and *10* correspond to *h0* against *h1* and *h1* against *h0*, respectively) to provide evidence of the null againtt the alternative. This improves human readability in cases where the BF of the alternative against the null is smaller than 1 (i.e., in support of the null).

In our case, `BF = 1/0.51 = 2`, indicates that the data are **2 times more probable under the null compared to the alternative hypothesis**, which, though favouring the null, is considered only [anecdotal evidence against the null](https://easystats.github.io/effectsize/reference/interpret_bf.html).

We can thus conclude that there is **anecdotal evidence in favour of an absence of correlation between the two variables (r<sub>median</sub> = 0.11, BF = 0.51)**, which is a much more informative statement that what we can do with frequentist statistics.

**And that's not all!**

### Visualise the Bayes factor

In general, **pie charts are an absolute no-go in data visualisation**, as our brain's perceptive system heavily distorts the information presented in such way. Nevertheless, there is one exeption: pizza charts.

It is an intuitive way of interpreting the strength of evidence provided by BFs as an amount of surprise. 


```{r echo=FALSE, fig.cap="Wagenmakers' pizza poking analogy. From the great 'www.bayesianspectacles.org' blog.", fig.align='center', out.width="80%"}
knitr::include_graphics("https://github.com/easystats/easystats/raw/master/man/figures/bayestestR/LetsPokeAPizza.jpg")
```


Such "pizza plots" can be directly created through the [`see`](https://github.com/easystats/see) visualisation companion package for easystats (you can install it by running `install.packages("see")`):

```{r message=FALSE, warning=FALSE}
library(see)

plot(bayesfactor(result)) +
  scale_fill_pizza()
```

So, after seeing this pizza, how much would you be suprised by the outcome of a blinded poke? 


## *t*-tests

***"I know that I know nothing, and especially not if *versicolor* and *virginica* differ in terms of Sepal.Width"*, famously said Socrates**. Time to finally answer this answer this crucial question!

### Versicolor *vs.* virginica

Bayesian *t*-tests can be performed in a very similar way to correlations. As we are particularly interested in two levels of the `Species` factor, *versicolor* and *virginica*. We will start by filtering out from `iris` the non-relevant observations corresponding to the *setosa* specie, and we will then visualise the observations and the distribution of the `Sepal.Width` variable. 

```{r message=FALSE, warning=FALSE}
library(dplyr)
library(ggplot2)

# Select only two relevant species
data <- iris %>% 
  filter(Species != "setosa") %>% 
  droplevels()

# Visualise distributions and observations
data %>% 
  ggplot(aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_violindot(fill_dots = "black", size_dots = 1) +
  scale_fill_material() +
  theme_modern()
```

It *seems* (visually) that *virgnica* flowers have, on average, a slightly higer width of sepals. Let's assess this difference statistically by using the `ttestBF` in the `BayesFactor` package.

### Compute the Bayesian *t*-test

```{r message=FALSE, warning=FALSE}
result <- BayesFactor::ttestBF(formula = Sepal.Width ~ Species, data = data)
describe_posterior(result)
```

From the indices, we can say that the difference of `Sepal.Width` between *virginica* and *versicolor* has a probability of **100% of being negative** [*from the pd and the sign of the median*] (median = -0.19, 89% CI [-0.29, -0.092]). The data provides a **strong evidence against the null hypothesis** (BF = 18).

Keep that in mind as we will see another way of investigating this question.


## Logistic Model

A hypothesis for which one uses a *t*-test can also be tested using a binomial model (*e.g.*, a **logistic model**). Indeed, it is possible to reformulate the following hypothesis, "*there is an important difference in this variable between the two groups*" by "*this variable is able to discriminate between (or classify) the two groups*". However, these models are much more powerful than a regular *t*-test.

In the case of the difference of `Sepal.Width` between *virginica* and *versicolor*, the question becomes, *how well can we classify the two species using only* `Sepal.Width`. 

### Fit the model

```{r message=FALSE, warning=FALSE, eval=FALSE}
library(rstanarm)

model <- stan_glm(Species ~ Sepal.Width, data = data, family = "binomial")
```
```{r message=FALSE, warning=FALSE, echo=FALSE}
library(rstanarm)

model <- stan_glm(Species ~ Sepal.Width, data = data, family = "binomial", refresh = 0)
```

### Visualise the model

Using the [`modelbased`](https://github.com/easystats/modelbased) package.


### Performance and Parameters

TO DO.

```{r message=FALSE, warning=FALSE}
library(performance)

model_performance(model)
```

```{r message=FALSE, warning=FALSE}
describe_posterior(model, test = c("pd", "ROPE", "BF"))
```



### Visualise the indices

TO DO.

```{r message=FALSE, warning=FALSE}
# plot(rope(result))
```


### Diagnostic Indices

About diagnostic indices such as Rhat and ESS.


back to top

Software Heritage — Copyright (C) 2015–2025, The Software Heritage developers. License: GNU AGPLv3+.
The source code of Software Heritage itself is available on our development forge.
The source code files archived by Software Heritage are available under their own copyright and licenses.
Terms of use: Archive access, API— Contact— JavaScript license information— Web API