---
title: "Get Started with Bayesian Analysis"
output: 
  github_document:
    toc: true
    fig_width: 10.08
    fig_height: 6
  rmarkdown::html_vignette:
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, bayesian, posterior, test]
vignette: >
  %\VignetteIndexEntry{Get Started with Bayesian Analysis}
  \usepackage[utf8]{inputenc}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
bibliography: bibliography.bib
---



## Why use the Bayesian Framework?

Reasons to prefer this approach include **reliability**, **accuracy** (in noisy data and with small samples), the possibility of introducing **prior knowledge** into the analysis and, critically, the **intuitiveness of the results** and their **straightforward interpretation** [@andrews2013prior; @etz2016bayesian; @kruschke2010believe; @kruschke2012time; @wagenmakers2018bayesian]. 

In general, the frequentist approach has been associated with a focus on null hypothesis testing, and the misuse of *p* values has been shown to contribute critically to the reproducibility crisis of psychological science [@chambers2014instead; @szucs2016empirical]. There is general agreement that a wider adoption of the Bayesian approach is one way of overcoming these issues [@benjamin2018redefine; @etz2016bayesian].

Once we agree that the Bayesian framework is the right way to go, you might wonder what the Bayesian framework actually *is*. 

**What's all the fuss about?**

## What is the Bayesian Framework?

Adopting the Bayesian framework is more of a shift in paradigm than a change in methodology. Indeed, all the common statistical models (t-tests, correlations, ANOVAs, regressions, ...) can also be fitted within the Bayesian framework. One of the core differences is that in the **frequentist view** (the "classic" statistics, with *p* and *t* values, as well as some weird *degrees of freedom*), **the effects are fixed** (but unknown) and **data are random**. By contrast, Bayesian inference computes the **probability** of different effect values *given the observed data*. Instead of a single estimated value of the "true effect", this probabilistic approach gives a distribution of values, called the **"posterior" distribution**. 

In the Bayesian framework, uncertainty can be summarized, for instance, by the **median** of the posterior distribution, together with a range of values of the posterior distribution that includes the 95% most probable values (the 95% ***Credible* Interval**). To illustrate the difference in interpretation, the Bayesian framework allows one to say *"given the observed data, the effect has a 95% probability of falling within this range"*, while the less straightforward frequentist alternative (the 95% ***Confidence* Interval**) would be "*there is a 95% probability that, when computing a confidence interval from data of this sort, the effect falls within this range*". 
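
As a rough sketch (base R only, with a simulated vector standing in for a real posterior distribution), reading off such summaries boils down to:

```{r eval=FALSE}
# Simulated stand-in for a posterior distribution of an effect
set.seed(123)
posterior <- rnorm(4000, mean = 0.4, sd = 0.1)

# Point summary: the median of the posterior
median(posterior)

# A 95% equal-tailed credible interval: the central 95% of posterior values
quantile(posterior, probs = c(0.025, 0.975))
```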


In other words, omitting the maths behind it, we can say that:

- The frequentist bloke tries to estimate "the real effect". For instance, the "real" value of the correlation between *x* and *y*. Hence, frequentist models return a "**point-estimate**" (*i.e.*, a single value) of the "real" correlation (*e.g.*, r = 0.42), estimated under a number of obscure assumptions (at a minimum, that the data are sampled at random from a "parent", usually normal, distribution).
- **The Bayesian master assumes no such thing**. The data are what they are. Based on the observed data (and a **prior** belief about the result), the Bayesian sampling algorithm returns a probability distribution (called **the posterior**) of the effect that is compatible with the observed data. For the correlation between *x* and *y*, it will return a distribution that says, for example, "the most probable effect is 0.42, but the data are also compatible with correlations of 0.12 or 0.74".
- To characterize our effects, there is **no need for *p* values** or other cryptic indices. We simply describe the posterior distribution of the effect: for example, we can report the median, the 90% *Credible* Interval or [other indices](https://easystats.github.io/bayestestR/articles/guidelines.html), as in the sketch below.
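
As a hedged illustration (again with a simulated vector standing in for a real posterior), `bayestestR::describe_posterior()` reports the median, a credible interval and several other indices in a single call:

```{r eval=FALSE}
library(bayestestR)

# Simulated stand-in for a posterior distribution of an effect
posterior <- distribution_normal(4000, mean = 0.4, sd = 0.1)

# Median, 90% credible interval and other indices of the posterior
describe_posterior(posterior, centrality = "median", ci = 0.90)
```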


```{r echo=FALSE, fig.cap="Accurate depiction of a regular Bayesian user estimating a credible interval.", fig.align='center', out.width="50%"}
knitr::include_graphics("https://github.com/easystats/bayestestR/raw/master/man/figures/bayesianMaster.png")
```


*Note: Although the very purpose of this package is to advocate for the use of Bayesian statistics, please note that there are serious arguments supporting frequentist indices (see for instance [this thread](https://discourse.datamethods.org/t/language-for-communicating-frequentist-results-about-treatment-effects/934/16)). As always, the world is not black and white (p \< .001).*

**So... how does it work?**

## A simple example

### bayestestR installation

You can install `bayestestR` along with the whole [**easystats**](https://github.com/easystats/easystats) suite by running the following:

```{r message=FALSE, warning=FALSE, eval=FALSE}
install.packages("devtools")
devtools::install_github("easystats/easystats")
```
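
Alternatively, if you do not need the development versions, the released packages are also distributed on CRAN, so a plain `install.packages()` call should work as well:

```{r eval=FALSE}
# Install only bayestestR, or the whole easystats suite, from CRAN
install.packages("bayestestR")
install.packages("easystats")
```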

<!-- Now that it's done, we can *load* the packages we need, in this case [`report`](https://github.com/easystats/report), which combines `bayestestR` and other packages together. -->

<!-- ```{r message=FALSE, warning=FALSE} -->
<!-- library(report)  # Calling library(...) enables you to use the package's functions -->
<!-- ``` -->


Let's also install and load the [`rstanarm`](https://mc-stan.org/rstanarm/) package, which is used for fitting Bayesian models.

```{r message=FALSE, warning=FALSE, eval=FALSE}
install.packages("rstanarm")
library(rstanarm)
```



### Traditional linear regression

Let's fit a simple frequentist linear regression (the `lm()` function stands for *linear model*) between two numeric variables, `Sepal.Length` and `Petal.Length` from the famous [`iris`](https://en.wikipedia.org/wiki/Iris_flower_data_set) dataset, included by default in R.

```{r message=FALSE, warning=FALSE, eval=FALSE}
model <- lm(Sepal.Length ~ Petal.Length, data=iris)
summary(model)
```
```{r echo=FALSE, message=FALSE, warning=FALSE, comment=NA}
library(dplyr)

lm(Sepal.Length ~ Petal.Length, data=iris) %>% 
  summary()
```

<!-- ```{r message=FALSE, warning=FALSE, eval=FALSE} -->
<!-- model <- lm(Sepal.Length ~ Petal.Length, data=iris) -->
<!-- report(model) -->
<!-- ``` -->
<!-- ```{r echo=FALSE, message=FALSE, warning=FALSE, comment=NA} -->
<!-- library(dplyr) -->

<!-- lm(Sepal.Length ~ Petal.Length, data=iris) %>%  -->
<!--   report() %>%  -->
<!--   to_text(width=100) -->
<!-- ``` -->

This analysis suggests that there is a **significant** (*whatever that means*) and **positive** (with a coefficient of `0.41`) linear relationship between the two variables. 

*Fitting and interpreting **frequentist models is so easy** that it is obvious that people use them instead of the Bayesian framework... right?*

**Not anymore.**


### Bayesian linear regression

<!-- ```{r message=FALSE, warning=FALSE, eval=FALSE} -->
<!-- model <- stan_glm(Sepal.Length ~ Petal.Length, data=iris) -->
<!-- report(model) -->
<!-- ``` -->
<!-- ```{r echo=FALSE, message=FALSE, warning=FALSE, comment=NA, results='hide'} -->
<!-- library(rstanarm) -->
<!-- set.seed(333) -->

<!-- model <- stan_glm(Sepal.Length ~ Petal.Length, data=iris) -->
<!-- ``` -->
<!-- ```{r echo=FALSE, message=FALSE, warning=FALSE, comment=NA} -->
<!-- model %>%  -->
<!--   report() %>%  -->
<!--   to_text(width=100) -->
<!-- ``` -->

```{r message=FALSE, warning=FALSE, eval=FALSE}
model <- stan_glm(Sepal.Length ~ Petal.Length, data=iris)
summary(model)
```
```{r echo=FALSE, message=FALSE, warning=FALSE, comment=NA, results='hide'}
library(rstanarm)
set.seed(333)

model <- stan_glm(Sepal.Length ~ Petal.Length, data=iris)
```
```{r echo=FALSE, message=FALSE, warning=FALSE, comment=NA}
model %>% 
  summary()
```


**That's it!** You fitted a Bayesian version of the model simply by using [`stan_glm()`](https://mc-stan.org/rstanarm/reference/stan_glm.html) instead of `lm()`. The conclusions that we can draw, for this example, are very similar. The effect (*the median of the effect's posterior distribution*) is about `0.41`, and it can also be considered *significant* in the Bayesian sense.
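
To back up that last statement, here is a minimal sketch (assuming the `model` object fitted above and the `bayestestR` package) of a few indices that can be used to characterize the effect in the Bayesian sense:

```{r eval=FALSE}
library(bayestestR)

# Median of the posterior distribution of the effect
point_estimate(model, centrality = "median")

# 95% credible interval (highest density interval)
hdi(model, ci = 0.95)

# Probability of direction: the share of the posterior that is strictly positive
p_direction(model)
```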

**So, ready to learn more?** Check out the [**next tutorial**](https://easystats.github.io/bayestestR/articles/example1_GLM.html)!

## References

