Revision 7c9015bd11e2c7359f3311d9202aea6be69b4131 authored by Gabe DuBose on 10 June 2022, 00:10:28 UTC, committed by GitHub on 10 June 2022, 00:10:28 UTC
# Evaluation of MUtations via reference Simulation: EMUS
 <p align="center"><img src="emus-logo.png" height="150" /></p>


NOTE: THIS PROGRAM IS STILL UNDER DEVELOPMENT AND NOT READY FOR GENERAL USE.

EMUS is a pipeline and tool for statistically evaluating the frequency of mutational classes in genomic studies. This is accomplished in two primary steps. First, random mutations are generated across your organism's reference genome. Second, the same number of mutations as were observed is randomly drawn from the simulated mutations, and the frequencies of mutational classes (e.g., synonymous, missense, intergenic) in the observed and simulated sets are compared. This process is repeated for n bootstrap replicates, and the probability that each observed mutational class occurred at a higher or lower frequency than randomly expected is computed.
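To make the procedure above concrete, here is a minimal Python sketch of the bootstrap comparison. It is not EMUS's implementation: the function name `bootstrap_class_probabilities`, drawing without replacement, and the use of "at least as many" / "at most as many" counts as the two one-sided probabilities are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_class_probabilities(observed, simulated, n_bootstraps=1000, seed=0):
    """Compare observed mutational classes against random draws of simulated ones.

    observed, simulated: lists of class labels such as "missense_variant".
    Returns {class: (p_high, p_low)}, where p_high is the fraction of replicates
    whose random count is >= the observed count (small => class enriched in the
    observed data) and p_low the fraction whose count is <= the observed count
    (small => class depleted). Hypothetical helper, not part of EMUS.
    """
    if len(simulated) < len(observed):
        raise ValueError("need at least as many simulated as observed mutations")
    rng = random.Random(seed)  # fixed seed so replicates are reproducible
    obs_counts = Counter(observed)
    classes = sorted(set(observed) | set(simulated))
    at_least = Counter()
    at_most = Counter()
    for _ in range(n_bootstraps):
        # Assumption: draw as many simulated mutations as were observed, without replacement.
        draw = Counter(rng.sample(simulated, len(observed)))
        for cls in classes:
            if draw[cls] >= obs_counts[cls]:
                at_least[cls] += 1
            if draw[cls] <= obs_counts[cls]:
                at_most[cls] += 1
    return {cls: (at_least[cls] / n_bootstraps, at_most[cls] / n_bootstraps)
            for cls in classes}
```

With 8 missense and 2 synonymous observed mutations drawn against a 50/50 simulated pool, `p_high` for missense comes out small (roughly 5%), hinting that missense mutations were observed more often than random placement would predict.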

This documentation is a quick overview of EMUS functionality. More examples and tutorials will be added once the package is complete.

## Installation
The simplest way to install EMUS is to set up the emus conda environment and then pip install this repository:
```
conda env create evo-informatics/emus
conda activate emus
pip3 install git+https://github.com/gabe-dubose/emus.git
```
Another option is to download or clone this repository, use the emus-env.yaml file to create a conda environment with the dependencies, and then install EMUS from the local copy:
```
git clone https://github.com/gabe-dubose/emus
cd emus
conda env create --file emus-env.yaml
# assumes emus-env.yaml defines an environment named emus
conda activate emus
pip3 install .
```
Although not recommended, you can also install the dependencies manually and then clone this repository. If you do, we recommend installing them with pip and/or conda:

Dependencies:
  - Python3
  - SnpEff
  - Seaborn
  - Matplotlib

## General Workflow and Tutorial

### Simulating mutations in the reference genome
The simulate-mutations command (simulate_mutations.py) takes an input genome in FASTA format and simulates a fixed number of SNPs. The output is a Variant Call Format (VCF) file containing the number of mutations you specified. If you need more customized or fine-grained simulations, we recommend using Mutation-Simulator, whose output VCF can also be used with EMUS.
```
simulate-mutations \
-i/--input    <reference_genome.fasta> \
-s/--snp      <#snps> \
-o/--output   <output_file.vcf>
```
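The following Python sketch illustrates what simulating a flat number of SNPs could look like: positions are drawn uniformly across all chromosomes (weighted by length), ambiguous bases such as N are skipped, and each chosen base is replaced by a different nucleotide. The functions `read_fasta`, `simulate_snps`, and `format_vcf` are hypothetical and do not describe how simulate-mutations itself works (for example, whether it weights substitution types or allows repeated positions).

```python
import random


def read_fasta(lines):
    """Parse FASTA text lines into {chromosome_name: uppercase_sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []  # keep only the ID before any whitespace
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def simulate_snps(seqs, n_snps, seed=0):
    """Pick n_snps distinct A/C/G/T sites uniformly across the genome and give
    each a random alternate base. Returns (chrom, 1-based pos, ref, alt) tuples."""
    callable_sites = sum(seq.count(b) for seq in seqs.values() for b in "ACGT")
    if n_snps > callable_sites:
        raise ValueError("more SNPs requested than A/C/G/T sites in the genome")
    rng = random.Random(seed)
    chroms = list(seqs)
    lengths = [len(seqs[c]) for c in chroms]
    chosen = set()
    while len(chosen) < n_snps:
        # Choosing a chromosome by length, then a position within it, gives
        # every base the same chance; rejecting non-ACGT keeps that uniform.
        i = rng.choices(range(len(chroms)), weights=lengths)[0]
        pos = rng.randrange(lengths[i])
        if seqs[chroms[i]][pos] in "ACGT":  # skip N and other ambiguity codes
            chosen.add((i, pos))
    records = []
    for i, pos in sorted(chosen):  # genome order: chromosome index, then position
        ref = seqs[chroms[i]][pos]
        alt = rng.choice([b for b in "ACGT" if b != ref])
        records.append((chroms[i], pos + 1, ref, alt))
    return records


def format_vcf(records):
    """Render records as a minimal VCFv4.2 file (no samples, empty INFO)."""
    lines = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    lines += [f"{c}\t{p}\t.\t{r}\t{a}\t.\t.\t." for c, p, r, a in records]
    return "\n".join(lines) + "\n"
```

For example, `open("simulated.vcf", "w").write(format_vcf(simulate_snps(read_fasta(open("reference.fasta")), 1000)))` would write 1,000 simulated SNPs; both file names are placeholders.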
NOTE: It is recommended to annotate these simulated variants using the same annotation tool that was used for the observed dataset.

### Optional: Visualizing simulation
EMUS offers a visualization feature for manually inspecting the distribution of mutations across each of your reference's chromosomes. The VCF generated by EMUS or Mutation-Simulator can be used as input. A separate histogram plot is written for each chromosome, so we recommend creating a dedicated directory for them.
```
mkdir output_dir
plot-vcf-histogram \
-i/--input    <simulated_mutations.vcf> \
-o/--output   <output_dir>
```
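As a rough illustration of what these per-chromosome histograms summarize, the sketch below bins VCF positions into fixed-width windows for each chromosome using only the standard library. The function name and the 100 kb default bin size are assumptions; plot-vcf-histogram may bin and draw its plots differently. The resulting counts could then be drawn with Matplotlib (a listed dependency), for example as one bar chart per chromosome.

```python
from collections import Counter, defaultdict


def vcf_position_histograms(vcf_lines, bin_size=100_000):
    """Count variants per fixed-width window for each chromosome.

    Returns {chrom: {bin_index: count}}, where bin i covers 1-based
    positions i*bin_size + 1 .. (i+1)*bin_size. Hypothetical helper.
    """
    hist = defaultdict(Counter)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):  # skip blank and header lines
            continue
        chrom, pos = line.split("\t")[:2]  # CHROM and POS are the first two columns
        hist[chrom][(int(pos) - 1) // bin_size] += 1
    return {chrom: dict(counts) for chrom, counts in hist.items()}
```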

### Reading in variant annotations
All EMUS needs for downstream analyses is a .tsv file with a list of variants in the first column. Different variant annotation tools produce different output formats, so extracting this information can be challenging. The get-annotations command helps by converting standard SnpEff, ANNOVAR, and VEP outputs, and support for other formats can be added later. If applicable, run this step on both the observed and the simulated data.

```
get-annotations \
-i/--input    <input_file> \
-o/--output   <out_file.tsv>

Optional flags (input format):
--snpeff
--annovar
--vep
```
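For SnpEff output, the variant class lives in the `ANN` INFO field. Multiple entries are comma-separated, and within an entry the fields are pipe-separated, with the effect (e.g. `missense_variant`) in the second field. The sketch below is a hypothetical stand-in, not get-annotations itself. It keeps the first `ANN` entry of each record (SnpEff orders entries by putative impact) and writes one effect per line, matching the first-column input described above. ANNOVAR and VEP outputs use different layouts and are not covered here.

```python
def snpeff_effects(vcf_lines):
    """Yield the effect of the first ANN entry for each annotated VCF record.

    Combined effects joined by '&' (e.g. "missense_variant&splice_region_variant")
    are kept as-is. Records without an ANN field are skipped.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        for field in info.split(";"):
            if field.startswith("ANN="):
                first_entry = field[len("ANN="):].split(",")[0]
                yield first_entry.split("|")[1]  # Allele|Annotation|Impact|...
                break


def annotation_tsv(effects):
    """Render one effect per line, i.e. a single-column .tsv."""
    return "".join(f"{effect}\n" for effect in effects)
```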

### Comparing observed and simulated variants
Using the output from the get-annotations command in the previous step, we can compare our observed variants to our simulated ones. 
```
compare-variants \
-i/--input        <observed_variants.tsv> \
-c/--comparison   <simulated_variants.tsv> \
-b/--bootstraps   <#bootstraps> \
-o/--output       <output_file_handle>
```
This program will produce a plain .tsv file that will have the probability values for each mutation class. It will also produce a .bootstraps.tsv file that will contain the relevant information for visualization. 
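Before bootstrapping, it can help to look at the raw class proportions in the two input files. The sketch below reads the first tab-separated column of each file, as described for the get-annotations output, and reports per-class proportions side by side. The function names and the assumption that the files have no header row are illustrative; this does not reproduce the format of the .tsv files that compare-variants writes.

```python
import csv
from collections import Counter


def read_first_column(lines):
    """Return the first tab-separated field of every non-empty row."""
    return [row[0] for row in csv.reader(lines, delimiter="\t") if row and row[0]]


def compare_proportions(observed, simulated):
    """Return {class: (observed_proportion, simulated_proportion)}, sorted by class."""
    if not observed or not simulated:
        raise ValueError("both inputs must contain at least one variant")
    obs, sim = Counter(observed), Counter(simulated)
    return {cls: (obs[cls] / len(observed), sim[cls] / len(simulated))
            for cls in sorted(obs.keys() | sim.keys())}
```

For example, `compare_proportions(read_first_column(open("observed.tsv")), read_first_column(open("simulated.tsv")))`, where both file names are placeholders.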

### Visualizing comparisons
With the .bootstraps.tsv file generated in the previous step, EMUS can generate publication-quality figures. Available visualizations include histograms, kernel density estimate (KDE) plots, and empirical cumulative distribution function (ECDF) plots, along with options for figure customization and coloring.
```
mkdir out_dir
plot-variant-comparisons \
-i/--input          <data.bootstraps.tsv> \
-o/--outdir         <out_dir>

Optional flags:
--hist
--kde
--ecdf
--color_tail        <color>
--comp_line_color   <color>
--plot_color        <color>
--blank_bars
--background_theme  <white or dark>
```
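An ECDF plot shows, for each value x, the fraction of bootstrap replicates whose count is less than or equal to x. Placing the observed count on that curve shows how extreme it is. The minimal sketch below computes those points with the standard library. The function names are hypothetical, and it does not claim to mirror how plot-variant-comparisons draws its figures. Seaborn (a listed dependency) also offers `seaborn.ecdfplot` for plotting such a curve directly.

```python
from bisect import bisect_right


def ecdf(values):
    """Return (xs, ys): sorted values and the ECDF F(x) = #(values <= x) / n at each."""
    xs = sorted(values)
    n = len(xs)
    return xs, [bisect_right(xs, x) / n for x in xs]


def ecdf_at(values, x):
    """Fraction of values less than or equal to x (e.g. an observed count)."""
    xs = sorted(values)
    return bisect_right(xs, x) / len(xs)
```

If, say, 40 of 1,000 bootstrap counts are at or below the observed count, `ecdf_at` returns 0.04, placing the observed value in the lower tail.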
