https://github.com/gabe-dubose/emus
27 October 2023, 08:23:24 UTC
Tip revision: 7c9015bd11e2c7359f3311d9202aea6be69b4131 authored by Gabe DuBose on 10 June 2022, 00:10:28 UTC
Update README.md
# Evaluation of MUtations via reference Simulation: EMUS
 <p align="center"><img src="emus-logo.png" height="150" /></p>


NOTE: THIS PROGRAM IS STILL UNDER DEVELOPMENT AND NOT READY FOR GENERAL USE.

EMUS is a pipeline and tool for statistically evaluating the frequency of mutational classes in genomic studies. This is accomplished in two primary steps. First, random mutations are generated across your organism's reference genome. Second, the same number of mutations as were observed is randomly selected from the simulated mutations, and the frequencies of mutational classes (e.g., synonymous, missense, intergenic) in the observed and simulated sets are compared. This process is repeated for a user-specified number of bootstrap replicates, and from these replicates EMUS computes the probability that each observed mutational class occurred at a higher or lower frequency than expected by chance.
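The probabilities described above can be written compactly as follows. The notation is our own and is not taken from the EMUS source, and EMUS may use a strict inequality or a different estimator. Let $f^{\mathrm{obs}}_c$ be the observed frequency of class $c$. Let $f^{(b)}_c$ be its frequency among the same number of mutations drawn from the simulated set in replicate $b = 1, \dots, B$, where $B$ is the number of bootstraps.

$$
\hat{p}^{\,\mathrm{higher}}_c = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\!\left[f^{(b)}_c \ge f^{\mathrm{obs}}_c\right],
\qquad
\hat{p}^{\,\mathrm{lower}}_c = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\!\left[f^{(b)}_c \le f^{\mathrm{obs}}_c\right]
$$

A small $\hat{p}^{\,\mathrm{higher}}_c$ means that random placement rarely produces class $c$ as often as it was observed.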

This documentation is a quick overview of EMUS functionality. More examples and tutorials will be uploaded once the package is complete.

## Installation
The simplest way to install EMUS is to set up the emus conda environment and then pip install this repository:
```
conda env create evo-informatics/emus
conda activate emus
pip3 install git+https://github.com/gabe-dubose/emus.git
```
Another option would be to download or clone this repository and use the emus-env.yaml file to create a conda environment with the dependencies:
```
git clone https://github.com/gabe-dubose/emus
cd emus
conda env create --file emus-env.yaml
conda activate emus
pip3 install git+https://github.com/gabe-dubose/emus.git
```
Although not recommended, you can also install the dependencies manually and then clone this repository. We recommend installing them with pip and/or conda:

Dependencies:
  - Python3
  - SnpEff
  - Seaborn
  - Matplotlib
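If you install the dependencies manually, a quick Python check can confirm that they are visible from the active environment. This is a hedged sketch. The module names and the `snpEff` executable name are assumptions about a typical pip/bioconda install, and `check_dependencies` is a hypothetical helper, not part of EMUS.

```python
"""Check that the EMUS dependencies listed above can be found (sketch)."""
from __future__ import annotations

import importlib.util
import shutil
import sys


def check_dependencies() -> dict[str, bool]:
    """Return a mapping of dependency name -> whether it was found."""
    status = {
        # Python packages are located without actually importing them.
        "seaborn": importlib.util.find_spec("seaborn") is not None,
        "matplotlib": importlib.util.find_spec("matplotlib") is not None,
        # Assumption: SnpEff is on PATH as a `snpEff` wrapper script (as bioconda provides).
        "snpEff": shutil.which("snpEff") is not None,
    }
    status["python3"] = sys.version_info.major == 3
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:<12} {'ok' if found else 'MISSING'}")
```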

## General Workflow and Tutorial

### Simulating mutations in the reference genome
The simulate_mutations.py script (run as `simulate-mutations`) takes an input genome in FASTA format and simulates a fixed number of SNPs. Its output is a Variant Call Format (VCF) file containing the number of mutations you specified. If you need more customized, fine-grained simulations, we recommend using Mutation-Simulator; its output VCF can also be used in EMUS.
```
simulate-mutations \
-i/--input    <reference_genome.fasta> \
-s/--snp      <#snps> \
-o/--output   <output_file.vcf>
```
NOTE: It is recommended to annotate these simulated variants using the same annotation tool that was used for the observed dataset.
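Here is one way to simulate a flat number of SNPs, sketched in Python because EMUS is a Python package. This is not EMUS's implementation. It makes three assumptions: positions are drawn uniformly and without replacement over all A/C/G/T bases of every sequence, each alternate allele is drawn uniformly from the other three bases, and the output is a minimal VCF 4.2. `read_fasta`, `simulate_snps` and `write_vcf` are hypothetical names.

```python
"""Uniform random SNP simulation on a reference FASTA (illustrative sketch)."""
from __future__ import annotations

import random


def read_fasta(path: str) -> dict[str, str]:
    """Return {sequence_id: sequence} for a (multi-)FASTA file."""
    seqs: dict[str, list[str]] = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]  # ID is the first word of the header
                seqs[name] = []
            elif line and name is not None:
                seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}


def simulate_snps(genome: dict[str, str], n_snps: int, seed: int | None = None):
    """Draw n_snps distinct (chrom, 1-based pos, ref, alt) tuples."""
    rng = random.Random(seed)
    # Every unambiguous base is a candidate site (N and other codes are skipped).
    sites = [(c, i) for c, s in genome.items() for i, b in enumerate(s) if b in "ACGT"]
    chosen = rng.sample(sites, n_snps)  # without replacement: no site is mutated twice
    snps = []
    for chrom, i in sorted(chosen):
        ref = genome[chrom][i]
        alt = rng.choice([b for b in "ACGT" if b != ref])
        snps.append((chrom, i + 1, ref, alt))
    return snps


def write_vcf(snps, path: str) -> None:
    """Write the SNPs as a minimal VCF with empty QUAL/FILTER/INFO columns."""
    with open(path, "w") as out:
        out.write("##fileformat=VCFv4.2\n")
        out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
        for chrom, pos, ref, alt in snps:
            out.write(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\n")
```

Listing every site keeps the sketch simple but uses a lot of memory on large genomes. `rng.sample` raises `ValueError` if more SNPs are requested than there are A/C/G/T sites.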

### Optional: Visualizing simulation
EMUS offers a visualization feature for manually inspecting the distribution of mutations across each chromosome of your reference genome. The VCF generated by EMUS or Mutation-Simulator can be used as input. A separate histogram is written for each chromosome, so we recommend creating a dedicated output directory for them.
```
mkdir output_dir
plot-vcf-histogram \
-i/--input    <simulated_mutations.vcf> \
-o/--output   <output_dir>
```
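A per-chromosome histogram amounts to grouping VCF positions by chromosome and counting them in fixed-width bins. The dependency-free sketch below computes those counts. The helper names and the bin width are hypothetical, not `plot-vcf-histogram` options. Each per-chromosome position list could then be passed to `matplotlib.pyplot.hist` to draw one figure per chromosome.

```python
"""Bin simulated variant positions per chromosome (illustrative sketch)."""
from __future__ import annotations

from collections import Counter, defaultdict


def positions_by_chrom(vcf_path: str) -> dict[str, list[int]]:
    """Collect POS values for each CHROM from a VCF, skipping header lines."""
    positions: dict[str, list[int]] = defaultdict(list)
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]  # first two columns: CHROM, POS
            positions[chrom].append(int(pos))
    return dict(positions)


def bin_counts(positions: list[int], bin_width: int) -> dict[int, int]:
    """Map each bin's 1-based start coordinate to the number of variants in it."""
    counts = Counter(((p - 1) // bin_width) * bin_width + 1 for p in positions)
    return dict(sorted(counts.items()))
```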

### Reading in variant annotations
All EMUS needs for downstream analyses is a .tsv file with a list of variants in the first column. Because different variant annotation tools produce different output formats, extracting this information can be challenging. The get-annotations command helps by converting standard SnpEff, ANNOVAR, and VEP outputs, and support for additional tools can be added later. If applicable, perform this step on both the observed and the simulated data.

```
get-annotations \
-i/--input   <input_file> \
[--snpeff] \
[--annovar] \
[--vep] \
-o/--output  <out_file.tsv>
```
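For SnpEff output, the mutational class sits in the standard `ANN=` INFO field (`Allele|Annotation|Annotation_Impact|Gene_Name|...`, with multiple entries separated by commas). The Python sketch below keeps the `Annotation` of the first entry for each variant and writes one class per line. This is not how `get-annotations` is implemented. Keeping only the first entry, skipping variants without `ANN`, and using a header-less one-column TSV are all assumptions. Combined effects such as `missense_variant&splice_region_variant` are kept verbatim.

```python
"""Extract one SnpEff effect class per variant (illustrative sketch)."""
from __future__ import annotations


def snpeff_effects(vcf_path: str) -> list[str]:
    """Return the Annotation of the first ANN entry for each annotated variant."""
    effects = []
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
            for field in info.split(";"):
                if field.startswith("ANN="):
                    first_entry = field[4:].split(",")[0]
                    effects.append(first_entry.split("|")[1])  # 2nd subfield: Annotation
                    break
    return effects


def write_tsv(effects: list[str], path: str) -> None:
    """Write one effect class per line (first and only column)."""
    with open(path, "w") as out:
        for effect in effects:
            out.write(effect + "\n")
```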

### Comparing observed and simulated variants
Using the output from the get-annotations command in the previous step, we can compare our observed variants to our simulated ones. 
```
compare-variants \
-i/--input        <observed_variants.tsv> \
-c/--comparison   <simulated_variants.tsv> \
-b/--bootstraps   <#bootstraps> \
-o/--output       <output_file_handle>
This command produces a plain .tsv file containing the probability values for each mutational class, as well as a .bootstraps.tsv file containing the bootstrap data needed for visualization.
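The comparison can be sketched in Python as follows. It assumes one class label per variant, as in the get-annotations TSV. In each replicate it draws as many simulated variants as were observed, without replacement; EMUS may instead resample with replacement, e.g. via `rng.choices`. It counts how often the simulated class frequency is at least (or at most) the observed one. `bootstrap_compare` is a hypothetical helper, and EMUS's exact statistic and output layout may differ.

```python
"""Bootstrap comparison of observed vs. simulated mutational classes (sketch)."""
from __future__ import annotations

import random
from collections import Counter


def class_frequencies(labels: list[str]) -> dict[str, float]:
    """Relative frequency of each class label."""
    total = len(labels)
    return {c: n / total for c, n in Counter(labels).items()}


def bootstrap_compare(observed: list[str], simulated: list[str],
                      n_boot: int = 1000, seed: int | None = None):
    """Return per-class p-values and the per-replicate simulated frequencies."""
    rng = random.Random(seed)
    obs_freq = class_frequencies(observed)
    classes = sorted(set(observed) | set(simulated))
    boot: dict[str, list[float]] = {c: [] for c in classes}
    for _ in range(n_boot):
        # Same number of simulated variants as observed ones, without replacement.
        draw = rng.sample(simulated, len(observed))
        freq = class_frequencies(draw)
        for c in classes:
            boot[c].append(freq.get(c, 0.0))
    results = {}
    for c in classes:
        f_obs = obs_freq.get(c, 0.0)
        results[c] = {
            "observed": f_obs,
            # Small p_higher: random placement rarely reaches the observed frequency.
            "p_higher": sum(f >= f_obs for f in boot[c]) / n_boot,
            "p_lower": sum(f <= f_obs for f in boot[c]) / n_boot,
        }
    return results, boot
```

`results` corresponds roughly to the per-class probability table, and `boot` holds the per-replicate frequencies that a plotting step would use.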

### Visualizing comparisons
With the .bootstraps.tsv file generated in the previous step, EMUS can generate publication-quality figures. Available visualizations include histograms, kernel density estimate (KDE) plots, and empirical cumulative distribution function (ECDF) plots, along with options for customizing figure colors and themes.
```
mkdir out_dir
plot-variant-comparisons \
-i/--input          <data.bootstraps.tsv> \
-o/--outdir         <out_dir>

Optional flags:
--hist
--kde
--ecdf
--color_tail        <color>
--comp_line_color   <color>
--plot_color        <color>
--blank_bars
--background_theme  <white or dark>
```
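An ECDF plot shows, for each frequency x, the fraction of bootstrap replicates whose simulated frequency is at most x. The Python sketch below computes that step function. Evaluating it at the observed frequency gives the "lower" probability described earlier. `ecdf` and `ecdf_at` are hypothetical helpers, not EMUS functions. Since Seaborn is a listed dependency, the plots themselves likely come from functions such as `seaborn.ecdfplot`, though that is an assumption.

```python
"""Empirical CDF of bootstrap frequencies (illustrative sketch)."""
from __future__ import annotations

from bisect import bisect_right


def ecdf(values: list[float]) -> tuple[list[float], list[float]]:
    """Sorted values and the cumulative fraction of values up to each one."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def ecdf_at(values: list[float], x: float) -> float:
    """Fraction of values that are <= x."""
    xs = sorted(values)
    return bisect_right(xs, x) / len(xs)
```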
