https://github.com/bxlab/metaWRAP
Tip revision: 9b82461a1da8175e8d5cfdbfc5231ff0a9bcb876 authored by German Uritskiy on 30 December 2017, 05:36:02 UTC
added conda isntall
# MetaWRAP - Wrapper for Metagenomic Bin Analysis
## MetaWRAP v=0.6

 MetaWRAP aims to be an easy-to-use, inclusive wrapper program that accomplishes the most basic tasks in metagenomic analysis: QC, assembly, binning, visualization, and taxonomic profiling. While there is no single best approach for processing metagenomic data, metaWRAP is meant to be a fast and simple first-pass program before you delve deeper into parameterizing your approach. Each individual component of the pipeline is also a standalone module, so you can use only the modules you are interested in.
 
![General walkthrough of metaWRAP modules](https://i.imgur.com/LcC09ym.png)
   
 In addition to being a tool wrapper, MetaWRAP offers an innovative hybrid pipeline for extracting high-quality draft genomes (bins) from metagenomic data. By combining several binning programs (metaBAT2, CONCOCT, MaxBin2), drawing on their individual strengths and minimizing their weaknesses, the [bin refinement module](https://i.imgur.com/JL665Qo.png) consistently produces stronger results than any of the individual approaches. Because of this diverse binning approach, the pipeline also shows promise for producing robust binning results across a variety of microbial communities.

 MetaWRAP also includes a [bin reassembly module](https://i.imgur.com/GUSMXl8.png), which can drastically improve the quality of a set of bins by extracting the reads belonging to each draft genome and reassembling them with a more permissive, non-metagenomic assembler. In addition to improving the N50 of the bins, this modestly increases their completion and drastically reduces contamination.
  
 If you already have your metagenomic data assembled and binned with two or more programs (or the same program with different parameters), try the `bin_refinement` and `reassemble_bins` modules to see how much further you can improve your bin predictions!
  

## OVERVIEW OF METAWRAP MODULES:
  
#### Metagenomic data pre-processing modules:
		1) Read QC (trimming and human read removal)
		2) Assembly (with metaSPAdes or MegaHit, plus assembly QC)
		3) Kraken (taxonomy profiling and visualization)
		4) Binning (MaxBin2, metaBAT2, CONCOCT)
	
#### Bin processing modules:
		1) Bin refinement and consolidation of multiple bin sets
		2) Bin reassembly (reassemble bins to improve completion and reduce contamination)
		3) Bin quantitation (bin abundance estimation across samples)
		4) Blobology (visualize bin success with blobplots)
		5) Classify bins (assign taxonomy to draft genomes)

## SYSTEM REQUIREMENTS
 The resource requirements of this pipeline vary greatly with the amount of data being processed, but because many of the programs it uses (KRAKEN and metaSPAdes, to name a few) need a lot of memory, I would advise against running it on anything with fewer than 10 cores and 100 GB of RAM. MetaWRAP officially supports only Linux x64 systems.
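
To check a machine against these suggested minimums before starting a long run, a small pre-flight script along these lines could help. It is a sketch, not part of metaWRAP: the function name `meets_minimum` is hypothetical, it relies on `nproc` and `/proc/meminfo` (so it is Linux-only), and it reports RAM in whole GiB rounded down, so a nominal 100 GB machine may read as slightly less.

```bash
#!/usr/bin/env bash
# Sketch (not part of metaWRAP): compare this machine with the suggested
# minimum of 10 cores and 100 GB RAM. Linux-only, since it reads /proc/meminfo.

MIN_CORES=10
MIN_RAM_GB=100

# Hypothetical helper: succeeds (returns 0) if the given core count and
# RAM in GB meet the suggested minimum.
meets_minimum() {
    local cores=$1 ram_gb=$2
    (( cores >= MIN_CORES && ram_gb >= MIN_RAM_GB ))
}

cores=$(nproc)
# /proc/meminfo reports MemTotal in kB; integer division rounds down to whole GB.
ram_gb=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 1024 / 1024 ))

if meets_minimum "$cores" "$ram_gb"; then
    echo "OK: ${cores} cores, ${ram_gb} GB RAM"
else
    echo "Warning: ${cores} cores and ${ram_gb} GB RAM is below the suggested 10 cores / 100 GB" >&2
fi
```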


## INSTALLATION
 To start, download [miniconda2](https://conda.io/miniconda.html) (the Python 2.7 version) and install it. This makes installing all of metaWRAP's dependencies much easier. Once you have conda installed, you can install metaWRAP and all its dependencies with the following command:
 ``` bash
 conda install -c ursky metawrap-binning
 ```
 If everything went well, the following command should print a help message:
 ``` bash
 metaWRAP read_qc -h
 ```
 
 Note: The above conda installation will install over 140 software dependencies. If you already actively use conda, it may be wise to [set up a custom environment in conda](https://conda.io/docs/user-guide/tasks/manage-environments.html) for metaWRAP and install it only there, so that your current environment and metaWRAP's don't conflict with each other.
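
A dedicated environment could be set up roughly as follows. This is a sketch under stated assumptions: the environment name `metawrap-env` and the helper `setup_metawrap_env` are hypothetical, and only standard `conda create` and `conda install -n` options are used, with the channel and package name given above.

```bash
#!/usr/bin/env bash
# Sketch: install metaWRAP into its own conda environment so its many
# dependencies cannot clash with your existing environments.
# "metawrap-env" is a hypothetical name; any environment name works.

ENV_NAME="${ENV_NAME:-metawrap-env}"

setup_metawrap_env() {
    # Python 2.7, matching the miniconda2 requirement above.
    conda create -y -n "$ENV_NAME" python=2.7 || return 1
    # Install metaWRAP and its dependencies into that environment only.
    conda install -y -n "$ENV_NAME" -c ursky metawrap-binning
}

if command -v conda >/dev/null 2>&1; then
    setup_metawrap_env
else
    echo "conda not found on PATH; install miniconda2 first" >&2
fi
```

Afterwards, activate the environment with `source activate metawrap-env` (or `conda activate metawrap-env` on newer conda releases) and confirm the installation with `metaWRAP read_qc -h`.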

## DATABASES

 Use your favorite text editor to configure the database paths in miniconda2/bin/config-metawrap, and make sure all the paths are correct. This is essential if you want to run any module that relies on a database (see the table below). If you are unsure where this config file is, run:
 ``` bash
 which config-metawrap
 ```
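
Because a wrong path usually only surfaces once a module fails, it can help to check the configured locations up front. The helper below is a hypothetical sketch: `check_db_paths` is not part of metaWRAP, and it does not read config-metawrap itself (the file's exact format is not described here), so you pass it the paths you entered.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of metaWRAP): report which of the
# given database paths do not exist.

# Prints each missing path to stderr and returns the number of missing
# paths, capped at 255 (0 means every path was found).
check_db_paths() {
    local missing=0 path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit statuses are limited to 0-255, so cap the count.
    return $(( missing > 255 ? 255 : missing ))
}
```

For example, `check_db_paths ~/databases/checkm ~/databases/kraken && echo "all database paths found"`, where both paths are placeholders for the ones in your config file.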

You will need to [download and configure several databases](https://github.com/ursky/metaWRAP/blob/master/installation/database_installation.md) and set their paths in the config-metawrap file. Depending on which modules you plan to use, you may not need all of the databases.

| Database | Size | Used in module(s) |
|:---------------:|:-----:|:-----:|
| Checkm_DB | 1.4 GB | binning, bin_refinement, reassemble_bins |
| KRAKEN standard database | 161 GB | kraken |
| NCBI_nt | 71 GB | blobology |
| NCBI_tax | 283 MB | blobology |
| Indexed hg38 | 20 GB | read_qc |
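
Since you may not need every database, the table can be turned into a quick disk-space estimate for the modules you plan to run. This is a sketch: the short database labels (`Checkm_DB`, `KRAKEN_standard`, `hg38_index`, and so on) and the function `dbs_for_modules` are made up for this example and are not metaWRAP configuration names; the sizes are copied from the table, with 283 MB written as 0.283 GB.

```bash
#!/usr/bin/env bash
# Sketch: list the databases needed by the modules you plan to run, with
# their combined size in GB, based on the database table.
# Labels are illustrative, not metaWRAP config names. Requires bash 4+.

declare -A DB_SIZE_GB=(
    [Checkm_DB]=1.4
    [KRAKEN_standard]=161
    [NCBI_nt]=71
    [NCBI_tax]=0.283
    [hg38_index]=20
)

declare -A MODULE_DBS=(
    [binning]="Checkm_DB"
    [bin_refinement]="Checkm_DB"
    [reassemble_bins]="Checkm_DB"
    [kraken]="KRAKEN_standard"
    [blobology]="NCBI_nt NCBI_tax"
    [read_qc]="hg38_index"
)

# Prints "<database> <size_GB>" for each required database (deduplicated,
# sorted by name), then a "total_GB <sum>" line. Module names that are not
# in the table contribute nothing.
dbs_for_modules() {
    local -A needed=()
    local module db
    for module in "$@"; do
        for db in ${MODULE_DBS[$module]:-}; do
            needed[$db]=1
        done
    done
    for db in "${!needed[@]}"; do
        echo "$db ${DB_SIZE_GB[$db]}"
    done | LC_ALL=C sort | awk '{ print; total += $2 } END { printf "total_GB %g\n", total }'
}
```

For example, `dbs_for_modules kraken read_qc binning` lists three databases and a total of 182.4 GB, while `dbs_for_modules blobology` needs 71.283 GB.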


## DETAILED PIPELINE WALKTHROUGH

  ![Detailed pipeline walkthrough](https://i.imgur.com/5bb6vlY.jpg)


## USAGE

Once all the dependencies are in place, running metaWRAP is relatively simple. The main metaWRAP script wraps around all of its individual modules, which you can call independently.
```
metaWRAP -h
	Usage: metaWRAP [module] --help
	Options:

	read_qc		Raw read QC module
	assembly	Assembly module
	binning		Binning module
	bin_refinement	Refinement of bins from binning module
	reassemble_bins Reassemble bins using metagenomic reads
	quant_bins	Quantify the abundance of each bin across samples
	blobology	Blobology module
	kraken		KRAKEN module
```

Each module is run separately and has its own help message. For example, to see the options of the assembly module:
```
metaWRAP assembly -h

Usage: metaWRAP assembly [options] -1 reads_1.fastq -2 reads_2.fastq -o output_dir
Options:

	-1 STR          forward fastq reads
	-2 STR          reverse fastq reads
	-o STR          output directory
	-m INT          memory in GB (default=10)
	-t INT          number of threads (default=1)

	--use-megahit		assemble with megahit (default)
	--use-metaspades	assemble with metaspades instead of megahit
```
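
Putting the documented options together, a full assembly run might be launched as below. Only the flags from the help message above are used; the read file names, output directory, thread count, and memory limit are placeholders. Memory is raised well above the default of 10 GB because metaSPAdes is one of the memory-hungry programs mentioned under System Requirements. If `metaWRAP` is not on your `PATH`, this sketch only prints the command it would run.

```bash
#!/usr/bin/env bash
# Sketch: assemble one paired-end sample with metaSPAdes, using only the
# options shown in the assembly help message. File names, thread count
# and memory limit are hypothetical placeholders.

FWD_READS=sample_1.fastq
REV_READS=sample_2.fastq
OUT_DIR=ASSEMBLY
THREADS=24
MEM_GB=100

# Build the command as an array so file names with spaces stay intact.
assembly_cmd=(metaWRAP assembly
    -1 "$FWD_READS" -2 "$REV_READS"
    -o "$OUT_DIR"
    -t "$THREADS" -m "$MEM_GB"
    --use-metaspades)

if command -v metaWRAP >/dev/null 2>&1; then
    "${assembly_cmd[@]}"
else
    # metaWRAP is not on PATH: print the command instead of running it.
    echo "${assembly_cmd[*]}"
fi
```

Dropping `--use-metaspades` falls back to MegaHit, the default assembler according to the help message.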

### Acknowledgements
Author of pipeline: German Uritskiy.

Principal Investigators: [James Taylor](http://bio.jhu.edu/directory/james-taylor/) and [Jocelyne DiRuggiero](http://bio.jhu.edu/directory/jocelyne-diruggiero/)

Institution: Johns Hopkins, [Department of Cell, Molecular, Developmental Biology, and Biophysics](http://cmdb.jhu.edu/) 

I do not claim authorship of any of the many programs this pipeline uses. For questions, bugs, and suggestions, contact me at guritsk1@jhu.edu, or leave a comment on this GitHub page.
