https://bitbucket.org/yonatanf/sparcc
Raw File
Tip revision: fce551e261dc70c673462d260b8d0616760fcf42 authored by Yonatan Friedman on 02 December 2018, 19:50:19 UTC
DOC: add license info to readme
Tip revision: fce551e
readme.rst
``SparCC`` is a python module for computing correlations in compositional data (16S, metagenomics, etc').

Detailed information about the algorithm can be found in the accompanying `publication <http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002687>`__.  

Questions, comments, complaints and praise should be send to yonatanf@mit.edu



********************************
Usage Notes:
********************************
- Scripts in the root SparCC directory can be called from the terminal command-line either by explicitly calling python (as is done in the usage examples below), or simply as an executable. The latter will require having execution permission for these file (e.g. chmod +x SparCC.py).

- Help for any one for the scripts in the root SparCC directory is available by typing 'python [script_name] - h' in the command line. e.g.: :: 

   python SparCC.py -h .

- SparCC is implemented in pure python and requires a working version of python (=>2.3, tested with 2.6.6) and numpy (tested with version 1.10.1).

       
********************************
Usage example:
********************************
- The following lists the commands required for analyzing the included 'fake' dataset using the SparCC package, and generating all the files present in the subfolders of the example folder.

- The fake dataset contains simulated abundances of 50 otus in 200 samples, drawn at random from a multinomial log-normal distribution. The true basis correlations used to generate the data are listed in 'true_basis_cor.txt' in the example folder.

- Note that otu 0 is very dominant, and thus, using Pearson or Spearman correlations, appears to be negatively correlated with most other OTUs, though it is in fact not negatively correlated with any OTU.

---------------------------------
Correlation Calculation:
---------------------------------
First, we'll quantify the correlation between all OTUs, using SparCC, Pearson, and Spearman correlations:

::

   python SparCC.py example/fake_data.txt -i 5 --cor_file=example/basis_corr/cor_sparcc.out
   python SparCC.py example/fake_data.txt -i 5 --cor_file=example/basis_corr/cor_pearson.out -a pearson
   python SparCC.py example/fake_data.txt -i 5 --cor_file=example/basis_corr/cor_spearman.out -a spearman


---------------------------------
Pseudo p-value Calculation:
---------------------------------
Calculating pseudo p-values is done via a bootstrap procedure.
First make shuffled (w. replacement) datasets:
::

   python MakeBootstraps.py example/fake_data.txt -n 5 -t permutation_#.txt -p example/pvals/

This will generate 5 shuffled datasets, which is clearly not enough to get meaningful p-values, and is used here for convenience.
A more appropriate number of shuffles should be at least a 100, which is the default value. 

Next, you'll have to run SparCC on each of the shuffled data sets. 
Make sure to use the exact same parameters which you used when running SparCC on the real data, name all the output files consistently, numbered sequentially, and with a '.txt' extension.
::

   python SparCC.py example/pvals/permutation_0.txt -i 5 --cor_file=example/pvals/perm_cor_0.txt
   python SparCC.py example/pvals/permutation_1.txt -i 5 --cor_file=example/pvals/perm_cor_1.txt
   python SparCC.py example/pvals/permutation_2.txt -i 5 --cor_file=example/pvals/perm_cor_2.txt
   python SparCC.py example/pvals/permutation_3.txt -i 5 --cor_file=example/pvals/perm_cor_3.txt
   python SparCC.py example/pvals/permutation_4.txt -i 5 --cor_file=example/pvals/perm_cor_4.txt

Above I'm simply called SparCC 5 separate times. However, it is much more efficient and convenient to write a small script that automates this, and submits these runs as separate jobs to a cluster (if one is available to you. Otherwise, this may take a while to run on a local machine...).

Now that we have all the correlations computed from the shuffled datasets, we're ready to get the pseudo p-values.
Remember to make sure all the correlation files are in the same folder, are numbered sequentially, and have a '.txt' extension.
The following will compute both one and two sided p-values.
::

   python PseudoPvals.py example/basis_corr/cor_sparcc.out example/pvals/perm_cor_#.txt 5 -o example/pvals/pvals.one_sided.txt -t one_sided
   python PseudoPvals.py example/basis_corr/cor_sparcc.out example/pvals/perm_cor_#.txt 5 -o example/pvals/pvals.one_sided.txt -t two_sided



LICENSE
===================

The MIT License (MIT)

Copyright (c) 2018-2020 Jonathan Friedman and Eric Alm

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


back to top