https://github.com/RabadanLab/TOBI
Tip revision: 47e9f4e80ebd11f03f5f202f55e9e5c13e510e9f authored by gitliver on 29 May 2019, 16:26:36 UTC
add MIT License
#########################################################################################
TOBI: Tumor Only Boosting Identification of Driver Mutations
Tumor-Only Boosting Identification (TOBI) is a framework for unified germline and somatic
analysis using largely tumor-only samples. TOBI uses gradient boosting to learn the
features of confirmed somatic variants from a small training set of tumor-normal samples,
then generates a classification model that identifies variants with somatic characteristics
in tumor-only samples.
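To make the boosting idea concrete, below is a minimal toy sketch of gradient boosting for a
binary som/non_som label, using one-split decision stumps and logistic loss. It is written for
Python 3 and is NOT TOBI's implementation (the machine learning step is run through TOBIml.py
and R); the helper names, the two features and the data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Least-squares one-split regression stump: (feature, threshold, left, right)."""
    mean_all = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean_all, mean_all)  # fallback: no split
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lval, rval = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lval) ** 2 for r in left)
                   + sum((r - rval) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, j, thr, lval, rval)
    return best[1:]

def stump_value(stump, row):
    j, thr, lval, rval = stump
    return lval if row[j] <= thr else rval

def fit_gbm(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting on logistic loss; y holds 1 (som) or 0 (non_som)."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p / (1 - p))  # initial log-odds
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to each score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps

def predict_som_probability(model, row):
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * stump_value(st, row) for st in stumps)
    return sigmoid(score)

if __name__ == "__main__":
    # Made-up features per variant: [variant allele fraction, read depth].
    X = [[0.10, 30], [0.15, 25], [0.20, 40], [0.48, 35], [0.50, 50], [0.52, 45]]
    y = [1, 1, 1, 0, 0, 0]  # 1 = som, 0 = non_som
    model = fit_gbm(X, y)
    print(predict_som_probability(model, [0.12, 33]))
```

Each round fits a stump to the residuals y - p (the negative gradient of the log loss) and adds
a shrunken copy of it to the running score, which is the core of gradient boosting.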
First, whole-exome sequencing (WES) files from tumor samples undergo variant calling,
annotation, and quality filtering. TOBI then merges variants across multiple samples. In
pre-processing, TOBI labels variants from the training set as somatic ("som") or
non-somatic ("non_som"). Finally, in the machine learning step, TOBI generates and applies
a somatic classifier.
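The labeling done in pre-processing can be pictured as a set lookup: a training-set variant is
"som" if it appears among the confirmed somatic variants and "non_som" otherwise. This is only a
sketch (TOBI's pre-processing is done in R); keying variants by (sample, chrom, pos, ref, alt)
and the helper name label_variants are assumptions, not the format of TOBI's --somatic file.

```python
def label_variants(variants, confirmed_somatic):
    """Label each variant 'som' or 'non_som'.

    variants: iterable of (sample, chrom, pos, ref, alt) tuples (assumed key).
    confirmed_somatic: keys confirmed somatic by tumor-normal comparison.
    """
    somatic = set(confirmed_somatic)  # O(1) membership tests
    return [(v, "som" if v in somatic else "non_som") for v in variants]

if __name__ == "__main__":
    # Hypothetical variants from one training patient.
    variants = [("P1", "chr17", 1000, "C", "T"), ("P1", "chr1", 100, "A", "G")]
    confirmed = [("P1", "chr17", 1000, "C", "T")]
    for variant, label in label_variants(variants, confirmed):
        print(variant, label)
```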
Ver. 1.2: April 12, 2016
Modified by cjmadubata & tchu from Alireza Roshan Ghias's code
(Ver. 1.1: Nov 07, 2014 https://github.com/alireza202/TOBI.git TOBI)
dependencies:
- Python 2.7.11
- Perl v5.10.1
- R v3.1.2
- Java 1.7.0_25
- samtools 0.1.19
- bcftools 0.1.19
- VCFtools v0.1.10.1
- snpEff v3.6 & dbNSFP (https://sites.google.com/site/jpopgen/dbNSFP)
- snpSift v3.6
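A quick way to confirm the command-line dependencies are installed is to look them up on PATH.
This hypothetical helper (Python 3) only checks presence, not the versions listed above, and the
executable names are assumptions; snpEff and snpSift are usually run as Java jars, so they are
not looked up here.

```python
import shutil

# Executables from the dependency list above (names assumed).
REQUIRED_TOOLS = ("python", "perl", "R", "java", "samtools", "bcftools", "vcftools")

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All listed tools found on PATH (versions not checked).")
```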
#########################################################################################
### varCall_filtering ###
inputs at each step:
V (variant calling): indexed .bam files in a folder. Files must have the .bam extension,
and filenames cannot start with a number.
A (annotation): .vcf files in a folder. Files must have the .vcf extension, and filenames
cannot start with a number. If starting from this step, format the .vcf files to
match bcftools output.
F (filter): .vcf files in a folder. Files must have the .vcf extension, and filenames
cannot start with a number.
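The input rules above can be checked before launching a step. This is a hypothetical pre-flight
helper (Python 3), not part of TOBIvaf.py; it assumes a BAM index is named <file>.bam.bai or
<file>.bai, which is the usual samtools convention.

```python
import os
import tempfile

# Expected input extension per step, from the rules above.
STEP_EXTENSIONS = {"V": ".bam", "A": ".vcf", "F": ".vcf"}

def check_inputs(inputdir, step):
    """Return a list of problems found in inputdir for the given step."""
    ext = STEP_EXTENSIONS[step]
    problems = []
    inputs = [f for f in sorted(os.listdir(inputdir)) if f.endswith(ext)]
    if not inputs:
        problems.append("no %s files in %s" % (ext, inputdir))
    for name in inputs:
        if name[0].isdigit():
            problems.append("%s: filename starts with a number" % name)
        if step == "V":
            # Assumption: index named <file>.bam.bai or <file>.bai
            stem = name[: -len(ext)]
            if not (os.path.exists(os.path.join(inputdir, name + ".bai"))
                    or os.path.exists(os.path.join(inputdir, stem + ".bai"))):
                problems.append("%s: no .bai index found" % name)
    return problems

if __name__ == "__main__":
    # Demo on a throwaway folder with one valid and one invalid BAM.
    demo = tempfile.mkdtemp()
    for n in ("tumor1.bam", "tumor1.bam.bai", "2tumor.bam"):
        open(os.path.join(demo, n), "w").close()
    for problem in check_inputs(demo, "V"):
        print(problem)
```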
usage: TOBIvaf.py [-h] [--inputdir INPUTDIR] [--output OUTPUT]
[--config CONFIG] [--steps STEPS] [--cluster {hpc,amazon}]
[--debug] [--cleanup] [--ref REF] [--start START]
[--end END] [--snpeff SNPEFF] [--annovcf ANNOVCF]
[--dbnsfp DBNSFP] [--vcftype {default,TCGA}]
[--mergename MERGENAME]
TOBIv1.2: Tumor Only Boosting Identification of Driver Mutations. All arguments
can be specified in a config file (see the included varCall.config file for an
example).
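One common way to get "config file first, command line wins" behavior with argparse is to load
the config into parser defaults and then parse the command line. This Python 3 sketch is an
assumption, not TOBIvaf.py's code: it supposes an INI-style file with key = value pairs under
[DEFAULT], and the real varCall.config format may differ. Only a few of the flags are declared.

```python
import argparse
import configparser
import os
import tempfile

def parse_with_config(argv):
    # First pass: only look for --config.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config")
    known, _ = pre.parse_known_args(argv)

    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--inputdir")
    parser.add_argument("--output")
    parser.add_argument("--steps")
    parser.add_argument("--cluster", choices=["hpc", "amazon"])

    if known.config:
        cfg = configparser.ConfigParser()
        cfg.read(known.config)
        # Assumed layout: options as key = value pairs in [DEFAULT].
        parser.set_defaults(**dict(cfg.defaults()))
    # Values given on the command line replace the config-file defaults.
    return parser.parse_args(argv)

if __name__ == "__main__":
    # Hypothetical config file setting two options.
    with tempfile.NamedTemporaryFile("w", suffix=".config", delete=False) as fh:
        fh.write("[DEFAULT]\nsteps = AF\ncluster = hpc\n")
    args = parse_with_config(["--config", fh.name, "--steps", "VAFM"])
    print(args.steps, args.cluster)  # VAFM hpc
    os.remove(fh.name)
```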
Arguments:
General Arguments:
-h, --help show this help message and exit
--inputdir INPUTDIR [REQUIRED] directory for bam/vcf files.
--output OUTPUT [REQUIRED] output directory.
--config CONFIG config file specifying command line arguments.
Arguments specified on the command line override
config file arguments.
--steps STEPS [REQUIRED] Specify which steps of the pipeline to run. V:
variant calling, A: annotate, F: filter, M: merge; e.g.
--steps AF
--cluster {hpc,amazon}
[REQUIRED] Specify which cluster to run on. hpc: run
on an SGE HPC cluster; amazon: CURRENTLY UNIMPLEMENTED
--debug Debug/verbose flag. Default: False
--cleanup Delete temporary debug files. Default: True
VCF Step Arguments:
--ref REF [REQUIRED - VCF] Reference genome file.
--start START Start index used for testing. Will not work in config.
Default: 1
--end END End index used for testing. Will not work in config.
Default: 74
Annotation Step Arguments:
--snpeff SNPEFF [REQUIRED - ANNOTATE] Directory where snpEff is installed
--annovcf ANNOVCF [REQUIRED - ANNOTATE] A comma-separated list of .vcf
files to annotate with.
--dbnsfp DBNSFP [REQUIRED - ANNOTATE] Path to dbNSFP file
Filter Step Arguments:
--vcftype {default,TCGA}
Specifies the vcf type (used specifically for TCGA filtering)
Merge Step Arguments:
--mergename MERGENAME
[REQUIRED - MERGE] Name for final merged file
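The --steps string and the step-specific [REQUIRED - ...] arguments above can be validated
together. This Python 3 sketch is hypothetical (TOBIvaf.py may do it differently); in particular,
running the steps in V, A, F, M order regardless of how they are typed is an assumption.

```python
VALID_STEPS = "VAFM"  # V: variant calling, A: annotate, F: filter, M: merge

# Arguments marked [REQUIRED - <step>] in the help text above.
STEP_REQUIREMENTS = {"V": ["ref"], "A": ["snpeff", "annovcf", "dbnsfp"], "M": ["mergename"]}

def parse_steps(steps):
    """Validate a --steps string such as 'AF' and return it in pipeline order."""
    unknown = sorted(set(steps) - set(VALID_STEPS))
    if unknown:
        raise ValueError("unknown step(s): " + ", ".join(unknown))
    # Assumption: steps always run in V -> A -> F -> M order.
    return [s for s in VALID_STEPS if s in steps]

def parse_annovcf(value):
    """Split the comma-separated --annovcf value into individual .vcf paths."""
    return [path.strip() for path in value.split(",") if path.strip()]

def missing_step_args(steps, args):
    """Return step-specific arguments that are unset for the requested steps."""
    return [name for step in parse_steps(steps)
            for name in STEP_REQUIREMENTS.get(step, [])
            if not args.get(name)]

if __name__ == "__main__":
    print(parse_steps("FA"))
    print(missing_step_args("AF", {"snpeff": "/opt/snpEff", "annovcf": "a.vcf,b.vcf"}))
```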
#########################################################################################
### machine_learning ###
Step 8. Pre-processing using R. This step needs to be customized for each run.
usage: TOBIml.py [-h] [--input INPUT] [--output OUTPUT] [--somatic SOMATIC]
[--log LOG] [--check_missed CHECK_MISSED] [--suffix SUFFIX]
[--vcftype {default,TCGA}] [--train_size TRAIN_SIZE]
[--verbose]
{preprocess,machinelearning}
TOBIv1.2: Tumor Only Boosting Identification of Driver Mutations. Machine
learning step.
positional arguments:
{preprocess,machinelearning}
preprocess: preprocessing step;
machinelearning: machine learning step
optional arguments:
-h, --help show this help message and exit
--input INPUT [REQUIRED] input file
--output OUTPUT [REQUIRED] output file for preprocess (PP), output folder for machinelearning (ML)
--somatic SOMATIC [REQUIRED] formatted file containing somatic variants
--log LOG Optional path to a log file to which stdout and
stderr are piped
--check_missed CHECK_MISSED
[PP ARG] check which mutations in important genes
are missed by filtering
--suffix SUFFIX [ML ARG] a label specific to this particular run (e.g.
<date>_<disease>)
--vcftype {default,TCGA}
Specifies the vcf type (used specifically for TCGA filtering)
--train_size TRAIN_SIZE
[ML ARG] number of patients to include in the training
set.
--verbose verbose flag
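--train_size counts patients, not variants. A sketch of what a patient-level split could look
like is below (Python 3). The random seeded selection, the "patient" column name and the helper
name are assumptions, not TOBIml.py's actual behavior.

```python
import random

def split_by_patient(rows, train_size, seed=0):
    """Split variant rows into training and test sets by patient.

    rows: list of dicts with a 'patient' key (hypothetical column name).
    train_size: number of patients assigned to the training set.
    """
    patients = sorted({row["patient"] for row in rows})
    if train_size > len(patients):
        raise ValueError("train_size exceeds the number of patients")
    train_patients = set(random.Random(seed).sample(patients, train_size))
    train = [r for r in rows if r["patient"] in train_patients]
    test = [r for r in rows if r["patient"] not in train_patients]
    return train, test

if __name__ == "__main__":
    rows = [{"patient": p, "label": l} for p, l in
            [("P1", "som"), ("P1", "non_som"), ("P2", "som"), ("P3", "non_som")]]
    train, test = split_by_patient(rows, train_size=2)
    print(len(train), len(test))
```

Splitting by patient keeps every variant from one patient on the same side, so the classifier is
not evaluated on patients it has already seen.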