Tip revision: 733c727bd4b9f7c1a7f4508b9a467b2f31cacf33 authored by changmt on 01 May 2017, 15:06:09 UTC
Updated Google link to MAF

# Identifying recurrent mutations in cancer

### Software and dataset

#### Description: 
This is a method to identify population-scale recurrent mutations in cancer based on a binomial
statistical model that incorporates underlying mutational processes, including nucleotide context
mutability, gene-specific mutation rates, and major expected patterns of hotspot mutation emergence.

#### Dependencies:
Requires R version 3.0.2 or higher.
Install the dependent packages (**data.table**, **IRanges**, **BSgenome.Hsapiens.UCSC.hg19**).
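
One way to install these packages from the shell is sketched below. It assumes `Rscript` is on the `PATH` and a recent R (3.5 or later), where Bioconductor packages such as IRanges and BSgenome.Hsapiens.UCSC.hg19 are installed through BiocManager; older R releases used a different Bioconductor installer. The function name `install_hotspot_deps` and the CRAN mirror URL are illustrative choices, not part of this repository.

```shell
# Hypothetical helper (not part of this repository): installs the three
# R packages listed above. Assumes Rscript is on PATH and R >= 3.5.
install_hotspot_deps() {
    # data.table and BiocManager come from CRAN; IRanges and the hg19
    # BSgenome package come from Bioconductor via BiocManager.
    Rscript \
        -e 'install.packages(c("data.table", "BiocManager"), repos = "https://cloud.r-project.org")' \
        -e 'BiocManager::install(c("IRanges", "BSgenome.Hsapiens.UCSC.hg19"))'
}

# Run once, interactively:
#   install_hotspot_deps
```

The hg19 BSgenome package is a large download, so the installation may take some time.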


#### Usage:

    --input-maf=[REQUIRED: mutation file]
    --rdata=[REQUIRED: Rdata object with necessary files for algorithm]
    --output-file=[REQUIRED: output file to print statistically significant hotspots]
    --gene-query=[OPTIONAL (default=all genes in mutation file): List of Hugo Symbol in which to query for hotspots]
    --homopolymer=[OPTIONAL (default=TRUE): TRUE|FALSE filter hotspot mutations in homopolymer regions]
    --filter-centerbias=[OPTIONAL (default=FALSE): TRUE|FALSE to identify false positive filtering based on mutation calling center bias]
    --align100mer=[OPTIONAL: BED file of hg19 UCSC alignability track for 100-mer length sequences for false positive filtering]
    --align24mer=[OPTIONAL: BED file of hg19 UCSC alignability track for 24-mer length sequences for false positive filtering]

Command to run the hotspot algorithm on the genes listed in `genes_of_interest.txt` (`output.txt` is an example output file name):

    ./hotspot_algo.R \
        --input-maf=pancancer_unfiltered.maf \
        --rdata=hotspot_algo.Rdata \
        --gene-query=genes_of_interest.txt \
        --output-file=output.txt

**[ Required ]** `hotspot_algo.R` - R script to execute hotspot detection algorithm

**[ Required ]** `hotspot_algo.Rdata` - Rdata object with necessary files for algorithm (mutability, expression filters, etc)

**[ Required ]** `funcs.R` - R script of functions necessary for proper execution of hotspot_algo.R

`genes_of_interest.txt` - Sample list of genes for hotspot detection

`minimalist_test_maf.txt` - Minimalist MAF of the kind produced by maf2maf (mskcc/maf2maf)
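
A custom gene list for `--gene-query` could be prepared along the following lines. This sketch assumes the list is a plain text file with one Hugo Symbol per line, which should be checked against the bundled `genes_of_interest.txt`; the file name `my_genes.txt` and the three gene symbols are arbitrary examples.

```shell
# Write an example gene list, one Hugo Symbol per line (assumed format).
# my_genes.txt is a hypothetical file name chosen for this example.
printf '%s\n' KRAS BRAF PIK3CA > my_genes.txt

# Sort and de-duplicate in place so that each gene appears only once.
sort -u my_genes.txt -o my_genes.txt

# Quick check: print the number of genes in the list.
wc -l < my_genes.txt
```

The file would then be passed to the script as `--gene-query=my_genes.txt`.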

`--align100mer` and `--align24mer` are optional filters based on how uniquely k-mer sequences align to a region of the hg19 genome. Note that both filters were used as part of this analysis. For more information, see ENCODE Mapability.

Using these filters requires downloading the 100-mer and 24-mer alignability tracks from UCSC, which are not included here:
	ENCODE CRG Alignability 100-mer
	ENCODE CRG Alignability 24-mer

Convert the downloaded bigWig files to bedGraph format, following the UCSC BigWig instructions.
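
The conversion can be sketched as follows, assuming the UCSC command-line utility `bigWigToBedGraph` is installed and on the `PATH`. The helper names `bedgraph_name` and `convert_track` are hypothetical, and the input file names in the usage comment are placeholders for whatever the downloaded files are called.

```shell
# Derive the output name by replacing a trailing .bigWig with .bedGraph
# (if the suffix is absent, .bedGraph is simply appended).
bedgraph_name() {
    printf '%s\n' "${1%.bigWig}.bedGraph"
}

# Convert one bigWig track to bedGraph (hypothetical helper).
# Assumes the UCSC utility bigWigToBedGraph is on PATH.
convert_track() {
    if ! command -v bigWigToBedGraph >/dev/null 2>&1; then
        echo "bigWigToBedGraph not found on PATH" >&2
        return 1
    fi
    bigWigToBedGraph "$1" "$(bedgraph_name "$1")"
}

# Usage, with placeholder file names:
#   convert_track align100mer.bigWig
#   convert_track align24mer.bigWig
```

The resulting bedGraph files would then be supplied through `--align100mer` and `--align24mer`.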