https://github.com/STAR-Fusion/STAR-Fusion
Tip revision: e718e9e7378f39ae0cda79506e9e23f7a587a38d authored by Brian Haas on 13 April 2015, 11:48:47 UTC
update to md
# STAR-Fusion 

STAR-Fusion further processes the output of the STAR aligner, mapping junction reads and spanning reads to a reference annotation set. The annotation is supplied as a GTF file, ideally the same annotation file used to build the STAR genome index during the initial STAR setup.


STAR should be run using options that are well suited to fusion read detection.  An example of settings similar to those used in the landmark publication "The landscape of kinase fusions in cancer" (PMID: 25204415) by Stransky et al., Nat Commun 2014 are as follows:

```
   STAR --genomeDir Hg19.fa_star_index \
        --readFilesIn left.fq right.fq \
        --outSAMstrandField intronMotif \
        --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
        --outReadsUnmapped None \
        --chimSegmentMin 15 \
        --chimJunctionOverhangMin 15 \
        --alignMatesGapMax 200000 \
        --alignIntronMax 200000 \
        --outSAMtype BAM SortedByCoordinate 
```

Running STAR with these settings produces two primary output files that contain the junction and spanning read information (see the STAR documentation for precise details).

      Chimeric.out.junction : contains junction reads.
      Chimeric.out.sam      : contains alignments for fusion-spanning reads.
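
Before moving on, it can help to confirm that STAR actually wrote both files. The following is a minimal sketch and is not part of STAR or STAR-Fusion: the function name `check_chimeric_outputs` and its directory argument are hypothetical, and it assumes STAR was run without an output prefix, so the files carry the default names listed above.

```shell
# Hypothetical helper, not part of STAR or STAR-Fusion.
# Succeeds only if both chimeric output files exist and are non-empty
# in the given STAR output directory (default: current directory).
check_chimeric_outputs() {
    local star_dir="${1:-.}"
    local f status=0
    for f in Chimeric.out.junction Chimeric.out.sam; do
        if [ ! -s "$star_dir/$f" ]; then
            echo "missing or empty: $star_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_chimeric_outputs .` in the STAR output directory returns a non-zero status and names the offending file if either output is absent or empty.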


## Installation Requirements 


  STAR-Fusion requires the following Perl module from CPAN : Set/IntervalTree.pm
  found here:
  http://search.cpan.org/~benbooth/Set-IntervalTree-0.02/lib/Set/IntervalTree.pm

  A typical Perl module installation may involve:

      perl -MCPAN -e shell
        install Set::IntervalTree

  This CPAN module tends to install trouble-free on Linux. If you have trouble installing Set::IntervalTree on Mac OS X (as I did), try the following: download the tarball from the link above, run 'perl Makefile.PL', then edit the generated 'Makefile' and remove all occurrences of '-arch i386'. Then run 'make', 'make test', and finally 'make install'.


## Running STAR-Fusion 

Run STAR-Fusion on the two files described above. If you want to use a different annotation set than the one included and used by default (gencode.v19, in the resources/ folder), specify it with -G ref_annot.gtf.

      STAR-Fusion -S Chimeric.out.sam -J Chimeric.out.junction


The output from STAR-Fusion is found as a tab-delimited file named 'star-fusion.fusion_candidates.txt', and has the following format:

```
 #fusion_name    JunctionReads   SpanningFrags   LeftGene        LeftBreakpoint  LeftDistFromRefExonSplice       RightGene       RightBreakpoint RightDistFromRefExonSplice
 FIP1L1--PDGFRA  98      13      FIP1L1^ENSG00000145216.11       chr4:54292132:+ 0       PDGFRA^ENSG00000134853.7        chr4:55141092:+ 84
 BRD4--NUTM1     7       2       BRD4^ENSG00000141867.13 chr19:15364963:-        0       NUTM1^ENSG00000184507.11        chr15:34640170:+        0
 EWSR1--FLI1     5       2       EWSR1^ENSG00000182944.13        chr22:29683123:+        0       FLI1^ENSG00000151702.12 chr11:128677075:+       0
 GOPC--ROS1      82      36      GOPC^ENSG00000047932.9  chr6:117888017:-        0       ROS1^ENSG00000047936.6  chr6:117642557:-        0
 ETV6--NTRK3     8       3       ETV6^ENSG00000139083.6  chr12:12022903:+        0       NTRK3^ENSG00000140538.12        chr15:88483984:-        0
 FGFR3--TACC3    221     372     FGFR3^ENSG00000068078.13        chr4:1808661:+  0       TACC3^ENSG00000013810.14        chr4:1729704:+  269
 EWSR1--ATF1     8       3       EWSR1^ENSG00000182944.13        chr22:29683123:+        0       ATF1^ENSG00000123268.4  chr12:51208063:+        0
 HOOK3--RET      9       2       HOOK3^ENSG00000168172.4 chr8:42823357:+ 0       RET^ENSG00000165731.13  chr10:43612032:+        0
 CD74--ROS1      5       0       CD74^ENSG00000019582.10 chr5:149784243:-        0       ROS1^ENSG00000047936.6  chr6:117645578:-        0
 TMPRSS2--ETV1   10      3       TMPRSS2^ENSG00000184012.7       chr21:42866302:-        19      ETV1^ENSG00000006468.9  chr7:13975463:- 58
 AKAP9--BRAF     4       4       AKAP9^ENSG00000127914.12        chr7:91632549:+ 0       BRAF^ENSG00000157764.8  chr7:140487384:-        0
 ...
```
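
Because the output is plain tab-delimited text, it is easy to post-process with standard tools. As an illustration only, the sketch below ranks candidates by total read support. The function name `rank_fusions` is hypothetical, and the code assumes the header line starts with `#` and that JunctionReads and SpanningFrags are columns 2 and 3, as in the example above.

```shell
# Hypothetical helper, not part of STAR-Fusion.
# Prints "total<TAB>fusion_name<TAB>JunctionReads<TAB>SpanningFrags",
# sorted by total read support, highest first.
rank_fusions() {
    local candidates_file="$1"
    # Skip the '#fusion_name' header, sum columns 2 and 3 for each candidate,
    # then sort numerically (descending) on the computed total.
    awk -F'\t' '!/^#/ && NF >= 3 {
            printf "%d\t%s\t%d\t%d\n", $2 + $3, $1, $2, $3
        }' "$candidates_file" \
        | sort -t "$(printf '\t')" -k1,1nr
}
```

Running `rank_fusions star-fusion.fusion_candidates.txt | head` would then list the most strongly supported candidates first. Total support is only one possible ranking criterion.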

Note that these fusion candidates are derived solely by mapping the STAR outputs to the reference annotations. Paralogous genes are notorious for showing up as false-positive fusion candidates. Additional filtering tools, although not included now, will be made available soon. Even without additional filtering, STAR-Fusion provides fusion detection accuracy that is on par with the very best available fusion predictors, and it is one of the most efficient.


## Parameterization 

STAR-Fusion will report all candidates having at least 1 junction read where the breakpoints match up precisely with reference exon junctions of two different genes.

For breakpoints that do not precisely match reference exon junctions, the junction read support for the breakpoint must be at least --min_novel_junction_support (default 10 reads).

In the case where multiple candidate fusion breakpoints are reported, only those breakpoints having at least --min_alt_pct_junction (default 10%) of the dominant isoform junction support will be reported.
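
The reporting rules above can be sketched as a small filter. This is an illustration of the described logic, not STAR-Fusion's actual code. The function name `filter_breakpoints` and the four-column input layout (fusion name, junction reads, left distance, right distance from the reference exon splice site) are assumptions, as is treating a distance of 0 on both sides as "matches a reference exon junction" and computing the dominant support only from breakpoints that pass the first two rules.

```shell
# Hypothetical sketch of the reporting rules; not STAR-Fusion's implementation.
# stdin: tab-delimited fusion_name, junction_reads, left_dist, right_dist
# args:  min_novel_junction_support (default 10), min_alt_pct_junction (default 10)
filter_breakpoints() {
    local min_novel="${1:-10}" min_alt_pct="${2:-10}"
    awk -F'\t' -v min_novel="$min_novel" -v min_alt_pct="$min_alt_pct" '
        {
            # Breakpoints on reference exon junctions need 1 junction read;
            # all others need at least min_novel reads.
            on_ref = ($3 == 0 && $4 == 0)
            needed = on_ref ? 1 : min_novel
            if ($2 + 0 < needed) next
            n++
            name[n] = $1; reads[n] = $2 + 0; line[n] = $0
            # Track the dominant (best supported) breakpoint per fusion.
            if (reads[n] > best[$1]) best[$1] = reads[n]
        }
        END {
            # Keep breakpoints with at least min_alt_pct percent of the
            # dominant breakpoint support, in input order.
            for (i = 1; i <= n; i++)
                if (reads[i] * 100 >= min_alt_pct * best[name[i]]) print line[i]
        }'
}
```

With the defaults, a fusion whose dominant breakpoint has 20 junction reads would keep an alternate breakpoint with 3 reads (15%) but drop one with 1 read (5%).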

Finally, it is worth noting that the counts of spanning fragments are entirely non-overlapping with the counts of the breakpoint junction reads. That is, no spanning fragment (from Chimeric.out.sam) is counted if it contains a read that is reported as evidence in the breakpoint junction candidate data (from Chimeric.out.junction).
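
To make the non-overlap concrete, the sketch below counts fragments in the SAM file whose read name does not appear in the junction file. It is a simplified, whole-file illustration rather than the per-fusion bookkeeping STAR-Fusion performs. The function name `count_spanning_only` is hypothetical, and the code assumes the read name is column 10 of Chimeric.out.junction (see the STAR documentation) and column 1 (QNAME) of the SAM file.

```shell
# Hypothetical sketch, not STAR-Fusion's implementation.
# Counts distinct read names in the SAM file that are absent from the
# junction file, i.e. fragments that could only count as spanning support.
count_spanning_only() {
    local junction_file="$1" sam_file="$2"
    awk -F'\t' '
        FILENAME == ARGV[1] { junction[$10] = 1; next }  # remember junction read names
        /^@/ { next }                                    # skip SAM header lines
        !($1 in junction) { frags[$1] = 1 }              # mates share one QNAME
        END { n = 0; for (q in frags) n++; print n }
    ' "$junction_file" "$sam_file"
}
```

Calling `count_spanning_only Chimeric.out.junction Chimeric.out.sam` prints a single number: the fragments left over once every junction read has been set aside.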



## Example data and execution

In the included test/ directory, you'll find a 'runMe.sh' script along with a data/ subdirectory.  The data/ subdirectory contains example fusion and spanning data generated from running STAR, in addition to a reference annotation file for gencode v19. Note, the reference GTF file contains only the 'exon' records instead of all lines from the original gencode annotation file; this speeds up parsing of the file and keeps the file size relatively small for including in this package.

In this test/ directory, run the sample execution like so:

    ./runMe.sh

which simply runs:

    ../STAR-Fusion -S Chimeric.out.sam.gz -J Chimeric.out.junction.gz 

and you'll find the output file 'star-fusion.fusion_candidates.txt' containing the fusion candidates in the format described above.



## Acknowledgements

This effort was largely inspired by earlier work done by Nicolas Stransky and discussions with Daniel Nicorici.

STAR-Fusion is contributed by Brian Haas, Broad Institute, 2015
