# tripseq-analysis
Analysis tools for TrIP-seq data. The repository contains several individual tools, each performing a different task. Documentation below is limited; contact the authors with questions.


Calculate properties of an input transcriptome (or regions of it). Input is in BED format; output is a set of .csv files containing the properties selected on the command line.
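The simplest properties (`--length`, `--exonct`, `--gc`) can be read straight off a BED12 record and its sequence. The sketch below is illustrative only and is not the tool's code; it assumes standard tab-separated BED12 input and that the spliced sequence of the region has already been extracted from the genome by the caller. The function names are hypothetical.

```python
# Illustrative sketch, not the tool's implementation. Assumes tab-separated
# BED12 and a pre-extracted spliced sequence; function names are hypothetical.
from typing import Tuple


def bed12_length_and_exons(line: str) -> Tuple[str, int, int]:
    """Return (name, spliced length, exon count) for one BED12 line."""
    fields = line.rstrip("\n").split("\t")
    name = fields[3]
    block_count = int(fields[9])
    # blockSizes is comma-separated and usually ends with a trailing comma
    sizes = [int(s) for s in fields[10].split(",") if s]
    if len(sizes) != block_count:
        raise ValueError(f"{name}: blockSizes does not match blockCount")
    return name, sum(sizes), block_count


def gc_content(seq: str) -> float:
    """Fraction of G/C bases (case-insensitive); 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    record = "chr1\t1000\t2000\ttx1\t0\t+\t1000\t2000\t0\t2\t100,50,\t0,950,"
    print(bed12_length_and_exons(record))  # ('tx1', 150, 2)
    print(gc_content("ATGC"))  # 0.5
```

Length here is the spliced length (sum of exon blocks), not `chromEnd - chromStart`; for a multi-exon transcript the two differ.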

#### Usage

```
usage: [-h] -i INPUT -g GENOME [--gc] [--length]
                                   [--exonct] [--nt NT] [-o OUTPUT]
                                   [--window WINDOW]
                                   [--convtorefseq CONVTOREFSEQ]
                                   [--targetscanfile TARGETSCANFILE]
                                   [--deltag] [--lfold] [--cap-structure]
                                   [--kozak] [--uorf-count] [--uorf-overlap]
                                   [--start-codon] [--rare-codons]
                                   [--mirna-sites] [--au-elements]

optional arguments:
  -h, --help            show this help message and exit

Global arguments:
  -i INPUT, --input INPUT
                        The transcriptome region (BED format)
  -g GENOME, --genome GENOME
                        The genome for the input transcriptome
  --gc                  Calculate GC content
  --length              Calculate length
  --exonct              Count # of exons
  --nt NT               Number of threads (default is 8 or 4 for lfold)
  -o OUTPUT, --output OUTPUT
                        Output basename (e.g. CDS)
  --window WINDOW       Window size for sliding window calculations (default
  --convtorefseq CONVTOREFSEQ
                        Filename to convert input annotations to refseq (for
                        targetscan; e.g. knownToRefSeq.txt)
  --targetscanfile TARGETSCANFILE
                        Filename of targetscan scores (e.g.
  --deltag              Calculate min deltaG in sliding window of size
                        --window over region
  --lfold               Use RNALfold to calculate MFE rather than RNAfold
                        (faster but does not compute centroid,MEA)

5' UTR specific arguments:
  --cap-structure       Calculate structure at the 5' end

Start-codon-specific arguments:
  --kozak               Calculate Kozak context score
  --uorf-count          Calculate number of 5' UTR uORFs (starting with
  --uorf-overlap        Overlap of uORF with start codon (implies --uorf-
  --start-codon         Record the start codon used (ATG or other)

CDS-specific arguments:
  --rare-codons         Calculate codon usage properties

3' UTR specific arguments:
  --mirna-sites         Compile miRNA binding site info from targetscan
  --au-elements         Count number of AU-rich elements in the 3' UTR
```

#### Requirements

* ViennaRNA RNAfold and RNALfold
* HumanCodonTable (this page)
* AnnotationConverter (this page)
* TargetscanScores (this page)
* SNFUtils (this page)
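To make the start-codon and 3' UTR options more concrete, here is a sketch of one common set of definitions: a uORF is an ATG in the 5' UTR read in frame to the first stop codon, it overlaps the start codon if it runs into the annotated ATG, and AU-rich elements are counted as AUUUA pentamers. These definitions are assumptions; the help text for `--uorf-count` is truncated, and the real tool may also count non-ATG starts or use a different ARE motif. All names are hypothetical.

```python
# Sketch under assumed definitions (ATG-initiated uORFs, AUUUA pentamer AREs);
# the actual --uorf-count / --uorf-overlap / --au-elements logic may differ.
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, cds_start: int) -> List[Tuple[int, int]]:
    """Return (start, end) for each ATG-initiated uORF starting in the 5' UTR.

    mrna: transcript sequence (5' UTR followed by CDS), DNA alphabet.
    cds_start: 0-based index of the annotated start codon.
    end is the index just past the stop codon, or len(mrna) if none is found.
    """
    mrna = mrna.upper()
    uorfs = []
    for i in range(cds_start):
        if mrna[i:i + 3] != "ATG":
            continue
        end = len(mrna)
        # walk codons in this uORF's frame until the first stop codon
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        uorfs.append((i, end))
    return uorfs


def overlaps_start_codon(uorf: Tuple[int, int], cds_start: int) -> bool:
    """True if the uORF interval intersects the annotated start codon."""
    start, end = uorf
    return start < cds_start + 3 and end > cds_start


def count_au_elements(utr3: str, motif: str = "ATTTA") -> int:
    """Count overlapping occurrences of AUUUA (written in DNA as ATTTA)."""
    seq = utr3.upper().replace("U", "T")
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
```

For example, in `GGATGAAATAGCCATGCCCTAA` with the CDS starting at index 13, the single uORF is `ATGAAATAG` at (2, 11), which ends before the main ATG and so does not overlap it.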


Take two lists of TrIP-seq transcripts (i.e. clusters) and identify genes that have transcript isoforms in both sets. For each such set of gene-linked transcript isoforms, compare input transcriptome features as calculated using 

#### Usage

```
usage: [-h] --set1 FNAME ID ... [FNAME ID ... ...]
                                 --set2 FNAME ID ... [FNAME ID ... ...]
                                 --tx-to-gene TX_TO_GENE [-o OUTPUT] -n NREP
                                 [--txome-props TXOME_PROPS [TXOME_PROPS ...]]
                                 [--control] --txome-gtf TXOME_GTF

optional arguments:
  -h, --help            show this help message and exit
  --set1 FNAME ID ... [FNAME ID ... ...]
                        Files and IDs containing transcript distributions;
                        compare between set1 and set2
  --set2 FNAME ID ... [FNAME ID ... ...]
                        Files and IDs containing transcript distributions;
                        compare between set1 and set2
  --tx-to-gene TX_TO_GENE
                        Mapping between transcript ID in input file and gene
  -o OUTPUT, --output OUTPUT
                        Output filename (default is stdout)
  -n NREP, --nrep NREP  Number of replicates of each point
  --txome-props TXOME_PROPS [TXOME_PROPS ...]
                        List of files with transcriptome properties to
                        correlate among (wildcards ok)
  --control             Perform randomized comparisons of input transcripts as
                        a control.
  --txome-gtf TXOME_GTF
                        Path to transcriptome GTF
```

#### Requirements

* (this page)
* (this page)
* (this page) 
* Two lists of transcripts to compare (i.e. clusters) 
* Lists of transcriptome properties to compare between transcript isoforms of the same gene in the two sets (generated by 
* File containing transcript ID to gene mapping
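The core of the comparison is grouping each set's transcripts by gene and keeping only genes represented in both sets. The sketch below shows that pairing step under simple assumptions (transcripts absent from the mapping are skipped; a transcript listed in both sets is kept on both sides); it is not the tool's implementation, and the function names are hypothetical.

```python
# Hypothetical sketch of the gene-level pairing step only; the property
# comparison and the --control randomization are not shown.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_gene(txids: Iterable[str], tx_to_gene: Dict[str, str]) -> Dict[str, List[str]]:
    """Map gene -> transcripts from txids; unmapped transcripts are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tx in txids:
        gene = tx_to_gene.get(tx)
        if gene is not None:
            groups[gene].append(tx)
    return groups


def paired_isoforms(set1: Iterable[str], set2: Iterable[str],
                    tx_to_gene: Dict[str, str]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {gene: (isoforms in set1, isoforms in set2)} for shared genes."""
    g1 = group_by_gene(set1, tx_to_gene)
    g2 = group_by_gene(set2, tx_to_gene)
    return {gene: (g1[gene], g2[gene]) for gene in sorted(g1.keys() & g2.keys())}
```

Each returned pair is then the unit over which transcriptome properties (length, GC content, uORFs, and so on) would be compared between the two sets.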


Plot an individual transcript or all transcripts of a gene. Requires polysome sequencing data as input (i.e. TrIP-seq) or some other per-transcript distribution.

The input "tx-to-gene" file should contain four columns: txid, geneid, gene_name, tx_name. It can be downloaded from Ensembl BioMart or other sources.
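A minimal reader for this four-column file might look like the sketch below. The delimiter (tab) and the optional header row are assumptions, since the expected separator is not documented here; `TxInfo` and `read_tx_to_gene` are hypothetical names, not part of the tools.

```python
# Sketch only: assumes tab-separated columns txid, geneid, gene_name, tx_name,
# optionally preceded by a header row. Names are hypothetical.
import csv
from typing import Dict, Iterable, NamedTuple


class TxInfo(NamedTuple):
    gene_id: str
    gene_name: str
    tx_name: str


def read_tx_to_gene(lines: Iterable[str], skip_header: bool = False,
                    delimiter: str = "\t") -> Dict[str, TxInfo]:
    """Map txid -> (gene_id, gene_name, tx_name)."""
    reader = csv.reader(lines, delimiter=delimiter)
    if skip_header:
        next(reader, None)
    mapping: Dict[str, TxInfo] = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        txid, gene_id, gene_name, tx_name = row[:4]
        mapping[txid] = TxInfo(gene_id, gene_name, tx_name)
    return mapping
```

Typical use would be `with open(path, newline="") as fh: mapping = read_tx_to_gene(fh, skip_header=True)`, where `path` is your exported mapping file.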

#### Usage

```
usage: [-h] -i INPUT [-o OUTPUT] -n NREP --id ID
                                  --tx-to-gene TX_TO_GENE [--text]
                                  [--format FORMAT]

Plot input transcript ID from input distribution file

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        File containing transcript distributions
  -o OUTPUT, --output OUTPUT
                        Output filename (default is stdout)
  -n NREP, --nrep NREP  Number of replicates of each point
  --id ID               Transcript ID(s) to print (can be partial; can be
                        comma-separated list)
  --tx-to-gene TX_TO_GENE
                        File containing transcript ID to gene name mapping
  --text                Output text data in addition to plots.
  --format FORMAT       Image format to export (png or pdf).
```

#### Requirements

* (this page) 
* Input per-transcript distributions
* File containing transcript ID to gene mapping (if per-gene plotting is desired) 
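The `--id` option accepts partial, comma-separated IDs, and per-gene plotting needs the transcript-to-gene mapping. The sketch below assumes "partial" means substring matching and shows how a selection could be expanded to all transcripts of the matched genes; both the matching rule and the function names are assumptions, not the plotting tool's actual behavior.

```python
# Sketch under the assumption that partial IDs are matched as substrings.
from typing import Dict, Iterable, List


def select_ids(id_arg: str, available: Iterable[str]) -> List[str]:
    """Return available IDs matching any comma-separated (partial) query, in order."""
    queries = [q.strip() for q in id_arg.split(",") if q.strip()]
    return [tx for tx in available if any(q in tx for q in queries)]


def expand_to_genes(txids: Iterable[str], tx_to_gene: Dict[str, str]) -> List[str]:
    """Return every transcript (sorted) that shares a gene with one of txids."""
    genes = {tx_to_gene[tx] for tx in txids if tx in tx_to_gene}
    return sorted(tx for tx, gene in tx_to_gene.items() if gene in genes)
```

Substring matching is convenient for versioned IDs (a query without the `.N` suffix still matches) but can over-match short queries, which is worth keeping in mind when passing very short fragments.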

Converts FPKM values to TPM (transcripts per million) using the formula `TPM_i = FPKM_i * 1e6 / sum(FPKM_g for all genes g)`.
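The formula is a rescaling so that the values sum to one million. A minimal sketch follows; it assumes normalization uses all genes before the optional minimum-TPM cut (as `--filter` suggests) and is not the tool's own code. The function name is hypothetical.

```python
# Sketch of TPM_i = FPKM_i * 1e6 / sum_g FPKM_g; assumes the optional
# threshold (cf. --filter) is applied after normalizing over all genes.
from typing import Dict


def fpkm_to_tpm(fpkm: Dict[str, float], min_tpm: float = 0.0) -> Dict[str, float]:
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("FPKM values sum to zero; TPM is undefined")
    tpm = {gene: value * 1e6 / total for gene, value in fpkm.items()}
    # drop genes whose TPM falls below the threshold
    return {gene: value for gene, value in tpm.items() if value >= min_tpm}
```

For two genes with FPKM 1 and 3, the total is 4, so their TPM values are 1e6/4 = 250000 and 3e6/4 = 750000.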

#### Usage

```
usage: [-h] -i INPUT [-t SEPARATOR] [-o [OUTPUT]]
                      [--ignore IGNORE] [--filter FILTER] [-u]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        File containing identifiers to use for the merge
  -t SEPARATOR, --separator SEPARATOR
                        Field separator (default comma; "tab" for tabs;
                        "space" for whitespace
  -o [OUTPUT], --output [OUTPUT]
                        File to output to (default stdout)
  --ignore IGNORE       Number of columns to ignore (one-based; 1 ignores the
                        first column)
  --filter FILTER       Filter genes with TPM below arg
  -u, --unique          Only output lines with unique entries in column 1
```

#### Requirements
* A file with FPKM values to convert to TPM 

## Utility classes


A class for converting between two annotation sets.
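As an illustration of what such a converter does, the sketch below builds forward and reverse lookups from a two-column, tab-separated mapping file (the layout assumed for files like `knownToRefSeq.txt`). The class name and its methods are hypothetical stand-ins and do not reflect AnnotationConverter's real interface.

```python
# Hypothetical stand-in, not AnnotationConverter's actual API. Assumes the
# first two tab-separated columns are (source ID, target ID); mappings may be
# many-to-many, so lookups return lists.
from collections import defaultdict
from typing import Dict, Iterable, List


class TwoColumnConverter:
    def __init__(self, lines: Iterable[str]):
        self._forward: Dict[str, List[str]] = defaultdict(list)
        self._reverse: Dict[str, List[str]] = defaultdict(list)
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comment lines
            src, dst = line.rstrip("\n").split("\t")[:2]
            self._forward[src].append(dst)
            self._reverse[dst].append(src)

    def convert(self, ident: str) -> List[str]:
        """Target-annotation IDs for ident (empty list if unmapped)."""
        return list(self._forward.get(ident, []))

    def convert_back(self, ident: str) -> List[str]:
        """Source-annotation IDs for ident (empty list if unmapped)."""
        return list(self._reverse.get(ident, []))
```

Returning lists rather than single IDs keeps one-to-many mappings visible to the caller instead of silently keeping only the last entry.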


A class to read GTF files - downloaded from and minimally modified 


A file providing various utility functions.


A class holding information on human codon usage.


A class to read in TargetScan scores and provide accessor functions.


A class defining a transcript and structural features associated with it. 

