https://github.com/bcgsc/ntCard
Revision 1057d5e3a6c9f34d5f58efe6479da945e1ae3189 authored by Hamid Mohamadi on 03 November 2016, 21:21:14 UTC, committed by GitHub on 03 November 2016, 21:21:14 UTC
ntCard 
=
ntCard is a streaming algorithm for cardinality estimation in genomics datasets. As input it takes file(s) in fasta, fastq, sam, or bam format and estimates the total number of distinct *k*-mers, *F<sub>0</sub>*, as well as the *k*-mer coverage frequency histogram, *f<sub>i</sub>*, *i>=1*.
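
To make the two output quantities concrete, here is a minimal Python sketch that counts them exactly on a toy input. It is not ntCard's streaming estimator, only a brute-force illustration, and it reads *f<sub>i</sub>* as the number of distinct *k*-mers that occur exactly *i* times. The function name `kmer_spectrum` and the toy reads are hypothetical, and the sketch assumes each strand is counted as-is (no reverse-complement handling).

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Exact F0 and f_i by brute force; only practical for tiny inputs.

    Assumption: k-mers are counted as they appear, with no
    reverse-complement (canonical) handling and no filtering.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    f0 = len(counts)                 # number of distinct k-mers
    hist = Counter(counts.values())  # f_i: distinct k-mers seen exactly i times
    return f0, dict(sorted(hist.items()))


reads = ["ACGTACGT", "CGTA"]  # toy data, not a real dataset
f0, hist = kmer_spectrum(reads, k=3)
print(f0, hist)  # prints: 4 {1: 1, 2: 2, 3: 1}
```

Here the four distinct 3-mers are `ACG`, `CGT`, `GTA`, and `TAC`, and the histogram entries weighted by *i* add up to the 8 *k*-mer occurrences in the input. Exact counting like this needs memory proportional to *F<sub>0</sub>*, which is the cost a streaming estimator avoids.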

# Build the binary for ntcard
Run:
```
$ make
```
# Run ntcard
```
ntcard [OPTIONS] ... [FILE]
```
Parameters:
  * `-k`,  `--kmer=SIZE`: the length of *k*-mer `[64]`
  * `-t`,  `--threads=N`: use N parallel threads `[1]`
  * `-c`,  `--cov=N`: the maximum coverage of *k*-mer in output `[64]`
  * `FILE`: input file or set of files separated by spaces, in fasta, fastq, sam, or bam format. The files can also be compressed (`.gz`, `.bz2`, `.xz`). A file that lists one input file name per line can be passed with the `@` prefix.
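
When ntcard is driven from a script, the `@` list file and the option string can be assembled programmatically. The following Python sketch only writes the list file and builds the argument vector from the options documented above; it does not run ntcard. The helper name `build_ntcard_command`, the file name `inputs.txt`, and the read file names are hypothetical.

```python
import shlex
from pathlib import Path


def build_ntcard_command(inputs, list_path, k=50, threads=4):
    """Write a list file (one name per line) and return the ntcard argv."""
    # One file name per line, matching the description of the `@` list above.
    Path(list_path).write_text("".join(f"{name}\n" for name in inputs))
    # Only options documented in the parameter list are used here.
    return ["ntcard", f"--kmer={k}", f"--threads={threads}", f"@{list_path}"]


cmd = build_ntcard_command(["reads_1.fastq.gz", "reads_2.fastq.gz"], "inputs.txt")
print(shlex.join(cmd))  # prints: ntcard --kmer=50 --threads=4 @inputs.txt
```

Assuming the `ntcard` binary is on the `PATH`, the returned list could be handed to `subprocess.run(cmd, check=True)`.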
  
For example, to run ntcard on a test file `reads.fastq` with `k=50`:
```
$ ntcard -k50 reads.fastq 
```