https://github.com/bcgsc/ntCard
Revision feb151ed444ebc517188d9f4094cb851c5e7dc3a authored by Hamid Mohamadi on 03 November 2016, 20:52:23 UTC, committed by GitHub on 03 November 2016, 20:52:23 UTC
ntCard 
=
ntCard is a streaming algorithm for cardinality estimation in genomics datasets. As input it takes file(s) in fasta, fastq, sam, or bam format and estimates the total number of distinct *k*-mers, *F<sub>0</sub>*, as well as the *k*-mer coverage frequency histogram, *f<sub>i</sub>*, *i>=1*.
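
To make the two quantities concrete, here is a minimal Python sketch that computes them exactly for a toy input, where *f<sub>i</sub>* is the number of distinct *k*-mers that occur exactly *i* times. This is not ntCard's algorithm: ntCard estimates these values in a streaming fashion, while the sketch stores every *k*-mer in memory. The function name `kmer_histogram` is hypothetical, and the sketch counts *k*-mers exactly as written in the reads, with no reverse-complement handling.

```python
from collections import Counter

def kmer_histogram(reads, k):
    """Exact F0 and f_i for a small set of reads (illustration only)."""
    # Count every length-k substring across all reads.
    # Reads shorter than k contribute nothing.
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    f0 = len(counts)                 # F0: number of distinct k-mers
    hist = Counter(counts.values())  # f_i: distinct k-mers seen exactly i times
    return f0, dict(sorted(hist.items()))

f0, hist = kmer_histogram(["ACGTACGT", "CGTAC"], k=4)
print(f0, hist)  # prints: 4 {1: 1, 2: 3}
```

In this toy input, `TACG` occurs once and `ACGT`, `CGTA`, and `GTAC` occur twice each, so *F<sub>0</sub>* = 4, *f<sub>1</sub>* = 1, and *f<sub>2</sub>* = 3.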

# Build the binary for ntcard
Run:
```
$ make
```
# Run ntcard
```
ntcard [OPTIONS] ... [FILE]
```
Parameters:
  * `-k`,  `--kmer=SIZE`: the length of *k*-mer `[64]`
  * `-t`,  `--threads=N`: use N parallel threads `[1]`
  * `-c`,  `--cov=N`: the maximum coverage of *k*-mer in output `[64]`
  * `FILE`: input file or set of files separated by spaces, in fasta, fastq, sam, or bam format. The files can also be compressed (`.gz`, `.bz2`, `.xz`). A list file containing one input file name per line can be passed with the `@` prefix.
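
When there are many input files, the list file and the command line can be generated from a script. The following is a minimal Python sketch that uses only the options listed above. The helper names `write_file_list` and `ntcard_command` and the file names are hypothetical, and the command is only built, not executed.

```python
import tempfile
from pathlib import Path

def write_file_list(paths, list_path):
    """Write one input file name per line and return the '@'-prefixed argument."""
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return f"@{list_path}"

def ntcard_command(inputs, k=64, threads=1, cov=64):
    """Build an ntcard argument list; the defaults mirror those listed above."""
    return ["ntcard", f"--kmer={k}", f"--threads={threads}", f"--cov={cov}", *inputs]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input names; replace them with real read files.
    list_arg = write_file_list(["a.fastq", "b.fastq.gz"], Path(tmp) / "inputs.txt")
    cmd = ntcard_command([list_arg], k=50, threads=4)
    print(" ".join(cmd))
```

If `ntcard` is on the `PATH`, the resulting list can be passed to `subprocess.run(cmd, check=True)` while the list file still exists.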
  
For example, to run ntcard on a test file `reads.fastq` with `k=50`:
```
$ ntcard -k50 reads.fastq 
```