https://github.com/bcgsc/ntCard
Revision 2ff88f3b9c65375f8aeddae398dd99a453515d96 authored by Hamid Mohamadi on 04 November 2016, 17:10:37 UTC, committed by GitHub on 04 November 2016, 17:10:37 UTC
ntCard
======
ntCard is a streaming algorithm for cardinality estimation in genomics datasets. As input it takes file(s) in fasta, fastq, sam, or bam format and computes the total number of distinct *k*-mers, *F<sub>0</sub>*, and the *k*-mer coverage frequency histogram, *f<sub>i</sub>*, for *i* ≥ 1.
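
To make the definitions concrete: *F<sub>0</sub>* is the number of distinct *k*-mers, and *f<sub>i</sub>* is the number of distinct *k*-mers that occur exactly *i* times. The Python sketch below computes both exactly with a hash table. It is a reference for the quantities ntCard estimates, not ntCard's streaming method, which avoids storing every *k*-mer. The function name `exact_kmer_stats`, skipping *k*-mers that contain `N`, and treating a *k*-mer and its reverse complement as different *k*-mers are assumptions made for illustration.

```python
from collections import Counter


def exact_kmer_stats(seqs, k):
    """Exact F0 and f_i for a list of sequences (reference, not streaming)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: k-mers with ambiguous bases are skipped.
            if "N" in kmer:
                continue
            # Assumption: no reverse-complement (canonical) merging.
            counts[kmer] += 1
    f0 = len(counts)                 # number of distinct k-mers
    hist = Counter(counts.values())  # hist[i] = f_i, k-mers seen exactly i times
    return f0, dict(sorted(hist.items()))


f0, hist = exact_kmer_stats(["ACGTACGT", "CGTACG"], k=3)
print(f0, hist)  # → 4 {2: 2, 3: 2}
```

By construction, the histogram sums to *F<sub>0</sub>* (Σ<sub>i</sub> *f<sub>i</sub>* = *F<sub>0</sub>*), and Σ<sub>i</sub> *i*·*f<sub>i</sub>* equals the total number of *k*-mers counted (10 in the example). The exact table needs memory proportional to *F<sub>0</sub>*, which is what a streaming estimator is meant to avoid on large datasets.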

# Build the binary for ntcard
Run:
```
$ make
```
# Run ntcard
```
ntcard [OPTIONS] ... [FILE]
```
Parameters:
  * `-k`,  `--kmer=SIZE`: the length of *k*-mers `[64]`
  * `-t`,  `--threads=N`: use N parallel threads `[1]`
  * `-c`,  `--cov=N`: the maximum *k*-mer coverage (largest *i*) reported in the output `[64]`
  * `FILE`: an input file or a set of files separated by spaces, in fasta, fastq, sam, or bam format. The files can also be compressed (`.gz`, `.bz2`, `.xz`). A file listing one input file name per line can be passed with the `@` prefix.
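
The `@` list convention in the `FILE` parameter amounts to an argument expansion like the one below. This is a hypothetical Python sketch (`expand_inputs` is not part of ntCard), assuming one file name per line and ignoring blank lines; it illustrates the convention only and is not ntCard's own argument parser.

```python
import os
import tempfile


def expand_inputs(args):
    """Expand '@list' arguments into the file names they contain (one per line)."""
    files = []
    for arg in args:
        if arg.startswith("@"):
            with open(arg[1:]) as fh:
                # One file name per row; blank lines are skipped (an assumption).
                files.extend(line.strip() for line in fh if line.strip())
        else:
            files.append(arg)
    return files


# Demo: a temporary list file naming two (hypothetical) inputs.
with tempfile.TemporaryDirectory() as d:
    lst = os.path.join(d, "files.txt")
    with open(lst, "w") as fh:
        fh.write("lib1.fastq.gz\nlib2.bam\n")
    print(expand_inputs(["reads.fastq", "@" + lst]))
    # → ['reads.fastq', 'lib1.fastq.gz', 'lib2.bam']
```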
  
For example, to run ntcard on a test file `reads.fastq` with `k=50`:
```
$ ntcard -k50 reads.fastq 
```
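
On a small test file, ntCard's *F<sub>0</sub>* estimate can be checked against an exact count. The following Python sketch is a hypothetical helper, not part of ntCard: it reads only plain 4-line FASTQ records (optionally gzip-compressed), skips *k*-mers containing `N`, and does not merge reverse complements, so its count may differ from ntCard's if ntCard treats a *k*-mer and its reverse complement as one.

```python
import gzip


def read_fastq_seqs(path):
    """Yield sequences from a FASTQ file, assuming the plain 4-line record layout."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # line 2 of each record holds the sequence
                yield line.strip().upper()


def exact_f0(path, k):
    """Exact number of distinct k-mers; memory grows with F0."""
    seen = set()
    for seq in read_fastq_seqs(path):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # assumption: skip ambiguous k-mers
                seen.add(kmer)
    return len(seen)
```

For the example above, `exact_f0("reads.fastq", 50)` gives the exact value for `k=50`. Because every distinct *k*-mer is kept in memory, this check is only practical on small inputs.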