https://gitlab.com/rahmannlab/hackgap
Tip revision: 624ccdecc9700714393c4b4edfaa517d520c7cce authored by Jens Zentgraf on 23 May 2022, 14:40:56 UTC
add default value for k in debug
# DESCRIPTION

hackgap (**ha**sh-based **c**ounting of **k**-mers with **gap**s) provides a fast, JIT-compiled *k*-mer counter that supports gapped *k*-mers.


# INSTALLATION

## Install the conda package manager (miniconda)
Go to https://docs.conda.io/en/latest/miniconda.html and download the Miniconda installer.
Choose Python 3.9 (or higher), your operating system, and preferably the 64-bit version.
Follow the installer's instructions and add the conda executable to your PATH (even if the installer does not recommend it).
You can let the installer do this, or do it manually by editing your ``.bashrc`` or a similar file under Linux or macOS, or by editing the environment variables under Windows.
To verify that the installation works, open a new terminal and execute
```
conda --version # ideally 4.12.xx or higher
python --version # ideally 3.9.xx or higher
```

## Obtain or update hackgap
Our software is currently obtained by cloning this public git repository:
```
git clone https://gitlab.com/rahmannlab/hackgap.git
```
We may release a bioconda package later.

If you need to update hackgap later, you can do so by just executing
```
git pull
```
within the cloned directory tree.


## Create and activate a conda environment
To run our software, a [conda](https://docs.conda.io/en/latest/) environment with the required libraries needs to be created.
A list of needed libraries is provided in the ``environment.yml`` file in the cloned repository;
it can be used to create a new environment:
```
cd hackgap  # the directory of the cloned repository
conda env create
```
which will create an environment named ``hackgap`` with the required dependencies,
using the provided ``environment.yml`` file in the same directory.

To activate the newly created environment run
```
conda activate hackgap
```

## Install hackgap
To install hackgap, we use pip, the package installer for Python.

Run the following command in the directory of the cloned repository:
```
pip install -e .
```

To check whether the installation succeeded, execute
```
hackgap -v # should be 0.11.0 or higher
```

# EXAMPLE

Here we provide a small example of how to run hackgap on the T2T reference genome.

## Download reference genome

First, we need to download the T2T reference (https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz):

```
mkdir data # create data folder
cd data
wget https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz
cd ..
```
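
Because the download is large, it can be worth checking that the file arrived intact before counting. The following is a minimal sketch and not part of hackgap: `check_gz` is a hypothetical helper name, and it relies only on `gzip -t`, which tests the integrity of a compressed file without writing the decompressed data.

```bash
# Hypothetical helper (not part of hackgap): test a gzip file's integrity.
check_gz() {
    # gzip -t reads and verifies the whole file but writes no decompressed data.
    if gzip -t "$1" 2>/dev/null; then
        echo "ok: $1"
    else
        echo "failed: $1 is missing, truncated, or corrupt" >&2
        return 1
    fi
}
```

For example, `check_gz data/chm13v2.0.fa.gz` should print an `ok:` line after a complete download; if it fails, download the file again.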

## Run hackgap
To execute hackgap, we need to provide:
- `-n`: the expected number of distinct *k*-mers
- `-k` (for contiguous *k*-mers) or `--mask` (for gapped *k*-mers): the *k*-mer shape
- `--fasta`: the uncompressed input file; a compressed file can be decompressed on the fly using `pigz` or `zcat`
- `-o`: the output file in `zarr` format

```
hackgap count -n 2391456540 -k 25 --fasta <(pigz -cd -p 2 data/chm13v2.0.fa.gz) -o t2t-k25.zarr
```
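
The `<(...)` construct in the command above is process substitution, as provided by bash and similar shells: the shell runs the inner command and passes hackgap a path from which the decompressed data can be read, so no uncompressed copy is written to disk. The sketch below illustrates the mechanism with `zcat` on a tiny, made-up FASTA file (`demo.fa.gz` and its content are hypothetical), using `head` as a stand-in for any program that takes a file path.

```bash
# Create a tiny gzip-compressed FASTA file (hypothetical demo data).
demo_dir=$(mktemp -d)
printf '>demo\nACGTACGTAC\n' | gzip > "$demo_dir/demo.fa.gz"

# head stands in for any program that expects a file path:
# it receives a path to a pipe that yields the decompressed text.
first_line=$(head -n 1 <(zcat "$demo_dir/demo.fa.gz"))
echo "$first_line"

# The same pattern with zcat instead of pigz for the command above would be:
# hackgap count -n 2391456540 -k 25 --fasta <(zcat data/chm13v2.0.fa.gz) -o t2t-k25.zarr
```

The snippet prints the header line of the demo file. With GNU gzip, `zcat` is equivalent to `gzip -cd`.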

For gapped *k*-mers, pass the shape with `--mask` instead of `-k`:
```
hackgap count -n 2416328905 --mask "####_####_###_###_###_####_####" --fasta <(pigz -cd -p 2 data/chm13v2.0.fa.gz) -o t2t-m2.zarr
```
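
The mask string determines both the number of positions that are used and the width of the window. As a sketch, assuming that `#` marks a position that contributes to the *k*-mer and `_` marks a gap position that is ignored (an interpretation that this README does not state explicitly), the two values can be computed from the mask in the shell:

```bash
mask="####_####_###_###_###_####_####"

# Assumption: '#' = position used, '_' = gap position ignored.
used="${mask//_/}"   # remove all gap characters
k=${#used}           # number of used positions
w=${#mask}           # total window width, including gaps
echo "k=$k w=$w"
```

Under this assumption, the mask above uses 25 positions spread over a window of width 31, so it has the same *k* as the contiguous `-k 25` run.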