Tip revision: 37d56e54446977a7d98a9c6b24f14b856e48abb6 authored by Duncan Murdock on 10 August 2018, 03:08:27 UTC
increment version (again)

SIDR - Sequence Identification with Decision tRees
==================================================

.. image:: https://travis-ci.org/damurdock/SIDR.svg?branch=master
    :target: https://travis-ci.org/damurdock/SIDR

SIDR (pronounced "cider") is a tool for filtering Next Generation Sequencing
(NGS) data based on a chosen target organism. SIDR uses data from BLAST
(or similar classifiers) to train a decision tree model that classifies
sequence data as belonging either to the target organism or to something
else. This classification can then be used to filter the data for later
assembly.

Note: SIDR is alpha software. Features are currently incomplete and subject to major change.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Installation
------------

To install SIDR, either clone this repository and run ``setup.py``, or install it with pip:

::

    pip install sidr

See the `documentation <https://sidr.readthedocs.io>`__ for more
details.

Usage
-----

SIDR has two main modes. Default mode takes several bioinformatics files
as input (a taxonomy dump, a BAM file, an assembly FASTA, and BLAST
results) and builds a decision tree based on percentage GC content
and per-base sequencing coverage. To run it, use:

::

    sidr default -d [taxdump path] -b [bamfile] -f [assembly FASTA] -r [BLAST results] -k tokeep.contigids -x toremove.contigids -t [target phylum] 
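To illustrate one of the variables named above, percentage GC content can be computed per contig directly from FASTA text. This is a minimal sketch and not SIDR's own code; the function names ``parse_fasta`` and ``gc_percent`` and the example sequences are hypothetical.

```python
def parse_fasta(text):
    """Yield (contig_id, sequence) pairs from FASTA-formatted text."""
    contig_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if contig_id is not None:
                yield contig_id, "".join(chunks)
            # The ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            contig_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if contig_id is not None:
        yield contig_id, "".join(chunks)


def gc_percent(sequence):
    """Return the percentage of G and C bases (0.0 for an empty sequence).

    Ambiguous bases such as N count toward the length, which is an
    assumption of this sketch rather than a documented SIDR behaviour.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Made-up two-contig example; sequences may span several lines.
example = ">contig1 demo\nGGCC\nAATT\n>contig2\nGCGC\n"
gc_by_contig = {cid: gc_percent(seq) for cid, seq in parse_fasta(example)}
print(gc_by_contig)
```

Per-base coverage, the other variable, would come from the BAM file and is not covered by this sketch.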

Runfile mode takes a tab-delimited file of contigs, variables, and
classification as input. To run it, use:

::

    sidr runfile -i [runfile] -k tokeep.contigids -x toremove.contigids -t [target phylum] 
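The exact runfile layout is described in the documentation. Purely as an illustration of reading a tab-delimited file of contigs, variables, and classification, the sketch below assumes a header row with the hypothetical column names ``ID`` and ``Origin``; the ``GC`` and ``Coverage`` columns and all values are made up. It separates rows that already carry a classification from those that do not.

```python
import csv
import io


def read_runfile(handle, label_col="Origin"):
    """Split tab-delimited rows into (labeled, unlabeled) lists of dicts.

    The column name is an assumption for this sketch, not a documented
    SIDR format.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    labeled, unlabeled = [], []
    for row in reader:
        label = (row.get(label_col) or "").strip()
        (labeled if label else unlabeled).append(row)
    return labeled, unlabeled


# Made-up demo data: contig3 has no classification.
demo = io.StringIO(
    "ID\tOrigin\tGC\tCoverage\n"
    "contig1\tNematoda\t38.2\t54.0\n"
    "contig2\tProteobacteria\t61.5\t7.3\n"
    "contig3\t\t40.1\t49.8\n"
)
labeled, unlabeled = read_runfile(demo)
print([row["ID"] for row in labeled])
print([row["ID"] for row in unlabeled])
```

With a real file, ``open(path, newline="")`` would replace the in-memory ``io.StringIO`` handle.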

See the `documentation <https://sidr.readthedocs.io>`__ for more
details.

TODO
----

-  More complete documentation

-  More unit tests
