Tip revision: 2b7948e2679a14455bd4e14cf1ac6a66d7640a7d authored by Jerome Kelleher on 22 February 2019, 11:57:19 UTC
Merge pull request #712 from mcveanlab/0.7.0-final

.. _sec_introduction:

============
Introduction
============

The primary goal of ``msprime`` is to efficiently and conveniently
generate coalescent trees for a sample under a range of evolutionary
scenarios. The library is a reimplementation of Hudson's seminal
``ms`` program, and aims to eventually reproduce all its functionality.
``msprime`` differs from ``ms`` in some important ways:

1. ``msprime`` is *much* more efficient than ``ms``, both in terms of
   memory usage and simulation time. In fact, ``msprime`` is also
   much more efficient than simulators based on approximations to the
   coalescent with recombination model, especially for simulations
   with very large sample sizes. ``msprime`` can easily simulate
   chromosome-sized regions for hundreds of thousands of samples.

2. ``msprime`` is primarily designed to be used through its
   :ref:`Python API <sec_api>` to simplify the workflow associated with
   running and analysing simulations. (However, we do provide an
   ``ms``-compatible :ref:`command line interface <sec_cli>` to
   plug into existing workflows.) With command line simulators, a
   typical workflow is to first write a script to generate the
   command line parameters we want to run, then fork shell processes
   to run the simulations, and finally parse the output to obtain
   the genealogies in a form we can use. With ``msprime`` all of
   this can be done directly in Python, which is both simpler and
   far more efficient.

3. ``msprime`` does not use Newick trees for interchange as they
   are extremely inefficient in terms of storage space and the
   time needed to generate and parse them. Instead, we use the
   `tskit library <https://tskit.readthedocs.io/en/stable/index.html>`_
   which allows us to store and process very large-scale simulation
   results efficiently.
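
To make "generating a coalescent tree for a sample" concrete, here is a
minimal pure-Python sketch of the basic single-locus coalescent. It is an
illustration only and is *not* how ``msprime`` is implemented: it ignores
recombination, demography and efficiency entirely. The function name
``simulate_coalescent`` and the ``parent``/``time`` array layout are
assumptions made for this example; they only loosely mirror the idea of
storing a genealogy in flat arrays rather than as a Newick string.

```python
import random


def simulate_coalescent(sample_size, seed=None):
    """Simulate one coalescent tree for ``sample_size`` samples.

    Returns ``(parent, time)``, two lists indexed by node ID. Nodes
    ``0 .. sample_size - 1`` are the samples, the remaining nodes are
    ancestors, and the root has parent ``-1``. Times are in coalescent
    time units.
    """
    if sample_size < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    num_nodes = 2 * sample_size - 1
    parent = [-1] * num_nodes
    time = [0.0] * num_nodes
    lineages = list(range(sample_size))
    t = 0.0
    next_node = sample_size
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages, the waiting time to the next coalescence is
        # exponentially distributed with rate k * (k - 1) / 2.
        t += rng.expovariate(k * (k - 1) / 2)
        # Pick two distinct lineages uniformly at random and merge them
        # into a new ancestral node.
        a, b = rng.sample(lineages, 2)
        parent[a] = next_node
        parent[b] = next_node
        time[next_node] = t
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_node)
        next_node += 1
    return parent, time


if __name__ == "__main__":
    parent, time = simulate_coalescent(5, seed=42)
    # One row per node: node ID, parent ID, node time.
    for node in range(len(parent)):
        print(node, parent[node], round(time[node], 4))
```

Each row printed is one node of the tree, so the whole genealogy is held in
two flat arrays that can be traversed without any string parsing. This is
only a toy analogue of the tabular storage idea; the actual ``tskit`` data
model is considerably richer.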