Tip revision: 2b7948e2679a14455bd4e14cf1ac6a66d7640a7d authored by Jerome Kelleher on 22 February 2019, 11:57:19 UTC
Merge pull request #712 from mcveanlab/0.7.0-final
.. _sec_introduction:

============
Introduction
============
The primary goal of ``msprime`` is to efficiently and conveniently
generate coalescent trees for a sample under a range of evolutionary
scenarios. The library is a reimplementation of Hudson's seminal
``ms`` program, and aims to eventually reproduce all its functionality.
``msprime`` differs from ``ms`` in some important ways:

1. ``msprime`` is *much* more efficient than ``ms``, both in terms of
   memory usage and simulation time. In fact, ``msprime`` is also
   much more efficient than simulators based on approximations to the
   coalescent with recombination model, especially for simulations
   with very large sample sizes. ``msprime`` can easily simulate
   chromosome-sized regions for hundreds of thousands of samples.
2. ``msprime`` is primarily designed to be used through its
   :ref:`Python API <sec_api>` to simplify the workflow associated with
   running and analysing simulations. (However, we do provide an
   ``ms``-compatible :ref:`command line interface <sec_cli>` to
   plug into existing workflows.) With command line simulators such
   as ``ms``, a typical workflow is to first write a script to generate
   the command line parameters we want to run, then fork shell
   processes to run the simulations, and then parse the results to
   obtain the genealogies in a form we can use. With ``msprime`` all
   of this can be done directly in Python, which is both simpler and
   far more efficient.
3. ``msprime`` does not use Newick trees for interchange as they
   are extremely inefficient in terms of storage space and the
   time needed to generate and parse them. Instead, we use the
   `tskit library <https://tskit.readthedocs.io/en/stable/index.html>`_,
   which allows us to store and process very large-scale simulation
   results efficiently.
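
To make the idea of a coalescent tree and the Newick comparison a little more
concrete, the following is a toy sketch in pure Python. It is *not* how
``msprime`` or ``tskit`` work internally: it simulates a single tree under the
basic coalescent without recombination, stores it as two flat arrays, and then
serialises it as Newick text. The function names ``simulate_coalescent_tree``
and ``to_newick`` are hypothetical and exist only for this illustration.

```python
import random


def simulate_coalescent_tree(n, seed=None):
    """Toy coalescent for n samples, with time in coalescent units.

    Returns (parent, time), where parent[u] is the parent of node u
    (-1 for the root) and time[u] is the age of node u.
    Samples are nodes 0..n-1; internal nodes are numbered from n upwards.
    """
    rng = random.Random(seed)
    parent = [-1] * (2 * n - 1)
    time = [0.0] * (2 * n - 1)
    lineages = list(range(n))
    t = 0.0
    next_node = n
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages, the waiting time to the next coalescence is
        # exponentially distributed with rate k * (k - 1) / 2.
        t += rng.expovariate(k * (k - 1) / 2)
        # Choose two distinct lineages uniformly at random and merge them.
        a, b = rng.sample(lineages, 2)
        parent[a] = next_node
        parent[b] = next_node
        time[next_node] = t
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_node)
        next_node += 1
    return parent, time


def to_newick(parent, time):
    """Serialise the (parent, time) arrays as a Newick string."""
    children = {}
    root = None
    for u, p in enumerate(parent):
        if p == -1:
            root = u
        else:
            children.setdefault(p, []).append(u)

    def subtree(u):
        if u in children:
            label = "(" + ",".join(subtree(c) for c in children[u]) + ")"
        else:
            label = str(u)
        if parent[u] == -1:
            return label
        # Branch length is the age difference between parent and child.
        return f"{label}:{time[parent[u]] - time[u]:.4f}"

    return subtree(root) + ";"


# Usage: simulate one tree for five samples and print its Newick form.
parent, time = simulate_coalescent_tree(5, seed=42)
print(to_newick(parent, time))
```

The array form needs one integer and one float per node, whereas the Newick
form has to be generated as text and parsed back into a tree before it can be
used. This is only meant to illustrate the kind of overhead that point 3
refers to; the table-based encoding used by ``tskit`` is considerably more
elaborate than this sketch.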