[PyDLM](https://pydlm.github.io/) [![wwrechard](https://circleci.com/gh/wwrechard/pydlm.svg?style=svg)](https://app.circleci.com/pipelines/github/wwrechard/pydlm) [![Coverage Status](https://coveralls.io/repos/github/wwrechard/pydlm/badge.svg?branch=master)](https://coveralls.io/github/wwrechard/pydlm?branch=master)
=======================================================

Welcome to [pydlm](https://pydlm.github.io/), a flexible time series modeling library for Python. This library is based on the Bayesian dynamic linear model (Harrison and West, 1999) and optimized for fast model fitting and inference.

Updates
-------------------------------------------

* Version 0.1.1.12 has been released on PyPI.
  * Added support and CI for Python 3.8, 3.9, 3.10, 3.11
  * Migrated CI from Travis to CircleCI
  * Set up a GitHub workflow to release based on version tag
* GitHub dev updates.
  * Migrated all the unnecessary `print()` calls to the default Python logging operations, such as `logging.info`, `logging.warning` and `logging.critical`.
  * Users can now set the model logging level to suppress unnecessary information during a model run.
  * Example to only print warning information:

    ```python
    ...
    my_model.setLoggingLevel('WARNING')
    my_model.fit()
    ```
  * Updated package version management to use `pip-compile` and `requirements.txt`.

Installation
------------
You can get the package (current version 0.1.1.12) from `pypi` by

      $ pip install pydlm

You can also get the latest from [github](https://github.com/wwrechard/PyDLM)

      $ git clone git@github.com:wwrechard/pydlm.git pydlm
      $ cd pydlm
      $ pip install pip-tools
      $ pip install -r requirements.txt
      $ pip install -e . --no-deps

`pydlm` depends on the following modules:

* `numpy` (for core functionality)
* `matplotlib` (for plotting results)
* `Sphinx` (for generating documentation)
* `unittest` (for testing)

Google data science post example
-----------------
We use the example from the [Google data science post](http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html) to show how `pydlm` can be used to analyze real-world data. The code and data are placed under `examples/unemployment_insurance/...`. The dataset contains weekly counts of initial claims for unemployment during 2004 - 2012 and is available from the R package `bsts` (a popular R package for time series modeling). The raw data is shown below (left).
We see a strong annual pattern and some local trend in the data.
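To make the idea of a slowly drifting local level concrete, the sketch below filters a synthetic weekly series with a local-level dynamic linear model and a discount factor, written in plain Python. It illustrates the general technique only and is not `pydlm`'s implementation; the function names `local_level_filter` and `mean_abs_error`, the synthetic data, and all parameter values are assumptions made for this example.

```python
import math
import random


def local_level_filter(series, discount=0.9, obs_var=1.0, m0=0.0, c0=1e3):
    """Filter a series with a local-level DLM and a discount factor.

    Hypothetical helper for illustration. Returns the one-step-ahead
    forecasts and the filtered (posterior) means.
    """
    m, c = m0, c0
    forecasts, filtered = [], []
    for y in series:
        r = c / discount         # prior variance, inflated by the discount factor
        q = r + obs_var          # one-step-ahead forecast variance
        forecasts.append(m)      # forecast of y made before y is observed
        gain = r / q             # adaptive coefficient (Kalman gain)
        m = m + gain * (y - m)   # posterior mean after observing y
        c = r - gain * gain * q  # posterior variance after observing y
        filtered.append(m)
    return forecasts, filtered


def mean_abs_error(actual, predicted):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic weekly data (an assumption): slow trend + annual cycle + noise.
random.seed(0)
series = [
    0.02 * t + math.sin(2 * math.pi * t / 52) + random.gauss(0, 0.1)
    for t in range(300)
]

forecasts, filtered = local_level_filter(series, discount=0.9, obs_var=0.01)
print("one-step-ahead MAE:", mean_abs_error(series, forecasts))
print("filtered MAE:", mean_abs_error(series, filtered))
```

A smaller discount factor lets the level adapt faster to recent data at the cost of noisier estimates, which is the trade-off a discount factor controls in this kind of model.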
Most of the time series shape is attributed to the local linear trend, and the strong seasonality pattern is easy to see. To further verify the performance, we use this simple model for long-term forecasting. In particular, we use the first **351 weeks** of data to forecast the next **200 weeks**, and the first **251 weeks** of data to forecast the next **200 weeks**. We overlay the predicted results on the real data:

```python
# Plot the prediction given the first 351 weeks and forecast the next 200 weeks.
simple_dlm.plotPredictN(date=350, N=200)
# Plot the prediction given the first 251 weeks and forecast the next 200 weeks.
simple_dlm.plotPredictN(date=250, N=200)
```
From the figure we see that after the crisis peak around 2008 - 2009 (Week 280), the simple model can accurately forecast the next 200 weeks (left figure) given the first 351 weeks. However, the model fails to capture the change near the peak if the forecast starts before Week 280 (right figure).
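The same comparison can be expressed as a number instead of a plot. The sketch below is a plain-Python illustration under stated assumptions: it uses a synthetic series with a level shift at Week 280 in place of the real data, and a seasonal-naive forecast in place of the fitted model. The helper names `seasonal_naive_forecast` and `holdout_mae` are hypothetical and are not part of `pydlm`.

```python
import math


def seasonal_naive_forecast(history, horizon, period=52):
    """Repeat the last full season of `history` for `horizon` steps."""
    if len(history) < period:
        raise ValueError("history must cover at least one full period")
    last_season = history[-period:]
    return [last_season[h % period] for h in range(horizon)]


def holdout_mae(series, n_train, horizon, period=52):
    """Mean absolute error of a forecast made from the first `n_train` points."""
    train = series[:n_train]
    actual = series[n_train:n_train + horizon]
    predicted = seasonal_naive_forecast(train, len(actual), period)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic weekly series (an assumption): annual cycle plus a level
# shift at Week 280, standing in for a structural change in the data.
series = [
    math.sin(2 * math.pi * t / 52) + (1.0 if t >= 280 else 0.0)
    for t in range(551)
]

# Forecast 200 weeks ahead from a point after and a point before the shift.
mae_after_shift = holdout_mae(series, n_train=351, horizon=200)
mae_before_shift = holdout_mae(series, n_train=251, horizon=200)
print("MAE when training on 351 weeks:", mae_after_shift)
print("MAE when training on 251 weeks:", mae_before_shift)
```

Training on data that ends before the shift gives a much larger error, because nothing in the training window carries information about the new level.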
The one-step-ahead prediction looks much better than that of the simple model, particularly around the crisis peak. The mean prediction error is **0.099**, which is a 100% improvement over the simple model. Similarly, we also decompose the time series into the three components:

```python
drm.turnOff('predict plot')
drm.turnOff('filtered plot')
drm.plot('linear_trend')
drm.plot('seasonal52')
drm.plot('regressor10')
```

This time, the shape of the time series is mostly attributed to the regressor, and the linear trend looks more linear. We repeat the long-term forecasting, i.e., use the first **301 weeks** of data to forecast the next **150 weeks** and the first **251 weeks** of data to forecast the next **200 weeks**:

```python
drm.plotPredictN(date=300, N=150)
drm.plotPredictN(date=250, N=200)
```
The results look much better compared to the simple model.

Documentation
-------------
Detailed documentation is provided in [PyDLM](https://pydlm.github.io/) with special attention to the [User manual](https://pydlm.github.io/#dynamic-linear-models-user-manual).