Revision 026e000bcce27f092950447031dab7835a7916c4 authored by asardaes on 31 December 2015, 16:46:12 UTC, committed by asardaes on 31 December 2015, 16:46:12 UTC
1 parent 9e875be
README.Rmd
---
output:
     md_document:
          variant: markdown_github
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setOptions, cache = FALSE, echo = FALSE, warning = FALSE, message = FALSE}
library(knitr)
library(dtwclust)

knitr::opts_chunk$set(
     collapse = TRUE,
     comment = "#>",
     fig.path = "README-"
)
```

# Time Series Clustering With Dynamic Time Warping Distance (DTW)

This package attempts to consolidate some of the recent techniques related to time series clustering under DTW and implement them in `R`. Most of these algorithms make use of traditional clustering techniques (partitional and hierarchical clustering) but change the distance definition. In this case, the distance between time series is measured with DTW.
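To make the distance definition concrete, here is a minimal base R sketch of the DTW recursion. It is only an illustration: the function name `dtw_sketch` is hypothetical, the local cost is assumed to be the absolute difference, and every step is weighted equally, which is simpler than the step patterns offered by the `dtw` package that this package relies on.

```r
# Minimal DTW distance between two numeric vectors (base R only).
# Illustration only, not the implementation used by 'dtw' or 'dtwclust'.
dtw_sketch <- function(x, y, window = Inf) {
  n <- length(x)
  m <- length(y)
  # Cumulative cost matrix with an extra border row/column of Inf
  D <- matrix(Inf, nrow = n + 1L, ncol = m + 1L)
  D[1L, 1L] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # Skip cells outside the (optional) warping window
      if (abs(i - j) > window) next
      cost <- abs(x[i] - y[j])
      # Cheapest way to reach this cell: from above, from the left,
      # or diagonally
      D[i + 1L, j + 1L] <- cost + min(D[i, j + 1L], D[i + 1L, j], D[i, j])
    }
  }
  D[n + 1L, m + 1L]
}

# A repeated value is absorbed by the warping, so the distance is zero
dtw_sketch(c(1, 2, 3), c(1, 2, 2, 3))
```

The double loop is what makes DTW quadratic in the series length, which motivates the optimizations discussed next.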

DTW is, however, computationally expensive, so several optimization techniques exist. Most of them rely on lower bounds of the DTW distance, which are cheaper to compute. These bounds are only defined for time series of equal length. Nevertheless, if the lengths of the time series of interest vary only slightly, reinterpolating them to a common length is probably appropriate.
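As an illustration of such a bound, the following base R sketch computes Keogh's lower bound with the L1 norm. The function name `lb_keogh_sketch` and its arguments are hypothetical and this is not the package's own implementation. The idea is to build an envelope around one series and add up how far the other series falls outside it, which takes linear rather than quadratic time.

```r
# Sketch of Keogh's lower bound with the L1 norm (base R only).
# 'window' is the half-width of the warping window; both series
# must have the same length.
lb_keogh_sketch <- function(query, reference, window) {
  stopifnot(length(query) == length(reference), window >= 0)
  n <- length(reference)
  # Upper and lower envelopes of the reference series
  upper <- numeric(n)
  lower <- numeric(n)
  for (i in seq_len(n)) {
    idx <- max(1L, i - window):min(n, i + window)
    upper[i] <- max(reference[idx])
    lower[i] <- min(reference[idx])
  }
  # Only the parts of the query outside the envelope contribute
  above <- pmax(query - upper, 0)
  below <- pmax(lower - query, 0)
  sum(above + below)
}

query     <- c(0, 5, 0, 0)
reference <- c(0, 0, 0, 0)
lb_keogh_sketch(query, reference, window = 1L)
```

Under these assumptions the result should never exceed a DTW distance computed with the same window and absolute-difference local cost, so a large bound lets a search skip the full DTW calculation for that pair.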

Additionally, a recently proposed algorithm called k-Shape could serve as an alternative. k-Shape clustering relies on custom distance and centroid definitions, which are unrelated to DTW. The shape extraction algorithm proposed therein is particularly interesting if time series can be z-normalized.
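The two ingredients mentioned here can be sketched in base R. The names `zscore_sketch` and `sbd_sketch` are hypothetical, the sample standard deviation is an assumption, and the cross-correlation is computed with a plain loop over shifts instead of the FFT that an efficient implementation would normally use.

```r
# z-normalization: zero mean and unit (sample) standard deviation
zscore_sketch <- function(x) (x - mean(x)) / sd(x)

# Shape-based distance: one minus the maximum normalized
# cross-correlation over all relative shifts of the two series
sbd_sketch <- function(x, y) {
  n <- length(x)
  m <- length(y)
  shifts <- seq(-(m - 1L), n - 1L)
  cc <- vapply(shifts, function(s) {
    i <- seq_len(n)          # indices into x
    j <- i - s               # matching indices into y
    ok <- j >= 1L & j <= m   # keep only the overlapping part
    sum(x[i[ok]] * y[j[ok]])
  }, numeric(1))
  1 - max(cc) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

x <- c(0, 1, 0, 0)
y <- c(0, 0, 1, 0)   # same shape as x, shifted by one step
sbd_sketch(x, y)
```

Because the maximum is taken over all shifts, two series with the same shape at different offsets end up at distance zero, which is the property that k-Shape exploits.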

Many of the algorithms and optimizations require that all series have the same length. The ones that don't are usually slow but can still be used.
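When lengths differ only slightly, bringing all series to a common length can look like the base R sketch below. It assumes linear interpolation via `stats::approx`. The name `reinterpolate_sketch` is hypothetical and is not the package's `reinterpolate` function, which the examples further down use.

```r
# Linear reinterpolation of a series to a new length (base R only)
reinterpolate_sketch <- function(x, new_length) {
  stats::approx(x = seq_along(x), y = x, n = new_length)$y
}

series <- list(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30))
common <- lapply(series, reinterpolate_sketch, new_length = 9L)

# Both series now have 9 points and keep their original endpoints
lengths(common)
```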

Please see the included references for more information.

## Implementations

* Keogh's and Lemire's lower bounds
* DTW Barycenter Averaging
* k-Shape clustering
* TADPole clustering

## Examples

```{r examples}
## Load data
data(uciCT)

## Z-normalize the original series (their lengths still differ)
datalist <- zscore(CharTraj)
## Reinterpolate the original series to a common length of 180
data <- lapply(CharTraj, reinterpolate, newLength = 180)

## Common controls
ctrl <- list(window.size = 20L, trace = TRUE)

#### Using DTW with help of lower bounds and PAM centroids
ctrl$pam.precompute <- FALSE

kc.dtwlb <- dtwclust(data = data, k = 20, distance = "dtw_lb",
                     centroid = "pam", seed = 3247, 
                     control = ctrl)

plot(kc.dtwlb)

ctrl$pam.precompute <- TRUE

#### Hierarchical clustering based on shape-based distance
hc.sbd <- dtwclust(datalist, type = "hierarchical",
                   k = 20, distance = "sbd",
                   method = "all",
                   control = ctrl)

cat("Rand index for HC+SBD:\n")
print(ri <- sapply(hc.sbd, randIndex, y = CharTrajLabels))

plot(hc.sbd[[which.max(ri)]])

#### TADPole clustering
kc.tadp <- dtwclust(data, type = "tadpole", k = 20,
                    dc = 1.5, control = ctrl)

plot(kc.tadp, clus = 1:4)

#### Parallel support
library(doParallel)
## "FORK" clusters are only available on Unix-like systems
cl <- makeCluster(detectCores(), "FORK")
invisible(clusterEvalQ(cl, library(dtwclust)))
registerDoParallel(cl)

## Defining a custom distance (normalized DTW) to use through 'proxy'
ndtw <- function(x, y, ...) {
     dtw::dtw(x, y, step.pattern = symmetric2,
              distance.only = TRUE, ...)$normalizedDistance
}

## Registering the function with 'proxy'
proxy::pr_DB$set_entry(FUN = ndtw, names = c("nDTW"),
                       loop = TRUE, type = "metric", distance = TRUE,
                       description = "Normalized DTW with L1 norm")

## Data with different lengths
kc.ndtw <- dtwclust(datalist, k = 20,
                    distance = "nDTW", centroid = "pam",
                    seed = 159, control = new("dtwclustControl", nrep = 8L))

sapply(kc.ndtw, randIndex, y = CharTrajLabels)

## DBA centroids
kc <- dtwclust(datalist, k = 20,
               distance = "nDTW", centroid = "dba",
               seed = 9421, control = list(trace = TRUE))

# Modifying some plot parameters
plot(kc, labs.arg = list(title = "DBA Centroids", x = "time", y = "series"))

stopCluster(cl)
registerDoSEQ()
```
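The examples above rank clusterings by comparing them against the known labels with `randIndex`. As a rough illustration of what such an index measures, here is a base R sketch of the plain, unadjusted Rand index. The name `rand_index_sketch` is hypothetical, and `randIndex` may apply a correction for chance agreement, so its values need not match this sketch.

```r
# Plain (unadjusted) Rand index: the fraction of pairs of observations
# on which two partitions agree (both together or both apart)
rand_index_sketch <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2L)
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  pairs <- upper.tri(same_a)   # count each unordered pair once
  mean(same_a[pairs] == same_b[pairs])
}

# Identical partitions with different label names agree on every pair
rand_index_sketch(c(1, 1, 2, 2), c("a", "a", "b", "b"))
```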

## Dependencies

* Partitional procedures are inspired by the `flexclust` package.
* Hierarchical procedures use the native `hclust` function.
* Cross-distances make use of the `proxy` package.
* The core DTW calculations are done by the `dtw` package.
* Plotting is done with the `ggplot2` package.
* Parallel computation depends on the `foreach` package.
* Random streams for repetitions of partitional procedures use the `doRNG` package.