Revision ca9b808cf89fc5bfd73bfbc3a7aad705634c2260 authored by asardaes on 16 March 2022, 20:54:21 UTC, committed by asardaes on 16 March 2022, 20:54:21 UTC
1 parent 6382b57
Raw File
dtw_basic.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DISTANCES-dtw-basic.R
\name{dtw_basic}
\alias{dtw_basic}
\title{Basic DTW distance}
\usage{
dtw_basic(
  x,
  y,
  window.size = NULL,
  norm = "L1",
  step.pattern = dtw::symmetric2,
  backtrack = FALSE,
  normalize = FALSE,
  sqrt.dist = TRUE,
  ...,
  error.check = TRUE
)
}
\arguments{
\item{x, y}{Time series. Multivariate series must have time spanning the rows and variables
spanning the columns.}

\item{window.size}{Size for slanted band window. \code{NULL} means no constraint.}

\item{norm}{Norm for the LCM calculation, "L1" for Manhattan or "L2" for (squared) Euclidean. See
notes.}

\item{step.pattern}{Step pattern for DTW. Only \code{symmetric1} or \code{symmetric2} supported here. Note
that these are \emph{not} characters. See \link[dtw:stepPattern]{dtw::stepPattern}.}

\item{backtrack}{Also compute the warping path between series? See details.}

\item{normalize}{Should the distance be normalized? Only supported for \code{symmetric2}.}

\item{sqrt.dist}{Only relevant for \code{norm = "L2"}, see notes.}

\item{...}{Currently ignored.}

\item{error.check}{Logical indicating whether the function should try to detect inconsistencies
and give more informative errors messages. Also used internally to avoid repeating checks.}
}
\value{
The DTW distance. For \code{backtrack} \code{=} \code{TRUE}, a list with:
\itemize{
\item \code{distance}: The DTW distance.
\item \code{index1}: \code{x} indices for the matched elements in the warping path.
\item \code{index2}: \code{y} indices for the matched elements in the warping path.
}
}
\description{
This is a custom implementation of the DTW algorithm without all the functionality included in
\code{\link[dtw:dtw]{dtw::dtw()}}. Because of that, it should be faster, while still supporting the most common
options.
}
\details{
If \code{backtrack} is \code{TRUE}, the mapping of indices between series is returned in a list.

The windowing constraint uses a centered window. The calculations expect a value in
\code{window.size} that represents the distance between the point considered and one of the edges
of the window. Therefore, if, for example, \code{window.size = 10}, the warping for an
observation \eqn{x_i} considers the points between \eqn{x_{i-10}} and \eqn{x_{i+10}}, resulting
in \code{10(2) + 1 = 21} observations falling within the window.
}
\note{
The elements of the local cost matrix are calculated by using either Manhattan or squared
Euclidean distance. This is determined by the \code{norm} parameter. When the squared Euclidean
version is used, the square root of the resulting DTW distance is calculated at the end (as
defined in Ratanamahatana and Keogh 2004; Lemire 2009; see vignette references). This can be
avoided by passing \code{FALSE} in \code{sqrt.dist}.

The DTW algorithm (and the functions that depend on it) might return different values in 32 bit
installations compared to 64 bit ones.

An infinite distance value indicates that the constraints could not be fulfilled, probably due to
a too small \code{window.size} or a very large length difference between the series.
}
\section{Proxy version}{


The version registered with \code{\link[proxy]{dist}} is custom (\code{loop = FALSE} in
\code{\link[proxy]{pr_DB}}). The custom function handles multi-threaded parallelization
directly (with \code{\link[RcppParallel:RcppParallel-package]{RcppParallel}}). It uses all
available threads by default (see
\code{\link[RcppParallel:defaultNumThreads]{RcppParallel::defaultNumThreads()}}), but this can
be changed by the user with
\code{\link[RcppParallel:setThreadOptions]{RcppParallel::setThreadOptions()}}.

An exception to the above is when it is called within a \code{\link[foreach:foreach]{foreach}}
parallel loop \strong{made by dtwclust}. If the parallel workers do not have the number of
threads explicitly specified, this function will default to 1 thread per worker. See the
parallelization vignette for more information (\code{browseVignettes("dtwclust")}).



It also includes symmetric optimizations to calculate only half a distance matrix when
appropriate---only one list of series should be provided in \code{x}. If you want to avoid this
optimization, call \code{\link[proxy]{dist}} by giving the same list of series in both \code{x}
and \code{y}.



In order for symmetry to apply here, the following must be true: no window constraint is used
(\code{window.size} is \code{NULL}) or, if one is used, all series have the same length.
}

\examples{
\dontrun{
# ====================================================================================
# Understanding multivariate DTW
# ====================================================================================

# The variables for each multivariate time series are:
# tip force, x velocity, and y velocity
A1 <- CharTrajMV[[1L]] # A character
B1 <- CharTrajMV[[6L]] # B character

# Let's extract univariate time series
A1_TipForce <- A1[,1L] # first variable (column)
A1_VelX <- A1[,2L] # second variable (column)
A1_VelY <- A1[,3L] # third variable (column)
B1_TipForce <- B1[,1L] # first variable (column)
B1_VelX <- B1[,2L] # second variable (column)
B1_VelY <- B1[,3L] # third variable (column)

# Looking at each variable independently:

# Just force
dtw_basic(A1_TipForce, B1_TipForce, norm = "L1", step.pattern = symmetric1)
# Corresponding LCM
proxy::dist(A1_TipForce, B1_TipForce, method = "L1")

# Just x velocity
dtw_basic(A1_VelX, B1_VelX, norm = "L1", step.pattern = symmetric1)
# Corresponding LCM
proxy::dist(A1_VelX, B1_VelX, method = "L1")

# Just y velocity
dtw_basic(A1_VelY, B1_VelY, norm = "L1", step.pattern = symmetric1)
# Corresponding LCM
proxy::dist(A1_VelY, B1_VelY, method = "L1")

# NOTES:
# In the previous examples there was one LCM for each *pair* of series.
# Additionally, each LCM has dimensions length(A1_*) x length(B1_*)

# proxy::dist won't return the LCM for multivariate series,
# but we can do it manually:
mv_lcm <- function(mvts1, mvts2) {
    # Notice how the number of variables (columns) doesn't come into play here
    num_obs1 <- nrow(mvts1)
    num_obs2 <- nrow(mvts2)

    lcm <- matrix(0, nrow = num_obs1, ncol = num_obs2)

    for (i in 1L:num_obs1) {
        for (j in 1L:num_obs2) {
            # L1 norm for ALL variables (columns).
            # Consideration: mvts1 and mvts2 MUST have the same number of variables
            lcm[i, j] <- sum(abs(mvts1[i,] - mvts2[j,]))
        }
    }

    # return
    lcm
}

# Let's say we start with only x velocity and y velocity for each character
mvts1 <- cbind(A1_VelX, A1_VelY)
mvts2 <- cbind(B1_VelX, B1_VelY)

# DTW distance
dtw_d <- dtw_basic(mvts1, mvts2, norm = "L1", step.pattern = symmetric1)
# Corresponding LCM
lcm <- mv_lcm(mvts1, mvts2) # still 178 x 174
# Sanity check
all.equal(
    dtw_d,
    dtw::dtw(lcm, step.pattern = symmetric1)$distance # supports LCM as input
)

# Now let's consider all variables for each character
mvts1 <- cbind(mvts1, A1_TipForce)
mvts2 <- cbind(mvts2, B1_TipForce)

# Notice how the next code is exactly the same as before,
# even though we have one extra variable now

# DTW distance
dtw_d <- dtw_basic(mvts1, mvts2, norm = "L1", step.pattern = symmetric1)
# Corresponding LCM
lcm <- mv_lcm(mvts1, mvts2) # still 178 x 174
# Sanity check
all.equal(
    dtw_d,
    dtw::dtw(lcm, step.pattern = symmetric1)$distance # supports LCM as input
)

# By putting things in a list,
# proxy::dist returns the *cross-distance matrix*, not the LCM
series_list <- list(mvts1, mvts2)
distmat <- proxy::dist(series_list, method = "dtw_basic",
                       norm = "L1", step.pattern = symmetric1)
# So this should be TRUE
all.equal(distmat[1L, 2L], dtw_d)

# NOTE: distmat is a 2 x 2 matrix, because there are 2 multivariate series.
# Each *cell* in distmat has a corresponding LCM (not returned by the function).
# Proof:
manual_distmat <- matrix(0, nrow = 2L, ncol = 2L)
for (i in 1L:nrow(manual_distmat)) {
    for (j in 1L:ncol(manual_distmat)) {
        lcm_cell <- mv_lcm(series_list[[i]], series_list[[j]]) # LCM for this pair
        manual_distmat[i, j] <- dtw::dtw(lcm_cell, step.pattern = symmetric1)$distance
    }
}
# TRUE
all.equal(
    as.matrix(distmat),
    manual_distmat
)
}
}
back to top