https://github.com/cran/multivariance
Raw File
Tip revision: 223488fe47429eb3067dc3455d2e2852fe694fbc authored by Björn Böttcher on 06 October 2021, 14:50:05 UTC
version 2.4.1
Tip revision: 223488f
dependence.structure.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/multivariance-functions.R
\name{dependence.structure}
\alias{dependence.structure}
\title{determines the dependence structure}
\usage{
dependence.structure(
  x,
  vec = 1:ncol(x),
  verbose = TRUE,
  detection.aim = NULL,
  type = "conservative",
  structure.type = "clustered",
  c.factor = 2,
  list.cdm = NULL,
  alpha = 0.05,
  p.adjust.method = "holm",
  stop.too.many = NULL,
  ...
)
}
\arguments{
\item{x}{matrix, each row of the matrix is treated as one sample}

\item{vec}{vector, it indicates which columns are initially treated together as one sample}

\item{verbose}{boolean, if \code{TRUE} details are printed during the detection and whenever a cluster is newly detected the (so far) detected dependence structure is plotted.}

\item{detection.aim}{\code{=NULL} or a list of vectors which indicate the expected detection, see below for more details}

\item{type}{the method used for the detection, one of '\code{conservative}','\code{resample}','\code{pearson_approx}' or '\code{consistent}'}

\item{structure.type}{either the '\code{clustered}' or the '\code{full}' structure is detected}

\item{c.factor}{numeric, larger than 0, a constant factor used in the case of '\code{type = "consistent"}'}

\item{list.cdm}{not required, the list of doubly centered distance matrices corresponding to \code{x} speeds up the computation if given}

\item{alpha}{numeric between 0 and 1, the significance level used for the tests}

\item{p.adjust.method}{a string indicating the p-value adjustment for multiple testing, see \code{\link{p.adjust.methods}}}

\item{stop.too.many}{numeric, upper limit for the number of tested tuples. A warning is issued if it is used. Use \code{stop.too.many = NULL} for no limit.}

\item{...}{these are passed to \code{\link{find.cluster}}}
}
\value{
returns a list with elements:
\describe{
  \item{\code{multivariances}}{calculated multivariances,}
  \item{\code{cdms}}{calculated doubly centered distance matrices,}
  \item{\code{graph}}{graph representing the dependence structure,}
  \item{\code{detected}}{boolean, this is only included if a \code{detection.aim} is given,}
  \item{\code{number.of.dep.tuples}}{vector, with the number of dependent tuples for each tested order. For the full dependence structure a value of -1 indicates that all tuples of this order are already lower order dependent, a value of -2 indicates that there were more than \code{stop.too.many} tuples,}
  \item{\code{structure.type}}{either \code{clustered} or \code{full},}
  \item{\code{type}}{the type of p-value estimation or consistent estimation used,}
  \item{\code{total.number.of.tests}}{numeric vector, with the number of tests for each group of tests,}
  \item{\code{typeI.error.prob}}{estimated probability of a type I error,}
  \item{\code{alpha}}{significance level used if a p-value estimation procedure is used,}
  \item{\code{c.factor}}{factor used if a consistent estimation procedure is used,}
  \item{\code{parameter.range}}{significance levels (or 'c.factor' values) which yield the same detection result.}
}
}
\description{
Determines the dependence structure as described in [3].
}
\details{
Performs the detection of the dependence structure as described in [3]. In the \code{clustered} structure variables are clustered and treated as one variable as soon as a dependence is detected, the \code{full} structure treats always each variable separately. The detection is either based on tests with significance level \code{alpha} or a \code{consistent} estimator is used. The latter yields (in the limit for increasing sample size) under very mild conditions always the correct dependence structure (but the convergence might be very slow).

If \code{fixed.rejection.level} is not provided, the significance level \code{alpha} is used to determine which multivariances are significant using the distribution-free rejection level. As default the Holm method is used for p-value correction corresponding to multiple testing.

The resulting graph can be simplified (pairwise dependence can be represented by edges instead of vertices) using \code{\link{clean.graph}}.

Advanced:
The argument \code{detection.aim} is currently only implemented for \code{structure.type = clustered}. It can be used to check, if an expected dependence structure was detected. This might be useful for simulation studies to determine the empirical power of the detection algorithm. Hereto  \code{detection.aim} is set to a list of vectors which indicate the expected detected dependence structures (one for each run of \code{\link{find.cluster}}). The vector has as first element the \code{k} for which k-tuples are detected (for this aim the detection stops without success if no k-tuple is found), and the other elements, indicate to which clusters all present vertices belong after the detection, e.g. \code{c(3,2,2,1,2,1,1,2,1)} expects that 3-tuples are detected and in the graph are 8 vertices (including those representing the detected 3 dependencies), the order of the 2's and 1's indicate which vertices belong to which cluster. If \code{detection.aim} is provided, the vector representing the actual detection is printed, thus one can use the output with copy-paste to fix successively the expected detection aims.

Note that a failed detection might invoke the warning:
\preformatted{
run$mem == detection.aim[[k]][-1] :
longer object length is not a multiple of shorter object length
}
}
\examples{
\donttest{
# structures for the datasets included in the package
dependence.structure(dep_struct_several_26_100)
dependence.structure(dep_struct_star_9_100)
dependence.structure(dep_struct_iterated_13_100)
dependence.structure(dep_struct_ring_15_100)

# basic examples:

x = coins(100) # 3-dependent
dependence.structure(x)

colnames(x) = c("A","B","C")
dependence.structure(x) # names of variables are used as labels

dependence.structure(coins(100),vec = c(1,1,2))
# 3-dependent rv of which the first two rv are used together as one rv, thus 2-dependence.

dependence.structure(x,vec = c(1,1,2)) # names of variables are used as labels


dependence.structure(cbind(coins(200),coins(200,k=5)),verbose = TRUE)
#1,2,3 are 3-dependent, 4,..,9 are 6-dependent

# similar to the the previous example, but
# the pair 1,3 is treated as one sample,
# anagously the pair 2,4. In the resulting structure one does not
# see anymore that the dependence of 1,2,3,4 with the rest is due
# to 4.
dependence.structure(cbind(coins(200),coins(200,k=5)),
                           vec = c(1,2,1,2,3,4,5,6,7),verbose = TRUE)


### Advanced:

# How to check the empirical power of the detection algorithm?
# Use a dataset for which the structure is detected, e.g. dep_struct_several_26_100.
# run:
dependence.structure(dep_struct_several_26_100,
                     detection.aim = list(c(ncol(dep_struct_several_26_100))))
# The output provides the first detection aim. Now we run the same line with the added
# detection aim
dependence.structure(dep_struct_several_26_100,detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4,
  5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
  c(ncol(dep_struct_several_26_100))))
# and get the next detection aim ... thus we finally obtain all detection aims.
# now we can run the code with new sample data ....
N = 100
dependence.structure(cbind(coins(N,2),tetrahedron(N),coins(N,4),tetrahedron(N),
                           tetrahedron(N),coins(N,3),coins(N,3),rnorm(N)),
                     detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8,
  9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
  c(4,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11,
    11, 12, 1, 2, 8, 9, 10, 11),
  c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 1,
    2, 4, 5, 6, 7, 3),
  c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 1,
    2, 4, 5, 6, 7, 3)))$detected
# ... and one could start to store the results and compute the rate of successes.

# ... or one could try to check how many samples are necessary for the detection:
re = numeric(100)
for (i in 2:100) {
  re[i] =
    dependence.structure(dep_struct_several_26_100[1:i,],verbose = FALSE,
                         detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8,
      8, 8, 9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
      c(4,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11,
        11, 11, 12, 1, 2, 8, 9, 10, 11),
      c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
        8, 1, 2, 4, 5, 6, 7, 3),
      c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
        8, 1, 2, 4, 5, 6, 7, 3)))$detected
  print(paste("First", i,"samples. Detected?", re[i]==1))
}
cat(paste("Given the 1 to k'th row the structure is not detected for k =",which(re == FALSE),"\n"))
}
}
\references{
For the theoretic background see the reference [3] given on the main help page of this package: \link{multivariance-package}.
}
back to top