https://github.com/psobczyk/varclust
Tip revision: c2b23b96a19c39bee8276f073e6109be11b325f0 authored by Piotr Sobczyk on 17 June 2021, 20:47:00 UTC
Merge pull request #10 from psobczyk/allowing-clusters-with-zero-pcs
varclust.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/varclust.R
\docType{package}
\name{varclust}
\alias{varclust}
\title{Variable Clustering with the Multiple Latent Components Clustering Algorithm}
\description{
Package \pkg{varclust} clusters variables according to a probabilistic model
which assumes that each cluster lies in a low-dimensional subspace. The
segmentation of variables, the number of clusters, and the cluster dimensions
are selected using a suitably adapted version of the Bayesian Information
Criterion.
}
\details{
The best candidate models are identified by a dedicated variant of the
K-means algorithm, in which each cluster center is represented by a number of
orthogonal factors (principal components of the variables within the cluster)
and the similarity between a variable and a cluster center depends on the
residuals from a linear model fit. The residual sums of squares are scaled
according to the Bayesian Information Criterion (BIC), which prevents
clusters with larger dimensions from attracting too many variables. To reduce
the chance of ending in a local rather than the global minimum of the
modified BIC (mBIC), the K-means algorithm is run many times for every number
of clusters in the given range, each time with a different random
initialization of the cluster centers.

The main function of package \pkg{varclust} is \code{\link{mlcc.bic}}, which
clusters the variables of a data set when the number of clusters is unknown.
The partition of variables is computed with a K-means based algorithm. The
number of clusters and their dimensions are estimated using mBIC and PESEL,
respectively. If the number of clusters is known, one can use
\code{\link{mlcc.reps}}, which takes the number of clusters as a parameter.
\code{\link{mlcc.reps}} also accepts an initial segmentation for the K-means
algorithm, which can be useful if the user has a priori knowledge about the
clustering.

We also provide two functions for simulating data sets with the described
structure. \code{\link{data.simulation}} generates data in which the
subspaces are independent, and \code{\link{data.simulation.factors}}
generates data in which some factors are shared between the subspaces.

We also provide two measures of clustering quality.
\code{\link{misclassification}} computes the misclassification rate between
two partitions; this performance measure is widely used in image
segmentation. The other measure is implemented in the
\code{\link{integration}} function.

Version: 0.9.5
}
\examples{
\donttest{
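# Simulate a data set with K = 3 clusters of variables
sim.data <- data.simulation(n = 50, SNR = 1, K = 3, numb.vars = 50, max.dim = 3)
# Number of clusters unknown: choose among 1 to 5 clusters
mlcc.bic(sim.data$X, numb.clusters = 1:5, numb.runs = 20, numb.cores = 1, verbose = TRUE)
# Number of clusters known: fix it at 3
mlcc.reps(sim.data$X, numb.clusters = 3, numb.runs = 20, numb.cores = 1)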
}
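# A minimal base-R sketch of the assignment step described in Details. It is
# an illustration under simplifying assumptions, not the code used by
# mlcc.bic or mlcc.reps: every cluster is one-dimensional, so the BIC-based
# scaling of the residual sums of squares is omitted, and all object names
# below (f1, f2, X.toy, seg.toy, centers, rss, seg.new) are hypothetical.
set.seed(1)
n.obs <- 50
f1 <- rnorm(n.obs)  # latent factor behind toy cluster 1
f2 <- rnorm(n.obs)  # latent factor behind toy cluster 2
X.toy <- cbind(sapply(1:5, function(i) f1 + rnorm(n.obs, sd = 0.1)),
               sapply(1:5, function(i) f2 + rnorm(n.obs, sd = 0.1)))
seg.toy <- rep(1:2, each = 5)  # current partition of the 10 variables
# Cluster center: first principal component of the variables in the cluster
centers <- lapply(1:2, function(k)
  prcomp(X.toy[, seg.toy == k])$x[, 1, drop = FALSE])
# Residual sum of squares of every variable regressed on every center
rss <- sapply(centers, function(pc)
  colSums(lm.fit(cbind(1, pc), X.toy)$residuals^2))
# Reassign each variable to the center that leaves the smallest residuals
seg.new <- apply(rss, 1, which.min)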
}
\author{
Piotr Sobczyk, Stanislaw Wilczynski, Julie Josse, Malgorzata Bogdan

  Maintainer: Piotr Sobczyk \email{pj.sobczyk@gmail.com}
}