% Generated by roxygen2 (4.0.1): do not edit by hand
\name{sjc.qclus}
\alias{sjc.qclus}
\title{Compute quick cluster analysis}
\usage{
sjc.qclus(data, groupcount = NULL, groups = NULL, method = "k",
  distance = "euclidean", agglomeration = "ward", iter.max = 20,
  algorithm = "Hartigan-Wong", showAccuracy = FALSE, title = NULL,
  titleSize = 1.3, titleColor = "black", axisLabels.x = NULL,
  axisLabelAngle.x = 0, axisLabelSize = 1.1, axisLabelColor = "gray30",
  axisTitle.x = "Cluster group characteristics",
  axisTitle.y = "Mean of z-scores", axisTitleColor = "black",
  axisTitleSize = 1.3, breakTitleAt = 40, breakLabelsAt = 12,
  breakLegendTitleAt = 20, breakLegendLabelsAt = 20, facetCluster = FALSE,
  barColor = NULL, barAlpha = 1, colorPalette = "GnBu", barWidth = 0.5,
  barSpace = 0.1, barOutline = FALSE, barOutlineSize = 0.2,
  barOutlineColor = "black", theme = NULL, borderColor = NULL,
  axisColor = NULL, hideLegend = FALSE, showTickMarks = TRUE,
  showAxisLabels.x = TRUE, showAxisLabels.y = TRUE, showGroupCount = TRUE,
  showAccuracyLabels = FALSE, legendTitle = NULL, legendLabels = NULL,
  legendPos = "right", legendSize = 1, legendBorderColor = "white",
  legendBackColor = "white", majorGridColor = NULL, minorGridColor = NULL,
  hideGrid.x = FALSE, hideGrid.y = FALSE, flipCoordinates = FALSE,
  reverseAxis.x = FALSE, printPlot = TRUE)
}
\arguments{
\item{data}{The data frame containing all variables that should be used for the cluster analysis.}

\item{groupcount}{The number of groups (clusters) that should be retrieved. May also be a set of initial (distinct) cluster centres, in case \code{method} is \code{"kmeans"} (see \code{\link{kmeans}} for details on the \code{centers} parameter). By default (\code{NULL}), the optimal number of clusters is calculated using the gap statistic (see \code{\link{sjc.kgap}}). However, this works only with kmeans as \code{method}. If \code{method} is \code{"hclust"}, you have to specify a groupcount.
Use the \code{\link{sjc.elbow}} function to determine the group count according to the elbow criterion, and the \code{\link{sjc.grpdisc}} function to inspect the goodness of grouping.}

\item{groups}{By default, this parameter is \code{NULL} and will be ignored. However, if you just want to plot an already existing cluster solution without computing a new cluster analysis, specify \code{groupcount} and \code{groups}. \code{groups} is a vector of the same length as \code{nrow(data)} and indicates the group classification of the cluster analysis. The group classification can be computed with the \code{\link{sjc.cluster}} function.}

\item{method}{The method for computing the cluster analysis. By default (\code{"kmeans"}), a kmeans cluster analysis will be computed. Use \code{"hclust"} to compute a hierarchical cluster analysis. You can specify the initial letters only.}

\item{distance}{The distance measure to be used when \code{method} is \code{"hclust"} (for hierarchical clustering). This must be one of \code{"euclidean"}, \code{"maximum"}, \code{"manhattan"}, \code{"canberra"}, \code{"binary"} or \code{"minkowski"}. See \code{\link{dist}}. By default, \code{method} is \code{"kmeans"} and this parameter will be ignored.}

\item{agglomeration}{The agglomeration method to be used when \code{method} is \code{"hclust"} (for hierarchical clustering). This should be one of \code{"ward"}, \code{"single"}, \code{"complete"}, \code{"average"}, \code{"mcquitty"}, \code{"median"} or \code{"centroid"}. Default is \code{"ward"}. See \code{\link{hclust}}. By default, \code{method} is \code{"kmeans"} and this parameter will be ignored.}

\item{iter.max}{The maximum number of iterations allowed. Only applies if \code{method} is \code{"kmeans"}. See \code{\link{kmeans}} for details on this parameter.}

\item{algorithm}{The algorithm used for calculating the kmeans clusters. Only applies if \code{method} is \code{"kmeans"}.
May be one of \code{"Hartigan-Wong"} (default), \code{"Lloyd"} (used by SPSS), or \code{"MacQueen"}. See \code{\link{kmeans}} for details on this parameter.}

\item{showAccuracy}{If \code{TRUE}, the \code{\link{sjc.grpdisc}} function will be called, which computes a linear discriminant analysis on the classified cluster groups and plots a bar graph indicating the goodness of classification for each group.}

\item{title}{Title of diagram as string. Example: \code{title=c("my title")}}

\item{titleSize}{The size of the plot title. Default is 1.3.}

\item{titleColor}{The color of the plot title. Default is \code{"black"}.}

\item{axisLabels.x}{Labels for the x-axis breaks. Example: \code{axisLabels.x=c("Label1", "Label2", "Label3")}. Note: If you use the \code{\link{sji.SPSS}} function and the \code{\link{sji.getValueLabels}} function, you receive a list object with label strings. The labels may also be passed as a list object. They will be unlisted and converted to a character vector automatically.}

\item{axisLabelAngle.x}{Angle for axis labels.}

\item{axisLabelSize}{The size of the axis labels of both the x and y axis. Default is 1.1, recommended values range between 0.5 and 3.0.}

\item{axisLabelColor}{User defined color for axis labels. If not specified, a default dark gray color palette will be used for the labels.}

\item{axisTitle.x}{A label for the x axis. Useful when plotting histograms with metric scales where no category labels are assigned to the x axis.}

\item{axisTitle.y}{A label for the y axis. Useful when plotting histograms with metric scales where no category labels are assigned to the y axis.}

\item{axisTitleColor}{The color of the x and y axis labels. Refers to \code{axisTitle.x} and \code{axisTitle.y}, not to the tick mark or category labels.}

\item{axisTitleSize}{The size of the x and y axis labels. Refers to \code{axisTitle.x} and \code{axisTitle.y}, not to the tick mark or category labels.
Default is 1.3.}

\item{breakTitleAt}{Determines how many chars of the title are displayed in one line and when a line break is inserted into the title.}

\item{breakLabelsAt}{Determines how many chars of the labels are displayed in one line and when a line break is inserted into the axis labels.}

\item{breakLegendTitleAt}{Determines how many chars of the legend title are displayed in one line and when a line break is inserted into the legend title.}

\item{breakLegendLabelsAt}{Determines how many chars of the legend labels are displayed in one line and when a line break is inserted into the legend labels.}

\item{facetCluster}{If \code{TRUE}, each cluster group will be represented by its own panel. Default is \code{FALSE}, thus all cluster groups are plotted in a single graph.}

\item{barColor}{User defined color for bars.
\itemize{
  \item If not specified (\code{NULL}), a default color palette will be used for the bar charts.
  \item If barColor is \code{"gs"}, a greyscale will be used.
  \item If barColor is \code{"bw"}, a monochrome white filling will be used.
  \item If barColor is \code{"brewer"}, use the \code{colorPalette} parameter to specify a palette from \url{http://colorbrewer2.org}.
}
Else specify your own color values as vector (e.g. \code{barColor=c("#f00000", "#00ff00", "#0080ff")}).}

\item{barAlpha}{Specify the transparency (alpha value) of bars.}

\item{colorPalette}{If \code{barColor} is \code{"brewer"}, specify a color palette from \url{http://colorbrewer2.org} here. All color brewer palettes supported by ggplot are accepted here.}

\item{barWidth}{Width of bars. Recommended values for this parameter are from 0.4 to 1.5.}

\item{barSpace}{Spacing between bars. Default value is 0.1. If 0 is used, the grouped bars are stuck together and have no space in between. Recommended values for this parameter are from 0 to 0.5.}

\item{barOutline}{If \code{TRUE}, each bar gets a colored outline.
Default is \code{FALSE}.}

\item{barOutlineColor}{The color of the bar outline. Only applies if \code{barOutline} is set to \code{TRUE}.}

\item{barOutlineSize}{The size of the bar outlines. Only applies if \code{barOutline} is \code{TRUE}. Default is 0.2.}

\item{theme}{Specifies the diagram's background theme. Default (parameter \code{NULL}) is a gray background with white grids.
\itemize{
  \item Use \code{"bw"} for a white background with gray grids,
  \item \code{"classic"} for a classic theme (black border, no grids),
  \item \code{"minimal"} for a minimalistic theme (no border, gray grids) or
  \item \code{"none"} for no borders, grids and ticks.
}}

\item{borderColor}{User defined color of whole diagram border (panel border).}

\item{axisColor}{User defined color of axis border (y- and x-axis, in case the axes should have different colors than the diagram border).}

\item{hideLegend}{If \code{TRUE}, the legend (guide) is hidden. Default is \code{FALSE}.}

\item{showTickMarks}{Whether tick marks of axes should be shown or not.}

\item{showAxisLabels.x}{Whether x axis labels (cluster variables) should be shown or not.}

\item{showAxisLabels.y}{Whether y axis labels (z scores) should be shown or not.}

\item{showGroupCount}{If \code{TRUE} (default), the count within each cluster group is added to the legend labels (e.g. \code{"Group 1 (n=87)"}).}

\item{showAccuracyLabels}{If \code{TRUE}, the accuracy values for each cluster group are added to the legend labels (e.g. \code{"Group 1 (n=87, accuracy=95.3)"}). Accuracy is calculated by \code{\link{sjc.grpdisc}}.}

\item{legendTitle}{Title of the diagram's legend.}

\item{legendLabels}{Labels for the guide/legend. Example: See \code{axisLabels.x}. If \code{legendLabels} is \code{NULL} (default), the standard string \code{"Group "} will be used.}

\item{legendPos}{The position of the legend, if a legend is drawn.
Use \code{"bottom"}, \code{"top"}, \code{"left"} or \code{"right"} to position the legend below, above, on the left or on the right side of the diagram. Right positioning is default.}

\item{legendSize}{The text size of the legend. Default is 1. Relative size, so recommended values are from 0.3 to 2.5.}

\item{legendBorderColor}{Color of the legend's border. Default is \code{"white"}, so no visible border is drawn.}

\item{legendBackColor}{Fill color of the legend's background. Default is \code{"white"}, so no visible background is drawn.}

\item{majorGridColor}{Specifies the color of the major grid lines of the diagram background.}

\item{minorGridColor}{Specifies the color of the minor grid lines of the diagram background.}

\item{hideGrid.x}{If \code{TRUE}, the x-axis gridlines are hidden. Default is \code{FALSE}.}

\item{hideGrid.y}{If \code{TRUE}, the y-axis gridlines are hidden. Default is \code{FALSE}.}

\item{flipCoordinates}{If \code{TRUE}, the x and y axis are swapped.}

\item{reverseAxis.x}{If \code{TRUE}, the values on the x-axis are reversed.}

\item{printPlot}{If \code{TRUE} (default), plots the results as graph. Use \code{FALSE} if you don't want to plot any graphs. In either case, the ggplot-object will be returned as value.}
}
\value{
(Invisibly) returns an object with
\itemize{
  \item \code{data}: the data frame used for plotting,
  \item \code{plot}: the ggplot object,
  \item \code{groupcount}: the number of clusters found (as calculated by \code{\link{sjc.kgap}}),
  \item \code{classification}: the group classification (as calculated by \code{\link{sjc.cluster}}), including missing values, so this vector can be appended to the original data frame,
  \item \code{accuracy}: the accuracy of group classification (as calculated by \code{\link{sjc.grpdisc}}).
}
}
\description{
Computes a quick kmeans or hierarchical cluster analysis and displays "cluster characteristics" as graph.
\enumerate{
  \item If \code{method} is \code{kmeans}, this function first determines the optimal group count via gap statistics (unless parameter \code{groupcount} is specified), using the \code{\link{sjc.kgap}} function.
  \item Then a cluster analysis is performed by running the \code{\link{sjc.cluster}} function to determine the cluster groups.
  \item After that, all variables in \code{data} are scaled and centered. The mean value of these z-scores within each cluster group is calculated to see how certain characteristics (variables) in a cluster group differ in relation to other cluster groups.
  \item These results are shown in a graph.
}
This method can also be used to plot an existing cluster solution as a graph without computing a new cluster analysis. See parameter \code{groups} for more details.
}
\note{
To get results similar to those of the SPSS Quick Cluster function, consider the following points:
\enumerate{
  \item Use the \code{/PRINT INITIAL} option for SPSS Quick Cluster to get a table with initial cluster centers.
  \item Create a \code{\link{matrix}} of this table by consecutively copying the values, one row after another, from the SPSS output into a matrix and specifying the \code{nrow} and \code{ncol} parameters. Pass this matrix as \code{groupcount}.
  \item Use \code{algorithm="Lloyd"}.
  \item Use the same value for \code{iter.max} in both SPSS and \code{sjc.qclus}.
}
This ensures a fixed initial set of cluster centers (as in SPSS), while \code{\link{kmeans}} in R otherwise selects the initial cluster centers randomly.
}
\examples{
# K-means clustering of mtcars-dataset
sjc.qclus(mtcars)

# K-means clustering of airquality-dataset with 4 pre-defined
# groups in a faceted panel
sjc.qclus(airquality, groupcount=4, facetCluster=TRUE)
}
\seealso{
\code{\link{sjc.cluster}} \cr
\code{\link{sjc.kgap}} \cr
\code{\link{sjc.elbow}} \cr
\code{\link{sjc.grpdisc}} \cr
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2014) cluster: Cluster Analysis Basics and Extensions. R package.
}
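\section{Sketch of the computation}{
The steps listed in the description can be approximated with base R. This is only a rough sketch under stated assumptions, not the code of \code{sjc.qclus}: it assumes that incomplete rows are dropped first, fixes a seed for reproducibility, and uses the \code{airquality} data with four groups purely for illustration. The second part illustrates the SPSS note; the initial centers in \code{init} are made-up placeholder values, not real SPSS output.
\preformatted{
# Assumption: rows with missing values are removed before clustering
d <- na.omit(airquality)

# kmeans picks random starting centres when 'centers' is a number
set.seed(123)
km <- kmeans(d, centers = 4, iter.max = 20, algorithm = "Hartigan-Wong")

# Centre and scale all variables, then average the z-scores per cluster
z <- as.data.frame(scale(d))
aggregate(z, by = list(cluster = km$cluster), FUN = mean)

# SPSS-like setup: fixed initial centres (placeholder values), filled
# row by row, combined with the Lloyd algorithm
init <- matrix(c(20, 70,
                 80, 85), nrow = 2, ncol = 2, byrow = TRUE)
kmeans(d[, c("Ozone", "Temp")], centers = init, iter.max = 20,
       algorithm = "Lloyd")
}
The table returned by \code{aggregate} holds one row per cluster and one column per variable, which corresponds to the kind of mean z-score values the plot displays.
}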