https://github.com/cran/gclus
Tip revision: c27795bfae1e3309def7b5fa022d60a7ca28b75a authored by Catherine Hurley on 07 January 2019, 19:00:09 UTC
version 1.3.2
gclus.Rmd
---
title: "Clustering Graphics"
author: "Catherine Hurley"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Clustering Graphics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The gclus package orders panels in scatterplot matrices and parallel coordinate
displays by a merit index. It provides several indices of merit,
ordering functions, and enhanced versions of `pairs` and `parcoord` that
colour panels according to their merit level.
For details on the methods used, see "Clustering Visualisations of Multidimensional
Data", Journal of Computational and Graphical Statistics,
vol. 13 (4), pp. 788-806, 2004.



## Displaying a correlation matrix

```{r}
library(gclus)
data(longley)
longley.cor <- cor(longley)
longley.color <- dmat.color(longley.cor)
```

`dmat.color` assigns three colours to the correlations according to their
magnitude: the top third are in pink, the middle third in blue, and the
bottom third in yellow.
 
```{r fig.width=5, fig.height=5, fig.align='center'}
par(mar=c(1,1,1,1))
plotcolors(longley.color,dlabels=rownames(longley.color))
```
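
The three-way split described above can be sketched in base R. This is a minimal illustration of rank-based binning, not the code `dmat.color` uses; the helper name `tercile_colors` and the named colours are assumptions made for the example.

```{r}
# Hypothetical helper, not part of gclus: split the off-diagonal entries of a
# symmetric matrix into equal-count groups by rank, one colour per group.
tercile_colors <- function(m, colors = c("lightyellow", "lightblue", "pink")) {
  vals <- m[lower.tri(m)]
  # Ranks run from 1 to length(vals); cutting them into equal-width intervals
  # gives groups of (nearly) equal size.
  grp <- cut(rank(vals, ties.method = "first"),
             breaks = length(colors), labels = FALSE)
  out <- matrix(NA_character_, nrow(m), ncol(m), dimnames = dimnames(m))
  out[lower.tri(out)] <- colors[grp]
  out <- t(out)                        # move the colours to the upper triangle
  out[lower.tri(out)] <- colors[grp]   # and fill the lower triangle again
  out
}

sketch.color <- tercile_colors(cor(longley))
table(sketch.color[lower.tri(sketch.color)])
```

The 21 variable pairs in `longley` fall into three groups of seven, and the diagonal is left as `NA`.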

If you want to change the colour scheme:

```{r eval=FALSE}
longley.color <- dmat.color(longley.cor, byrank=FALSE)
longley.color <- dmat.color(longley.cor, breaks=c(-1,0,.5,.8,1), 
                            cm.colors(4))
```
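
To see what explicit breakpoints mean, here is a base R sketch that maps a few made-up correlation values to the four `cm.colors(4)` colours using `cut`. It assumes intervals closed on the right, which may differ from how `dmat.color` treats values that fall exactly on a break.

```{r}
breaks <- c(-1, 0, 0.5, 0.8, 1)
palette4 <- cm.colors(4)
r <- c(-0.3, 0.2, 0.65, 0.95)   # made-up correlations, one per interval
# labels = FALSE returns the interval number for each value
bin <- cut(r, breaks = breaks, include.lowest = TRUE, labels = FALSE)
data.frame(r = r, bin = bin, colour = palette4[bin])
```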


The plot is easier to interpret if the variables are reordered prior to plotting.

```{r fig.width=5, fig.height=5, fig.align='center'}
par(mar=c(1,1,1,1))
longley.o <- order.hclust(longley.cor)
longley.color1 <- longley.color[longley.o,longley.o]
plotcolors(longley.color1,dlabels=rownames(longley.color1))
```
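
The idea behind a clustering-based ordering can be sketched with base R alone: treat `1 - correlation` as a dissimilarity, cluster the variables, and read off the leaf order. This is only an approximation of the idea; `order.hclust` may return a different permutation, and the choice of dissimilarity and of average linkage here is an assumption.

```{r}
sketch.cor <- cor(longley)
# Highly correlated variables get a small dissimilarity
sketch.hc <- hclust(as.dist(1 - sketch.cor), method = "average")
sketch.o <- sketch.hc$order
colnames(longley)[sketch.o]
```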


## Displaying a pairs plot with coloured panels

`cpairs` is a version of `pairs` where panels can be coloured. With the
variables ordered by `longley.o`, all the high-correlation panels appear
together in a block.

```{r fig.width=5, fig.height=5, fig.align='center'}
par(mar=c(1,1,1,1))
cpairs(longley, order= longley.o,panel.color= longley.color)
```

If `order` is not supplied, the variables are plotted in the order in which they appear in the dataset.
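
For readers curious how panel shading of this kind can be done with the standard `pairs` function, the following base R sketch fills each panel with a background colour chosen from the correlation of the two variables it shows. The function names `merit_bg` and `panel_shaded`, the thresholds, and the colours are all hypothetical; this is not how `cpairs` is implemented.

```{r fig.width=5, fig.height=5, fig.align='center'}
# Hypothetical thresholds and colours, chosen only for illustration
merit_bg <- function(r) {
  r <- abs(r)
  if (r > 0.8) "pink" else if (r > 0.5) "lightblue" else "lightyellow"
}

# Panel function: fill the plotting region, then draw the points on top
panel_shaded <- function(x, y, ...) {
  u <- par("usr")
  rect(u[1], u[3], u[2], u[4], col = merit_bg(cor(x, y)), border = NA)
  points(x, y, ...)
  box()
}

pairs(longley, panel = panel_shaded)
```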

## Displaying a PCP plot with coloured panels

`cparcoord` is a version of `parcoord`
where panels can be coloured. Again, the pink panels have high correlation,
the blue panels have middling correlation, and the yellow panels have low correlation.

```{r fig.width=8, fig.height=3, fig.align='center', out.width="100%"}
cparcoord(longley, order= longley.o,panel.color= longley.color, 
          horizontal=TRUE, mar=c(2,4,1,1))
```
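
In a parallel coordinate display only neighbouring axes share a panel, so a display of p variables has p - 1 panels. The sketch below, in base R, pulls out the correlation for each neighbouring pair under a given ordering. The ordering used is a stand-in computed with `hclust`, not the `longley.o` produced by gclus.

```{r}
sketch.cor <- cor(longley)
sketch.o <- hclust(as.dist(1 - sketch.cor), method = "average")$order
# Index pairs (o[1], o[2]), (o[2], o[3]), ... pick the neighbouring correlations
adj <- sketch.cor[cbind(sketch.o[-length(sketch.o)], sketch.o[-1])]
names(adj) <- paste(colnames(longley)[sketch.o[-length(sketch.o)]],
                    colnames(longley)[sketch.o[-1]], sep = " / ")
round(adj, 2)
```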


## Plotting re-ordered dendrograms


`eurodist` is a built-in distance matrix giving the road distances between 21 European cities.

```{r fig.width=6, fig.height=4, fig.align='center'}
par(mar=c(1,1,1,1))
data(eurodist)
dis <- as.dist(eurodist)
hc <- hclust(dis, "ave")
plot(hc)
```

`reorder.hclust` re-orders the leaves of a dendrogram to improve the similarity between
nearby leaves.
Applying it to the `hc` object:

```{r fig.width=6, fig.height=4, fig.align='center'}
par(mar=c(1,1,1,1))
hc1 <- reorder.hclust(hc, dis)
plot(hc1)
```


Both dendrograms correspond to the same tree structure,
but the second one shows that
Paris is closer to Cherbourg than to Munich, and
Rome is closer to Gibraltar than to Barcelona.
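
One way to put a number on "similarity between nearby leaves" is to add up the distances between each pair of neighbouring leaves along the ordering; a smaller total means neighbours are closer. The helper below is a hypothetical illustration in base R, not a gclus function.

```{r}
# Hypothetical helper: total distance between consecutive leaves in `ord`
adjacent_length <- function(ord, d) {
  m <- as.matrix(d)
  sum(m[cbind(ord[-length(ord)], ord[-1])])
}

sketch.hc <- hclust(eurodist, method = "average")
len_hc <- adjacent_length(sketch.hc$order, eurodist)
len_hc
```

With gclus loaded, `adjacent_length(hc1$order, dis)` gives the corresponding total for the re-ordered dendrogram.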


We can also compare the two orderings with an
image plot of the colours.
The second ordering appears to place nearby cities
closer to each other.


```{r fig.width=8, fig.height=3.5, fig.align='center'}
layout(matrix(1:2,nrow=1,ncol=2))
par(mar=c(1,6,1,1))
cmat <- dmat.color(eurodist, rev(cm.colors(5)))
# Left: cities in the original hclust leaf order
plotcolors(cmat[hc$order,hc$order], rlabels=labels(eurodist)[hc$order])
# Right: cities in the re-ordered leaf order
plotcolors(cmat[hc1$order,hc1$order], rlabels=labels(eurodist)[hc1$order])
```

