Hybrid hierarchical clustering software for R
Authors: Hugh Chipman and Rob Tibshirani
Maintainer: Hugh Chipman
This software, written in the R language, identifies "mutual clusters" and
uses them in a hybrid hierarchical clustering. A mutual cluster is a set
of points whose largest within-group distance is smaller than the distance
to the nearest point outside the set. This idea is used in a hybrid
clustering algorithm that performs top-down clustering subject to the
constraint that a mutual cluster can only be subdivided after it is
isolated from all other points. Such a clustering algorithm can be useful
in situations in which interest focuses on both coarse and fine partitions
of the data. For example, in clustering microarray samples, one might be
interested both in (1) identifying small groups of samples that are very
similar to each other and (2) partitioning all samples into five groups
that can be broadly interpreted.
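The mutual cluster definition above can be checked directly with base R. The following is a minimal sketch, not the package's own implementation: the helper name `is_mutual_cluster`, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```r
# Hypothetical helper (not part of hybridHclust): TRUE if the rows of `x`
# indexed by `idx` form a mutual cluster, i.e. the largest distance within
# the set is smaller than the smallest distance from any member of the set
# to any point outside it.
is_mutual_cluster <- function(x, idx) {
  d <- as.matrix(dist(x))                 # Euclidean distances between all rows
  # Assumption of this sketch: singletons and the full data set are not tested.
  if (length(idx) < 2 || length(idx) >= nrow(d)) return(FALSE)
  inside  <- d[idx, idx, drop = FALSE]    # distances within the candidate set
  outside <- d[idx, -idx, drop = FALSE]   # distances from the set to all other points
  max(inside) < min(outside)
}

# Toy data, made up for illustration: two tight pairs and one distant point
x <- rbind(c(0, 0), c(0, 1), c(5, 0), c(5, 1), c(20, 20))

res_pair  <- is_mutual_cluster(x, c(1, 2))  # TRUE: diameter 1, nearest outsider at distance 5
res_split <- is_mutual_cluster(x, c(1, 3))  # FALSE: point 2 is closer to point 1 than point 3 is
res_four  <- is_mutual_cluster(x, 1:4)      # TRUE: the remaining point is far from all four
```

In this example the set {1, 2} and the set {1, 2, 3, 4} are both mutual clusters, which illustrates that mutual clusters can be nested.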
This software is based on the paper
- Chipman, H. and Tibshirani, R. (2006) "Hybrid Hierarchical Clustering
with Applications to Microarray Data", Biostatistics, 7, 302-317.
The package can be installed directly from CRAN.
- hybridHclust package (available on CRAN)
- Breast tumour data used in the paper:
sorlie.txt (expression values for 456 genes across 85 tissue samples, with
missing values imputed via a 10-nearest-neighbour method) and the original
cluster labels from Sorlie et al. (2001)
- Related work: "hopach" package in R, developed by Pollard and van der Laan
(2003), and described in
- van der Laan, M.J. and Pollard, K.S. (2003) "Hybrid Clustering of Gene
Expression Data with Visualization and the Bootstrap", Journal of Statistical
Planning and Inference, 117, 275-303.
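For readers who want to try the package, installation from CRAN and a first call to the two functions named on this page, mutualCluster and hybridHclust, might look like the sketch below. The toy matrix is made up for illustration, and passing a numeric data matrix as the first argument is an assumption; the help pages (?hybridHclust, ?mutualCluster) give the exact arguments.

```r
# One-time installation from CRAN (commented out so the sketch has no side effects):
# install.packages("hybridHclust")

# Toy data, made up for illustration: 10 observations on 3 variables
set.seed(1)
x <- matrix(rnorm(30), nrow = 10)

mc <- NULL
hc <- NULL
# Only run the package calls if hybridHclust is actually installed
if (requireNamespace("hybridHclust", quietly = TRUE)) {
  mc <- hybridHclust::mutualCluster(x)  # identify mutual clusters in x
  hc <- hybridHclust::hybridHclust(x)   # hybrid hierarchical clustering of x
}
```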
History of Changes:
- March 20, 2006: Initial release (1.0-0) available on CRAN
- July 5, 2006: Minor revision (1.0-1). Clarified the hybridHclust
documentation to show how to obtain a "correlation" distance, removed the
machine-readable email address, added references to the HT paper and a "see
also" entry for the "hopach" package to the "hybridHclust" help page, and
modified the examples on the hybridHclust and mutualCluster help pages.
- March 8, 2008: Minor revision (1.0-3). Cosmetic fixes (corrected the
Maintainer email in DESCRIPTION; replaced a help reference to the non-existent
"sorlie.labels" with "sorlielabels"; cleaned up the internal definitions of
"dfun" and "d2fun" inside the "eisenCluster" function).