ACM Home Page
Please provide us with feedback. Feedback
Fast hierarchical clustering and other applications of dynamic closest pairs
Full text LatexLatex (4.43 MB),  HtmlHtml (5 KB),  PdfPdf (302 KB),  PsPs (2.23 MB)
Source Journal of Experimental Algorithmics (JEA) archive
Volume 5 ,  (2000) table of contents
Article No. 1  
Year of Publication: 2000
ISSN:1084-6654
Author
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 19,   Citation Count: 4
Additional Information:

appendices and supplements   abstract   references   cited by   index terms   collaborative colleagues   peer to peer  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/351827.351829
What is a DOI?

APPENDICES and SUPPLEMENTS
TarTar (133 KB),
The software suite accompanying the article; this is a small Unix tar file, which includes the source code, a Makefile, and the test files used in the article.


ABSTRACT

We develop data structures for dynamic closest pair problems with arbitrary distance functions, that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an <i>n</i>-object set, maintaining the closest pair, in <i>O</i>(<i>n</i> log<sup>2</sup> <i>n</i>) time per update and <i>O</i>(<i>n</i>) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, <i>O</i>(<i>n</i>) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Gröbner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
ADAMS, W. W. AND LOUSTAUNAU, P. 1994. <i>An Introduction to Gröbner Bases</i>. Number 3 in Graduate Studes in Mathematics. AMS.
 
2
 
3
AICHHOLZER, O. 1997. <i>Combinatorial & Computational Properties of the Hypercube - New Results on Covering, Slicing, Clustering and Searching on the Hypercube</i>. Ph. D. thesis, Tech. Univ. Graz.
 
4
ANDERBERG, M. R. 1973. <i>Cluster Analysis for Applications</i>. Number 19 in Probability and Mathematical Statistics. Academic Press, New York.
 
5
BALL, G. H. AND HALL, D.J. 1965. ISODATA, a novel method of data analysis and pattern classification. Technical report, Stanford Research Inst.
 
6
 
7
 
8
CHENG, X. AND WALLACE, J. M. 1993. Cluster analysis of the northern hemisphere wintertime 500-hPa height field: spatial patterns. <i>J. Atmospheric Sciences 50</i>, 16 (August), 2674-2696.
9
 
10
CORPET, F. 1988. Multiple sequence alignment with hierarchical clustering. <i>Nucleic Acids Research 16</i>, 22, 10881-10890.
11
12
 
13
DURAN, B. S. AND ODELL, P.L. 1974. <i>Cluster Analysis: A Survey</i>. Number 100 in Lecture Notes in Economics and Mathematical Systems. Springer-Verlag, Berlin.
 
14
EPPSTEIN, D. 1995. Dynamic Euclidean minimum spanning trees and extrema of binary functions. <i>Discrete & Computational Geometry 13</i>, 1 (January), 111-122.
 
15
EPPSTEIN, D. AND ERICKSON, J. 1999. Raising roofs, crashing cycles, and playing pool: applications of a data structure for finding pairwise interactions. <i>Discrete & Computational Geometry 22</i>, 4, 569-592.
 
16
FELSENSTEIN, J. 1995. PHYLIP (Phylogeny Inference Package) version 3.572c. Distributed by the author.
 
17
18
 
19
 
20
GOTOH, O. 1994. Further improvements in methods of group-to-group sequence alignment with generalized profile operations. <i>CABIOS 10</i>, 4, 379-387.
 
21
 
22
KRZNARIC, D. AND LEVCOPOULOS, C. 1998. Fast algorithms for complete linkage clustering. <i>Discrete & Computational Geometry 19</i>, 1 (January), 131-145.
 
23
MATIAS, Y. 1993. Semi-dynamic closest-pair algorithms. In <i>Proc. 5th Canad. Conf. Computational Geometry</i> (August 1993), pp. 264-271.
 
24
MICHENER, C. D. AND SOKAL, R.R. 1957. A quantitative approach to a problem in classification. <i>Evolution 11</i>, 130-162.
 
25
PAZZANI, M.J. 1997. Constructive induction of Cartesian product attributes. Manuscript.
 
26
REINGOLD, E. M. AND TARJAN, R.E. 1981. On a greedy heuristic for complete matching. <i>SIAM J. Computing 10</i>, 4 (November), 676-681.
 
27
ROSENKRANTZ, D. H., STEARNS, R. E., AND LEWIS, P. M., II. 1977. An analysis of several heuristics for the traveling salesman problem. <i>SIAM J. Computing 6</i>, 3 (September), 563- 581.
 
28
SCHWARZ, C., SMID, M., AND SNOEYINK, J. 1994. An optimal algorithm for the on-line closest pair problem. <i>Algorithmica 12</i>, 1 (July), 18-29.
 
29
 
30
STURMFELS, B. 1996. Gröbner Bases and Convex Polytopes. Number 8 in University Lecture Ser. AMS.
 
31
 
32
THOMPSON, J. D., HIGGINS, D. G., AND GIBSON, T. J. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. <i>Nucleic Acids Research 22</i>, 4673- 4680.
 
33
WARD, J. 1963. Hierarchical grouping to optimize an objective function. <i>J. Amer. Statistical Assoc. 58</i>, 301, 236-244.
 
34
 
35
ZUPAN, J. 1982. <i>Clustering of Large Data Sets</i>. Chemometrics Research Studies Ser. Research Studies Press, Chichester.



Peer to Peer - Readers of this Article have also read: