Article

Fragments of order

Authors:
Aristides Gionis

Stanford University, Stanford, CA

Stanford University, Stanford, CA
View Profile

,
Teija Kujala

University of Helsinki, P.O. Box 26, Teollisuuskatu 23, Helsinki, Finland

University of Helsinki, P.O. Box 26, Teollisuuskatu 23, Helsinki, Finland
View Profile

,
Heikki Mannila

University of Helsinki, P.O. Box 26, Teollisuuskatu 23, Helsinki, Finland

University of Helsinki, P.O. Box 26, Teollisuuskatu 23, Helsinki, Finland
View Profile

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2003Pages 129–136https://doi.org/10.1145/956750.956768

Published:24 August 2003Publication History

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 129–136

ABSTRACT

High-dimensional collections of 0--1 data occur in many applications. The attributes in such data sets are typically considered to be unordered. However, in many cases there is a natural total or partial order ≺ underlying the variables of the data set. Examples of variables for which such orders exist include terms in documents, courses in enrollment data, and paleontological sites in fossil data collections. The observations in such applications are flat, unordered sets; however, the data sets respect the underlying ordering of the variables. By this we mean that if A ≺ B ≺ C are three variables respecting the underlying ordering ≺, and both of variables A and C appear in an observation, then, up to noise levels, variable B also appears in this observation. Similarly, if A₁ ≺ A₂ ≺ … ≺ A_l-1 ≺ A_i is a longer sequence of variables, we do not expect to see many observations for which there are indices i < j < k such that A_i and A_k occur in the observation but A_j does not.In this paper we study the problem of discovering fragments of orders of variables implicit in collections of unordered observations. We define measures that capture how well a given order agrees with the observed data. We describe a simple and efficient algorithm for finding all the fragments that satisfy certain conditions. We also discuss the sometimes necessary postprocessing for selecting only the best fragments of order. Also, we relate our method with a sequencing approach that uses a spectral algorithm, and with the consecutive ones problem. We present experimental results on some real data sets (author lists of database papers, exam results data, and paleontological data).

References

R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In P. Buneman and S. Jajodia, editors, Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD'93), pages 207--216, Washington, D.C., USA, May 1993. ACM. Google ScholarDigital Library
R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE'95), pages 3--14, Taipei, Taiwan, Mar. 1995. Google ScholarDigital Library
J. E. Atkins, E. G. Boman, and B. Hendrickson. A spectral algorithm for seriation and the consecutive ones problem. SIAM Journal on Computing, 28(1):297--310, Feb. 1999. Google ScholarDigital Library
K. S. Booth and G. S. Lueker. Linear algorithms to recognize interval graphs and test for the consecutive ones property. In ACM, editor, Conference record of Seventh Annual ACM Symposium on Theory of Computing: papers presented at the Symposium, Albuquerque, New Mexico, May 5-May 7, 1975, pages 255--265, New York, NY, USA, 1975. ACM Press. Google ScholarDigital Library
K. S. Booth and G. S. Lueker. Testing for the consecutive ones property, interval graphs, and graph planarity using P-Q tree algorithms. J. of Comp. and Syst. Sci., 13:335--379, 1976.Google ScholarDigital Library
T. F. Chan and D. C. Resasco. A framework for the analysis and construction of domain decomposition preconditioners. Technical Report CAM-87-09, UCLA, 1987.Google Scholar
F. R. K. Chung. Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, 1997.Google Scholar
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, 1998.Google ScholarCross Ref
M. Fortelius. Private communication. 2003.Google Scholar
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000. Google ScholarDigital Library
Hsu. A simple test for the consecutive ones property. Journal of Algorithms, 43, 2002. Google ScholarDigital Library
J. Jernvall and M. Fortelius. Common mammals drive the evolutionary increase of hypsodonty in the neogene. Nature, 417:538--540, 2002.Google ScholarCross Ref
Y. Koren and D. Harel. Multi-scale algorithm for the linear arrangement problem. Technical Report MCS02-04, Faculty of Mathematics and Computer Science, The Weizmann Institute of Science, 2002.Google ScholarCross Ref
H. Mannila and C. Meek. Global partial orders from sequential data. In Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Boston, MA, pages 161--168. ACM Press, 2000. Google ScholarDigital Library
H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259--289, Nov. 1997. Google ScholarDigital Library
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In In Advances in Neural Information Processing Systems, 2001.Google Scholar
A. Popescul, G. W. Flake, S. Lawrence, L. H. Ungar, and C. L. Giles. Clustering and identifying temporal trends in document databases. In ADL 2000, pages 173--182, 2000. Google ScholarDigital Library
A. Pothen, H. Simon, and L. Wang. Spectral nested dissection. Technical Report CS-92-01, Pennsylvania State University, Department of Computer Science, 1992.Google Scholar
R. Ramakrishnan and J. Gehrke. Database Management Systems (2nd ed.). McGraw-Hill, 2001. Google ScholarDigital Library
H. D. Simon. Partitioning of unstructured mesh problems for parallel processing. Computing Systems in Engineering, 2, 1991.Google Scholar

Index Terms

Fragments of order
1. Information systems
  1. Information systems applications
    1. Data mining
2. Theory of computation
  1. Design and analysis of algorithms

Recommendations

Finding partial orders from unordered 0-1 data
KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

In applications such as paleontology and medical genetics the 0-1 data has an underlying unknown order (the ages of the fossil sites, the locations of markers in the genome). The order might be total or partial: for example, two sites in different parts ...
Read More
Time and sample efficient discovery of Markov blankets and direct causal relations
KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Data Mining with Bayesian Network learning has two important characteristics: under conditions learned edges between variables correspond to casual influences, and second, for every variable T in the network a special subset (Markov Blanket) ...
Read More
Set covering with almost consecutive ones property

In this paper, we consider set covering problems with a coefficient matrix almost having the consecutive ones property, i.e., in most rows of the coefficient matrix, the ones appear consecutively and only a few blocks of consecutive ones appear in the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN:1581137370
DOI:10.1145/956750
Conference Chair:
Lise Getoor
University of Maryland, College Park
,
General Chair:
Ted Senator
DARPA
,
Program Chairs:
Pedro Domingos
University of Washington
,
Christos Faloutsos
Carnegie Mellon University
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 August 2003
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
consecutive ones property
discovering hidden orderings
novel data mining algorithms
spectral analysis of data
Qualifiers
- Article
Conference

Acceptance Rates
KDD '03 Paper Acceptance Rate46of298submissions,15%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 17
  Total Citations
  View Citations
- 524
  Total Downloads
- Downloads (Last 12 months)4
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fragments of order

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Finding partial orders from unordered 0-1 data

Time and sample efficient discovery of Markov blankets and direct causal relations

Set covering with almost consecutive ones property