Supporting OLAP operations over imperfectly integrated taxonomies
Source
International Conference on Management of Data
Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data
Vancouver, Canada
SESSION: Research Session 18: Database Integration As You Go
Pages 875-888  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Yan Qi  Arizona State University, Tempe, AZ, USA
K. Selçuk Candan  Arizona State University, Tempe, AZ, USA
Junichi Tatemura  NEC Laboratories America, Cupertino, CA, USA
Songting Chen  NEC Laboratories America, Cupertino, CA, USA
Fenglin Liao  University of California Santa Barbara, Santa Barbara, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1376616.1376703

ABSTRACT

OLAP is an important tool in decision support. With the help of domain knowledge, such as hierarchies of attribute values, OLAP helps the user observe the effects of various decisions. Most OLAP operations assume that the available domain knowledge is precise. In particular, they assume that the hierarchy of values over which the user can navigate forms a taxonomy. In this paper, we first note that when multiple heterogeneous data sources are involved in gathering the data and the associated domain knowledge, the integrated knowledge base, constructed by combining locally available taxonomies based on concept matchings, may not itself be a taxonomy. Specifically, the existence of intersections among concepts from different sources compromises the tree structure of the integrated taxonomy and prevents effective use of hierarchical navigation techniques, such as drill-down and roll-up. To cope with this, we introduce concept un-classification, where a select few of the concepts are eliminated to ensure that the remaining structure is a navigable taxonomy without concept intersections. Since un-classifying originally classified data is undesirable, we consider ways to minimize un-classification in the process. We introduce a cost model that captures the imprecision caused by the un-classification process, and we formulate the problem of finding an un-classification strategy that eliminates intersections while adding minimal imprecision to the resulting structure. We show that, when performed naively, this task can be very costly, and thus we propose a bottom-up preprocessing strategy that supports basic navigational analytics operations, such as drill-down and roll-up. Experiments over synthetic and real-life data verified the effectiveness and efficiency of our approach.
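
To make the idea of eliminating concept intersections concrete, the following is a minimal illustrative sketch and not the algorithm or cost model from the paper. It assumes each concept is a set of member values, that two concepts "intersect" when they overlap without one containing the other, and that every concept carries a positive removal cost standing in for the imprecision its un-classification would add. The names `intersects` and `greedy_unclassify`, the greedy heuristic, and the sample concepts and costs are all hypothetical.

```python
from itertools import combinations


def intersects(a, b):
    """True when two concepts share members but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def greedy_unclassify(concepts, cost):
    """Drop concepts until no two remaining concepts intersect.

    concepts: dict mapping concept name -> frozenset of member values
    cost:     dict mapping concept name -> positive removal penalty (assumed)
    Returns (kept concepts, list of removed names). This is a greedy
    heuristic, so the total removal cost is not guaranteed to be minimal.
    """
    remaining = dict(concepts)
    removed = []
    while True:
        # Count, for each concept, how many other concepts it intersects.
        conflicts = {name: 0 for name in remaining}
        for x, y in combinations(remaining, 2):
            if intersects(remaining[x], remaining[y]):
                conflicts[x] += 1
                conflicts[y] += 1
        if not any(conflicts.values()):
            return remaining, removed
        # Remove the concept that resolves the most conflicts per unit cost;
        # sorting the names makes tie-breaking deterministic.
        victim = max(sorted(remaining), key=lambda n: conflicts[n] / cost[n])
        removed.append(victim)
        del remaining[victim]


# Hypothetical concepts merged from two sources.
concepts = {
    "All": frozenset({"tv", "phone", "laptop", "tablet"}),
    "Electronics": frozenset({"tv", "phone", "laptop"}),
    "Portable": frozenset({"phone", "laptop", "tablet"}),
    "Phones": frozenset({"phone"}),
}
cost = {"All": 10, "Electronics": 5, "Portable": 2, "Phones": 1}

kept, removed = greedy_unclassify(concepts, cost)
print("removed:", removed)
print("kept:", sorted(kept))
```

In this toy input only "Electronics" and "Portable" intersect, so removing the cheaper of the two leaves a family of concepts that are pairwise nested or disjoint, which is the property a tree-shaped taxonomy needs for roll-up and drill-down.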

