BIwTL: a business information warehouse toolkit and language for warehousing simplification and automation
Source
International Conference on Management of Data
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Beijing, China
SESSION: Data processing in the large
Pages: 1041-1052
Year of Publication: 2007
ISBN: 978-1-59593-686-8
Authors
Bin He  IBM Almaden Research Center, San Jose, CA
Rui Wang  IBM Almaden Research Center, San Jose, CA
Ying Chen  IBM Almaden Research Center, San Jose, CA
Ana Lelescu  IBM Almaden Research Center, San Jose, CA
James Rhodes  IBM Almaden Research Center, San Jose, CA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1247480.1247603

ABSTRACT

Rapidly leveraging information analytics technologies to mine the mounting information in structured and unstructured forms, derive business insights, and improve decision making is becoming increasingly critical to today's business success. One of the key enablers of analytics technologies is an Information Warehouse Management System (IWMS) that processes different types and forms of information and builds and maintains the information warehouse (IW) effectively. Although traditional multi-dimensional data warehousing techniques, coupled with the well-known ETL (Extract, Transform, Load) processes, may meet some of the requirements of an IWMS, in general they fall short in several major respects: (1) they often lack comprehensive support for both structured and unstructured data processing; (2) they are database-centric and require detailed database and data warehouse knowledge to perform IWMS tasks, and hence are tedious and time-consuming to learn and operate; (3) they are often inflexible and insufficient in coping with the wide variety of ongoing IW maintenance tasks, such as adding new dimensions and handling regular, lengthy data updates that may encounter failures and errors.

To cope with these issues, this paper describes an IWMS called BIwTL (Business Information Warehouse Toolkit and Language) that automates and simplifies IWMS tasks by devising a high-level declarative information warehousing language, GIWL, and building the runtime system components for that language. BIwTL hides system details, e.g., databases, full-text indexers, and data warehouse models, from users by automatically generating appropriate runtime scripts from the GIWL specification and executing them. Moreover, BIwTL supports both structured and unstructured information processing by embedding flexible data extraction and transformation capabilities, while ensuring high-performance processing for large datasets. In addition, this paper systematically studies the core tasks around information warehousing and identifies five key areas. In particular, we describe our technologies in three of these areas: constructing an IW, data loading, and maintaining an IW. We have implemented these technologies in BIwTL 1.0 and validated them in real-world environments with a number of customers. Our experience suggests that BIwTL is lightweight, simple, efficient, and flexible.
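
The general idea of generating runtime scripts from a declarative specification can be sketched as follows. This is only an illustrative toy, not GIWL or BIwTL: the abstract does not show GIWL syntax, so the `SPEC` dictionary layout, the `generate_ddl` function, and the `dim_`/`fact_` table naming are all assumptions made up for this example. It shows how a user might name a fact, its measures, and its dimensions, while the star-schema SQL is produced for them.

```python
from typing import Dict, List

# Hypothetical declarative warehouse description (NOT GIWL syntax;
# the structure and key names are assumptions for illustration only).
SPEC: Dict = {
    "fact": "sales",
    "measures": {"amount": "DECIMAL(12,2)"},
    "dimensions": {
        "customer": {"name": "VARCHAR(100)"},
        "product": {"title": "VARCHAR(200)"},
    },
}


def generate_ddl(spec: Dict) -> List[str]:
    """Turn a declarative spec into star-schema CREATE TABLE statements."""
    statements: List[str] = []
    # The fact table starts with its own surrogate key.
    fact_cols = [f"{spec['fact']}_id INTEGER PRIMARY KEY"]

    # One dimension table per declared dimension, plus a foreign key
    # column in the fact table that points at it.
    for dim, attrs in spec["dimensions"].items():
        cols = [f"{dim}_id INTEGER PRIMARY KEY"]
        cols += [f"{name} {sqltype}" for name, sqltype in attrs.items()]
        statements.append(f"CREATE TABLE dim_{dim} ({', '.join(cols)});")
        fact_cols.append(f"{dim}_id INTEGER REFERENCES dim_{dim}({dim}_id)")

    # Measures are stored directly in the fact table.
    fact_cols += [f"{name} {sqltype}" for name, sqltype in spec["measures"].items()]
    statements.append(f"CREATE TABLE fact_{spec['fact']} ({', '.join(fact_cols)});")
    return statements


if __name__ == "__main__":
    for stmt in generate_ddl(SPEC):
        print(stmt)
```

In this sketch the user never writes SQL or decides how dimension keys are wired into the fact table; a real system of the kind the abstract describes would also have to cover data loading, full-text indexing, and maintenance, none of which is modeled here.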

