Generation of test databases using sampling methods

Author:
Teodora Sandra Buda

University College Dublin, Ireland


ISSTA 2013: Proceedings of the 2013 International Symposium on Software Testing and Analysis, July 2013, Pages 366–369, https://doi.org/10.1145/2483760.2492397

Published: 15 July 2013

ABSTRACT

Populating the testing environment with relevant data represents a great challenge in software validation, generally requiring expert knowledge about the system under development, as its data critically impacts the outcome of the tests designed to assess the system. Current practices of populating the testing environments generally focus on developing efficient algorithms for generating synthetic data or use the production environment for testing purposes. The latter is an invaluable strategy to provide real test cases in order to discover issues that critically impact the user of the system. However, the production environment generally consists of large amounts of data that are difficult to handle and analyze. Database sampling from the production environment is a potential solution to overcome these challenges.

In this research, we propose two database sampling methods, VFDS and CoDS, for populating the testing environment. The former is a very fast random sampling approach, while the latter aims to preserve the distribution of the data in order to produce a representative sample. In particular, CoDS focuses on the dependencies between data in different tables and tries to preserve the distributions of these dependencies.
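To make the idea of sampling across table dependencies concrete, the following is a minimal illustrative sketch. It is not the VFDS or CoDS algorithm, whose details are not given in the abstract. It assumes a hypothetical parent table (`customers`) and child table (`orders`) linked by a foreign key `customer_id`, samples parents uniformly at random, keeps only the children that reference a sampled parent, and computes a "children per parent" histogram that could be compared between the original and the sample. All table names, column names, and function names here are assumptions made for illustration.

```python
import random
from collections import Counter


def sample_with_foreign_keys(parents, children, fk, rate, seed=0):
    """Sample parent rows uniformly at random, then keep every child row
    whose foreign key points at a sampled parent, so the sample has no
    dangling references."""
    rng = random.Random(seed)
    k = max(1, round(len(parents) * rate))
    sampled_parents = rng.sample(parents, k)
    kept_ids = {p["id"] for p in sampled_parents}
    sampled_children = [c for c in children if c[fk] in kept_ids]
    return sampled_parents, sampled_children


def children_per_parent(parents, children, fk):
    """Histogram mapping 'number of children' to 'number of parents
    that have exactly that many children'."""
    counts = Counter(c[fk] for c in children)
    return Counter(counts.get(p["id"], 0) for p in parents)


# Hypothetical toy data: 100 customers, customer i has (i % 4) orders.
customers = [{"id": i} for i in range(100)]
orders = [
    {"id": f"{i}-{j}", "customer_id": i}
    for i in range(100)
    for j in range(i % 4)
]

sample_customers, sample_orders = sample_with_foreign_keys(
    customers, orders, "customer_id", rate=0.2
)
original_hist = children_per_parent(customers, orders, "customer_id")
sample_hist = children_per_parent(sample_customers, sample_orders, "customer_id")
```

Comparing `original_hist` with `sample_hist` gives a rough sense of how well a purely random sample happens to reflect the dependency distribution. A distribution-preserving method would aim to keep the two close by construction rather than by chance.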

Index Terms

Generation of test databases using sampling methods
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging

Published in
ISSTA 2013: Proceedings of the 2013 International Symposium on Software Testing and Analysis
July 2013
381 pages
ISBN: 9781450321594
DOI: 10.1145/2483760
General Chair: Mauro Pezzè
Program Chair: Mark Harman
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Database sampling
relational database
testing
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 58 of 213 submissions, 27%