DOI: 10.1145/2483760.2492397
research-article

Generation of test databases using sampling methods

Published: 15 July 2013

ABSTRACT

Populating the testing environment with relevant data is a major challenge in software validation. It generally requires expert knowledge about the system under development, because that data critically affects the outcome of the tests designed to assess the system. Current practices for populating testing environments generally either focus on efficient algorithms for generating synthetic data or reuse the production environment for testing. The latter is an invaluable strategy, because it provides real test cases that can uncover issues that critically affect the users of the system. However, the production environment generally holds large amounts of data that are difficult to handle and analyze. Sampling the production database is a potential way to overcome these challenges.

In this research, we propose two database sampling methods, VFDS and CoDS, for populating the testing environment. The first is a very fast random sampling approach. The second aims to preserve the distribution of the data so that the sample is representative. In particular, it focuses on the dependencies between data in different tables and tries to preserve the distributions of these dependencies.
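The abstract does not describe either algorithm in detail. The Python sketch below is therefore only a minimal illustration of the contrast it draws between fast random sampling and sampling that preserves cross-table dependency distributions. It assumes a hypothetical toy schema: one parent table and one child table linked by a single foreign key. It also treats each parent's "fan-out" (the number of child rows referencing it) as the dependency whose distribution should be preserved. That interpretation, the stratified-sampling technique, and all function names (`random_sample`, `stratified_fanout_sample`, `fanout_distribution`, and the others) are assumptions for illustration. They are not the authors' VFDS or CoDS.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy schema (an assumption, not from the paper):
#   parent table -> list of parent ids
#   child table  -> list of (child_id, parent_id) rows, one foreign key to the parent


def fanout(parent_ids, children):
    """Map each parent id to the number of child rows that reference it."""
    counts = Counter(pid for _, pid in children)
    return {pid: counts.get(pid, 0) for pid in parent_ids}


def fanout_distribution(parent_ids, children):
    """Share of parents having each fan-out value: the cross-table dependency
    distribution this sketch tries to preserve."""
    hist = Counter(fanout(parent_ids, children).values())
    total = len(parent_ids)
    return {f: n / total for f, n in sorted(hist.items())}


def close_over_children(sampled_parents, children):
    """Keep every child row whose parent was sampled, so no foreign key dangles."""
    keep = set(sampled_parents)
    return [row for row in children if row[1] in keep]


def random_sample(parent_ids, children, fraction, rng):
    """Fast baseline: uniform random sample of parents (0 <= fraction <= 1)."""
    k = round(len(parent_ids) * fraction)
    parents = rng.sample(parent_ids, k)
    return parents, close_over_children(parents, children)


def stratified_fanout_sample(parent_ids, children, fraction, rng):
    """Distribution-aware variant: sample the same fraction of parents inside
    each fan-out stratum, so the sample's fan-out histogram tracks the original."""
    strata = defaultdict(list)
    for pid, f in fanout(parent_ids, children).items():
        strata[f].append(pid)
    parents = []
    for f in sorted(strata):
        k = round(len(strata[f]) * fraction)  # proportional allocation per stratum
        parents.extend(rng.sample(strata[f], k))
    return parents, close_over_children(parents, children)


if __name__ == "__main__":
    rng = random.Random(42)
    parents = list(range(100))
    # Skewed toy data: parents 0..19 have 10 children each, the other 80 have 1.
    children = []
    for pid in parents:
        for _ in range(10 if pid < 20 else 1):
            children.append((len(children), pid))

    print(fanout_distribution(parents, children))   # {1: 0.8, 10: 0.2}
    sp, sc = stratified_fanout_sample(parents, children, 0.1, rng)
    print(fanout_distribution(sp, sc))              # {1: 0.8, 10: 0.2}
    rp, rc = random_sample(parents, children, 0.1, rng)
    print(fanout_distribution(rp, rc))              # varies with the random draw
```

On this skewed toy data, the stratified variant reproduces the original fan-out histogram exactly. A uniform sample of 10 parents, by contrast, can over- or under-represent the 20 high-fan-out parents. The extra cost is one pass over the child table to compute fan-outs before sampling, which is the kind of speed-versus-representativeness trade-off the two methods above appear to address.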


Published in: ISSTA 2013: Proceedings of the 2013 International Symposium on Software Testing and Analysis (ACM Conferences), July 2013, 381 pages
ISBN: 9781450321594
DOI: 10.1145/2483760

        Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 58 of 213 submissions (27%)
