DOI: 10.1145/2637748.2638423
research-article

TimeCleanser: a visual analytics approach for data cleansing of time-oriented data

Published: 16 September 2014

ABSTRACT

Poor data quality leads to unreliable results in any kind of data processing and has a profound economic impact. Although there are tools that help users with data cleansing, support for the specifics of time-oriented data is rather poor. However, the time dimension has very specific characteristics that introduce quality problems different from those of other kinds of data. We present TimeCleanser, an interactive Visual Analytics system that supports the cleansing of time-oriented data. To help users deal with these special characteristics and quality problems, TimeCleanser combines semi-automatic quality checks, visualizations, and directly editable data tables. An evaluation of TimeCleanser with a focus group (two target users, one developer, and two Human-Computer Interaction experts) shows that (a) the proposed method is suited to detecting hidden quality problems in time-oriented data and (b) it facilitates the complex task of data cleansing.
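The abstract does not say which checks TimeCleanser runs. As a rough illustration of what a semi-automatic quality check on time-oriented data might look like, the following Python sketch flags three time-specific problems (duplicate timestamps, out-of-order entries, and gaps larger than an assumed sampling interval) and leaves the decision about how to fix each one to the user. The function name `check_time_quality`, the `expected_step` parameter, and the sample readings are hypothetical and not taken from the paper.

```python
from datetime import datetime, timedelta


def check_time_quality(timestamps, expected_step):
    """Flag time-specific quality problems in a sequence of datetimes.

    Hypothetical helper: returns (index, issue) pairs so that a user can
    review each finding instead of having it corrected automatically.
    """
    issues = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta == timedelta(0):
            issues.append((i, "duplicate timestamp"))
        elif delta < timedelta(0):
            issues.append((i, "out of order"))
        elif delta > expected_step:
            # Assumes regularly sampled data; a larger step suggests missing entries.
            issues.append((i, f"gap of {delta}"))
    return issues


readings = [
    datetime(2014, 1, 1, 0, 0),
    datetime(2014, 1, 1, 0, 15),
    datetime(2014, 1, 1, 0, 15),  # duplicate
    datetime(2014, 1, 1, 0, 10),  # earlier than its predecessor
    datetime(2014, 1, 1, 1, 0),   # 50 minutes after the previous entry
]
print(check_time_quality(readings, timedelta(minutes=15)))
# [(2, 'duplicate timestamp'), (3, 'out of order'), (4, 'gap of 0:50:00')]
```

Comparing each timestamp only with its predecessor keeps the sketch simple, but an out-of-order entry also distorts the gap reported for the entry after it (index 4 above is measured from 00:10 rather than 00:15). A real tool would likely combine such checks with visualizations so the user can judge these cases in context.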

Published in: i-KNOW '14: Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business, September 2014, 262 pages
ISBN: 9781450327695
DOI: 10.1145/2637748

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rates: i-KNOW '14 paper acceptance rate 25 of 73 submissions (34%); overall acceptance rate 77 of 238 submissions (32%).
