Automated data verification in a format-free environment

Published: 01 March 2006

Abstract

Data collection and interpretation are vital for innumerable purposes, both commercial and academic. Sifting through vast mountains of data to separate correct information from incorrect can be expensive in terms of both money and time. Automating as much of this process as possible is the key to collecting useful information in an efficient and timely manner. This paper discusses a system designed to automate the comparison of raw collected data to a store of previously verified data. This comparison can be used to estimate both the accuracy and the value of the collected data. In addition, it makes it possible to gauge the efficacy of various collection methods. In this system, special attention was paid to accepting a wide range of document formats and to properly handling data sets whose attribute types might be organized differently from those in the reference data.
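The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute names, the regular expressions in `PATTERNS`, and the exact-match accuracy measure are all assumptions chosen for illustration. The sketch guesses which attribute each column of the collected data holds, so that column order need not match the reference data, and then reports the fraction of collected records found in the verified store.

```python
import re

# Hypothetical attribute patterns; the abstract does not specify the real rules.
PATTERNS = {
    "zip": re.compile(r"\d{5}"),
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "name": re.compile(r"[A-Za-z][A-Za-z .'-]*"),
}


def infer_column_types(rows):
    """Guess the attribute held by each column from how many values match each pattern."""
    if not rows:
        return []
    types = []
    for column in zip(*rows):
        scores = {
            attr: sum(1 for value in column if pattern.fullmatch(value.strip()))
            for attr, pattern in PATTERNS.items()
        }
        best = max(scores, key=scores.get)
        # Columns that match no pattern are left unlabeled.
        types.append(best if scores[best] > 0 else None)
    return types


def normalize(rows):
    """Turn positional rows into attribute-keyed records, ignoring unlabeled columns."""
    types = infer_column_types(rows)
    return [
        {attr: value.strip() for attr, value in zip(types, row) if attr is not None}
        for row in rows
    ]


def accuracy(collected_rows, reference_records):
    """Fraction of collected records that appear exactly in the verified reference store."""
    records = normalize(collected_rows)
    if not records:
        return 0.0
    verified = {tuple(sorted(record.items())) for record in reference_records}
    hits = sum(1 for record in records if tuple(sorted(record.items())) in verified)
    return hits / len(records)


# Made-up sample data: the collected columns are ordered differently from the reference.
reference = [
    {"name": "Ada Lovelace", "zip": "72201", "phone": "501-555-0100"},
    {"name": "Alan Turing", "zip": "72204", "phone": "501-555-0101"},
]
collected = [
    ["72201", "Ada Lovelace", "501-555-0100"],
    ["72204", "Alan Turing", "501-555-9999"],
]
score = accuracy(collected, reference)
```

In this made-up sample the second record carries a phone number that is absent from the reference store, so one of the two records verifies. A fuller treatment would need approximate matching and a rule for two columns that are assigned the same attribute, neither of which this sketch attempts.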

Cited By

  • (2008) A domain-independent, ontology-agnostic approach to leveraging unstructured data sources. Proceedings of the 46th annual ACM Southeast Conference, pp. 122-126. DOI: 10.1145/1593105.1593137. Online publication date: 28-Mar-2008

Published In

ACM SIGSOFT Software Engineering Notes  Volume 31, Issue 2
March 2006
193 pages
ISSN:0163-5948
DOI:10.1145/1118537

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data mining
  2. document analysis
  3. mining methods and algorithms
  4. verification

Qualifiers

  • Article
