
Value-Based File Retention: File Attributes as File Value and Information Waste Indicators

Published: 01 May 2014

Abstract

Several file retention policy methods propose that a file retention policy should be based on file value. Though such a retention policy might increase the value of accessible files, the method to arrive at such a policy is underresearched. This article discusses how one can arrive at a method for developing file retention policies based on the use values of files. The method’s applicability is initially assessed through a case study at Capgemini, Netherlands. In the case study, we hypothesize that one can develop a file retention policy by testing causal relations between file attributes (as used by file retention methods) and the use value of files. Unfortunately, most file attributes used by file retention methods correlate only weakly with file value, which leads to the conclusion that these methods do not select high- and low-value files well. This implies either that the attributes used in our study are ineffective or that our conceptualization of file value is flawed. We continue with the latter possibility and develop indicators for file utility (with low utility being waste). With this approach we were able to detect waste files, in a sample of files, with an accuracy of 80%. We therefore suggest not only further research into information waste detection as part of a file retention policy, but also further exploration of other file attributes that could better predict file value and file utility.
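
The attribute-versus-value test described in the abstract can be illustrated with a minimal sketch. It assumes each file is a record with numeric attributes and a surveyed use-value score. The attribute names (`days_since_access`, `size_kb`), the `use_value` scale, and the toy records are hypothetical and are not taken from the case study. The Pearson coefficient is used only as one possible correlation measure; the abstract does not state which measure the authors applied.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def attribute_correlations(records, attributes, target="use_value"):
    """Correlate each named attribute with the target score across all records."""
    target_values = [r[target] for r in records]
    return {a: pearson([r[a] for r in records], target_values) for a in attributes}


# Hypothetical toy records (illustrative only, NOT data from the study).
files = [
    {"days_since_access": 3, "size_kb": 120, "use_value": 5},
    {"days_since_access": 40, "size_kb": 15, "use_value": 4},
    {"days_since_access": 200, "size_kb": 900, "use_value": 2},
    {"days_since_access": 365, "size_kb": 60, "use_value": 3},
    {"days_since_access": 700, "size_kb": 300, "use_value": 1},
    {"days_since_access": 10, "size_kb": 2048, "use_value": 2},
]

correlations = attribute_correlations(files, ["days_since_access", "size_kb"])
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

A coefficient near zero for an attribute would correspond to the weak relationship the abstract reports for most attributes; a retention rule built on such an attribute would separate high- and low-value files poorly.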


Published In

Journal of Data and Information Quality, Volume 4, Issue 4
May 2014
97 pages
ISSN:1936-1955
EISSN:1936-1963
DOI:10.1145/2628135
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2014
Accepted: 01 December 2013
Revised: 01 December 2013
Received: 01 August 2011
Published in JDIQ Volume 4, Issue 4

Author Tags

  1. Methodology
  2. case study
  3. data mining
  4. quantitative

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Pricing Data Based on Value: A Systematic Literature Review. Proceedings of the Future Technologies Conference (FTC) 2024, Volume 3, 319-339. DOI: 10.1007/978-3-031-73125-9_20. Online publication date: 8-Nov-2024
  • (2023) A Systematic Survey of Data Value: Models, Metrics, Applications and Research Challenges. IEEE Access 11, 104966-104983. DOI: 10.1109/ACCESS.2023.3315588. Online publication date: 2023
  • (2022) A Fully Automated Scratch Storage Cleanup Tool for Heterogeneous Parallel Filesystems. Practice and Experience in Advanced Research Computing 2022: Revolutionary: Computing, Connections, You, 1-7. DOI: 10.1145/3491418.3530761. Online publication date: 8-Jul-2022
  • (2021) New Trends in Intellectual Capital Disclosures of Higher Degree Institutions in Indonesia. IT and the Development of Digital Skills and Competences in Education, 217-234. DOI: 10.4018/978-1-7998-4972-8.ch013. Online publication date: 2021
  • (2021) Exploiting user activeness for data retention in HPC systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. DOI: 10.1145/3458817.3476201. Online publication date: 14-Nov-2021
  • (2019) DaVe: A Semantic Data Value Vocabulary to Enable Data Value Characterisation. Enterprise Information Systems, 239-261. DOI: 10.1007/978-3-030-26169-6_12. Online publication date: 28-Jul-2019
  • (2018) Challenges in Value-Driven Data Governance. On the Move to Meaningful Internet Systems. OTM 2018 Conferences, 546-554. DOI: 10.1007/978-3-030-02671-4_33. Online publication date: 22-Oct-2018
