Efficient detection of large-scale redundancy in enterprise file systems

Published: 01 January 2009

Abstract

In order to catch and reduce waste in the exponentially increasing demand for disk storage, we have developed very efficient technology to detect approximate duplication of large directory hierarchies. Such duplication can be caused, for example, by unnecessary mirroring of repositories by uncoordinated employees or departments. Identifying these duplicate or near-duplicate hierarchies allows appropriate action to be taken at a high level. For example, one could coordinate and consolidate multiple copies in one location.
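The abstract does not describe the detection method itself. Purely as an illustration of what "approximate duplication" between two directory hierarchies can mean, the sketch below assumes a simple measure: the Jaccard similarity between the sets of file-content hashes found under each root. The function names (`file_digests`, `jaccard`, `directory_similarity`) and the choice of SHA-256 are assumptions made for this example, not the authors' technique, and hashing every file exhaustively would not by itself be efficient at enterprise scale.

```python
import hashlib
import os


def file_digests(root):
    """Return the set of SHA-256 content digests of all files under root."""
    digests = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests.add(h.hexdigest())
    return digests


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; taken as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def directory_similarity(root_a, root_b):
    """Similarity in [0, 1] between two hierarchies, ignoring file names and paths."""
    return jaccard(file_digests(root_a), file_digests(root_b))
```

Under this assumed measure, a score near 1.0 suggests that one hierarchy is a near-complete copy of the other, even if files were renamed or moved, while a score near 0.0 suggests they share little content.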

