DOI: 10.1145/2110217.2110225
short-paper

CernVM-FS: delivering scientific software to globally distributed computing resources

Published: 14 November 2011

ABSTRACT

The computing facilities used to process data for the experiments at the Large Hadron Collider at CERN are scattered around the world. The embarrassingly parallel workload allows for use of various computing resources, such as Grid sites of the Worldwide LHC Computing Grid, commercial and institutional cloud resources, as well as individual home PCs in "volunteer clouds". Unlike data, the experiment software cannot be easily split into small work units. Efficient delivery of the complex and frequently changing experiment software is a crucial step to harness heterogeneous resources.

Here we present an approach to deliver software on demand using a scalable hierarchy of standard HTTP caches. We show how to tackle this problem by pre-processing software into content-addressable storage. On the worker nodes, we use a specially crafted file system that ensures data integrity and provides fault-tolerance. We show performance figures from large-scale deployment. For the most common case of computing clusters with 10 to 1000 worker nodes, we present a novel state dissemination protocol to support a fully decentralized and distributed memory cache.
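The idea of content-addressable storage with integrity checking can be illustrated with a minimal sketch. This is not the CernVM-FS implementation: the function names `publish` and `fetch`, the in-memory `dict` store, and the choice of SHA-256 are assumptions made only for illustration. Each object is stored under the hash of its content, so identical files are stored once and a client can verify whatever it receives from an untrusted cache.

```python
import hashlib


def publish(store: dict, data: bytes) -> str:
    """Store `data` under the hex digest of its content and return that key.

    Hypothetical helper: SHA-256 and a dict-backed store are illustrative choices.
    """
    key = hashlib.sha256(data).hexdigest()
    store[key] = data  # identical content maps to the same key (deduplication)
    return key


def fetch(store: dict, key: str) -> bytes:
    """Return the object stored under `key`, verifying it still matches the key."""
    data = store[key]
    if hashlib.sha256(data).hexdigest() != key:
        raise ValueError(f"integrity check failed for {key}")
    return data
```

Because a key is derived from the content alone, an object never changes under a given key, which is what makes such objects straightforward to keep in standard HTTP caches.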


        Published in

        NDM '11: Proceedings of the first international workshop on Network-aware data management
        November 2011
        84 pages
        ISBN: 9781450311328
        DOI: 10.1145/2110217

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 14 of 23 submissions, 61%
