short-paper

CernVM-FS: delivering scientific software to globally distributed computing resources

Authors:
Jakob Blomer, CERN, Geneva, Switzerland
Predrag Buncic, PH-SFT, Geneva, Switzerland
Thomas Fuhrmann, Technische Universität München, München, Germany
NDM '11: Proceedings of the first international workshop on Network-aware data management, November 2011, Pages 49–56
https://doi.org/10.1145/2110217.2110225

Published: 14 November 2011


ABSTRACT

The computing facilities used to process data for the experiments at the Large Hadron Collider at CERN are scattered around the world. The embarrassingly parallel workload allows the use of various computing resources, such as Grid sites of the Worldwide LHC Computing Grid, commercial and institutional cloud resources, and individual home PCs in "volunteer clouds". Unlike the data, the experiment software cannot easily be split into small work units. Efficient delivery of the complex and frequently changing experiment software is therefore a crucial step in harnessing heterogeneous resources.

Here we present an approach to deliver software on demand using a scalable hierarchy of standard HTTP caches. We show how to tackle this problem by pre-processing software into content-addressable storage. On the worker nodes, we use a specially crafted file system that ensures data integrity and provides fault tolerance. We show performance figures from a large-scale deployment. For the most common case of computing clusters with 10 to 1000 worker nodes, we present a novel state dissemination protocol to support a fully decentralized and distributed memory cache.
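
The idea of pre-processing software into content-addressable storage can be illustrated with a minimal sketch. It is not the CernVM-FS implementation: the class `ContentAddressableStore`, the function `publish`, the choice of SHA-256 as the hash, and the use of zlib compression are all assumptions made for illustration, since the abstract does not specify them. Each file is stored under the hash of its content, and a catalog maps file paths to those hashes. Because an object's name is derived from its content, a stored object never changes, which is what makes such objects well suited to standard HTTP caches, and a client can verify integrity by recomputing the hash after download.

```python
import hashlib
import zlib


class ContentAddressableStore:
    """Hypothetical in-memory stand-in for objects served by a plain HTTP server."""

    def __init__(self):
        # hex digest of the uncompressed content -> compressed bytes
        self.objects = {}

    def put(self, data):
        """Store data under its content hash and return that hash."""
        digest = hashlib.sha256(data).hexdigest()  # hash choice is an assumption
        # Identical content yields the same key, so duplicates are stored once.
        self.objects.setdefault(digest, zlib.compress(data))
        return digest

    def get(self, digest):
        """Fetch an object and verify it against the hash it was requested by."""
        data = zlib.decompress(self.objects[digest])
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("integrity check failed for object " + digest)
        return data


def publish(tree, store):
    """Pre-process a software tree (path -> bytes) into the store.

    Returns a catalog mapping each path to the content hash of its file.
    """
    return {path: store.put(data) for path, data in tree.items()}


# Usage: two paths with identical content share one stored object.
store = ContentAddressableStore()
catalog = publish(
    {"bin/tool": b"binary-v1", "lib/copy": b"binary-v1", "etc/conf": b"x=1"},
    store,
)
payload = store.get(catalog["etc/conf"])
```

In this sketch, a client that receives a corrupted or tampered object detects the mismatch in `get` and can retry from another source, which is one simple way the integrity property mentioned in the abstract could be realized.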


Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Distributed memory
    2. Software system structures
      1. Distributed systems organizing principles

Published in
NDM '11: Proceedings of the first international workshop on Network-aware data management
November 2011
84 pages
ISBN: 9781450311328
DOI: 10.1145/2110217
General Chairs:
Mehmet Balman, Lawrence Berkeley National Laboratory, USA
Surendra Byna, Lawrence Berkeley National Laboratory, USA
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cloud computing
distributed file cache
distributed file system
gossip protocol
peer-to-peer computing
Acceptance Rates
Overall Acceptance Rate: 14 of 23 submissions, 61%