skip to main content
10.1109/SC.2004.36acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
Article

Optimal File-Bundle Caching Algorithms for Data-Grids

Published: 06 November 2004 Publication History

Abstract

The file-bundle caching problem arises frequently in scientific applications where jobs process several files concurrently. Consider a host system in a data-grid that maintains a disk cache for servicing jobs of file requests where a job is serviced only if all its requested files are present in the disk cache. Files must now be admitted into the cache and replaced in sets of file-bundles. We show that traditional caching algorithms based on file popularity measures do not perform well since they may hold in cache non-relevant combinations of files. We present and analyze a new caching algorithm for maximizing the throughput of jobs and minimizing data replacement costs at such data-grid hosts. We tested the new algorithm using a disk cache simulation model under a wide range of conditions of file request distributions, varying cache size, file size distribution, etc. The results show significant improvement over traditional caching algorithms.

References

[1]
{1} P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In USENIX Symposium on Internet Technologies and Systems, 1997.
[2]
{2} A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke. The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets. J. Network and Computer Applications, 23(3):187- 200, 2000.
[3]
{3} ESG:. The Earth System Grid, http://www.scd.ucar.edu/css/esg/.
[4]
{4} U. Feige, D. Peleg, and G. Kortsarz. The dense k-subgraph problem. Algorithmica, 29(3):410-421, 2001.
[5]
{5} U. Hahn, W. Dilling, and D. Kaletta. Adaptive replacement algorithm for disk caches in hsm systems. In 16 Int'l. Symp on Mass Storage Syst., pages 128-140, San Diego, California, Mar. 15-18 1999.
[6]
{6} S. Khuller, A. Moss, and J. S. Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39-45, 1999.
[7]
{7} E. J. Otoo, D. Rotem, and A. Shoshani. Impact of admission and cache replacement policies on response times of jobs on data grids. In Int'l. Workshop on Challenges of Large Applications in Distrib. Environments, Seatle, Washington, Jun., 21 2003. IEEE Computer Society, Los Alamitos, California.
[8]
{8} E. J. Otoo and A. Shoshani. Accurate modeling of cache replacement policies in a data grid. In 11th NASA Goddard Conf. on Mass Storage Syst. and Tech. / 20th IEEE Symp. on Mass Storage Syst., San Diego, California, April 7 - 10 2003.
[9]
{9} T. T. Ozsu and Valduriez. Principles of distributed database systems. Prentice Hall, Upper Saddle River, N. J., 2nd edition, 1999.
[10]
{10} PPDG:. The Particle Physics Data Grid, http://www.ppdg.net/.
[11]
{11} A. Rajasekar, M. Wan, and R. Moore. Mysrb & srb - components of a data grid. In The 11th Int'l. Symp. on High Perf. Distrib. Comput. (HPDC-11), Edinburgh, Scotland, Jul. 24-26 2002.
[12]
{12} A. Shoshani, A. Sim, and J. Gu. Storage resource managers: Middleware components for grid storage. In 10th NASA Goddard Conference on Mass Storage Syst. and Tech., Apr. 15-18 2002.
[13]
{13} M. Tan, M. Theys, H. Siegel, N. Beck, and M. Jurczyk. A mathematical model, heuristic, and simulation study for a basic data staging problem in a heterogeneous networking environment. In Proc. of the 7th Hetero. Comput. Workshop, pages 115-129, Orlando, Florida, Mar. 1998.
[14]
{14} J. Wang. A survey of web caching schemes for the internet. In ACM SIGCOMM'99, Cambridge, Massachusetts, Aug. 1999.
[15]
{15} K. Wu, W. S. Koegler, J. Chen, and A. Shoshani. Using bitmap index for interactive exploration of large datasets. In SSDBM'2003, pages 65-74, Cambridge, Mass., 2003.
[16]
{16} N. Young. On-line file caching. In SODA: ACM-SIAM Symposium on Discrete Algorithms (A Conference on Theoretical and Experimental Analysis of Discrete Algorithms), 1998.

Cited By

View all
  • (2021)Optimal Online Algorithms for File-Bundle Caching and Generalization to Distributed CachingACM Transactions on Modeling and Performance Evaluation of Computing Systems10.1145/34450286:1(1-23)Online publication date: 27-Mar-2021
  • (2011)CoScanProceedings of the 2nd ACM Symposium on Cloud Computing10.1145/2038916.2038927(1-12)Online publication date: 26-Oct-2011
  • (2010)Efficient querying of distributed provenance storesProceedings of the 19th ACM International Symposium on High Performance Distributed Computing10.1145/1851476.1851567(613-621)Online publication date: 21-Jun-2010
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing
November 2004
724 pages
ISBN:0769521533

Sponsors

Publisher

IEEE Computer Society

United States

Publication History

Published: 06 November 2004

Check for updates

Qualifiers

  • Article

Conference

SC '04
Sponsor:

Acceptance Rates

SC '04 Paper Acceptance Rate 60 of 200 submissions, 30%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 07 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Optimal Online Algorithms for File-Bundle Caching and Generalization to Distributed CachingACM Transactions on Modeling and Performance Evaluation of Computing Systems10.1145/34450286:1(1-23)Online publication date: 27-Mar-2021
  • (2011)CoScanProceedings of the 2nd ACM Symposium on Cloud Computing10.1145/2038916.2038927(1-12)Online publication date: 26-Oct-2011
  • (2010)Efficient querying of distributed provenance storesProceedings of the 19th ACM International Symposium on High Performance Distributed Computing10.1145/1851476.1851567(613-621)Online publication date: 21-Jun-2010
  • (2010)JAWSProceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC.2010.31(1-11)Online publication date: 13-Nov-2010
  • (2009)BitDewJournal of Network and Computer Applications10.1016/j.jnca.2009.04.00232:5(961-975)Online publication date: 1-Sep-2009
  • (2009)Object Caching for Queries and UpdatesProceedings of the 3rd International Workshop on Algorithms and Computation10.1007/978-3-642-00202-1_34(394-405)Online publication date: 18-Feb-2009
  • (2008)Modeling replication strategies in data grid systems with arbitrary clustered demandsProceedings of the 3rd international conference on Scalable information systems10.5555/1459693.1459705(1-9)Online publication date: 4-Jun-2008
  • (2008)BitDewProceedings of the 2008 ACM/IEEE conference on Supercomputing10.5555/1413370.1413416(1-12)Online publication date: 15-Nov-2008
  • (2008)enabling cross-layer optimizations in storage systems with custom metadataProceedings of the 17th international symposium on High performance distributed computing10.1145/1383422.1383451(213-216)Online publication date: 23-Jun-2008
  • (2008)File grouping for scientific data managementProceedings of the 17th international symposium on High performance distributed computing10.1145/1383422.1383429(153-164)Online publication date: 23-Jun-2008
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media