skip to main content
research-article

A Prefetching Scheme Exploiting both Data Layout and Access History on Disk

Published: 01 August 2013 Publication History

Abstract

Prefetching is an important technique for improving effective hard disk performance. A prefetcher seeks to accurately predict which data will be requested and load it ahead of the arrival of the corresponding requests. Current disk prefetch policies in major operating systems track access patterns at the level of file abstraction. While this is useful for exploiting application-level access patterns, for two reasons file-level prefetching cannot realize the full performance improvements achievable by prefetching. First, certain prefetch opportunities can only be detected by knowing the data layout on disk, such as the contiguous layout of file metadata or data from multiple files. Second, nonsequential access of disk data (requiring disk head movement) is much slower than sequential access, and the performance penalty for mis-prefetching a randomly located block, relative to that of a sequential block, is correspondingly greater.
To overcome the inherent limitations of prefetching at logical file level, we propose to perform prefetching directly at the level of disk layout, and in a portable way. Our technique, called DiskSeen, is intended to be supplementary to, and to work synergistically with, any present file-level prefetch policies. DiskSeen tracks the locations and access times of disk blocks and, based on analysis of their temporal and spatial relationships, seeks to improve the sequentiality of disk accesses and overall prefetching performance. It also implements a mechanism to minimize mis-prefetching, on a per-application basis, to mitigate the corresponding performance penalty.
Our implementation of the DiskSeen scheme in the Linux 2.6 kernel shows that it can significantly improve the effectiveness of prefetching, reducing execution times by 20%--60% for microbenchmarks and real applications such as grep, CVS, and TPC-H. Even for workloads specifically designed to expose its weaknesses, DiskSeen incurs only minor performance loss.

References

[1]
Baek, S. H. and Park, K. H. 2008. Prefetching with adaptive cache culling for striped disk arrays. In Proceedings of the USENIX Annual Technical Conference (ATC’08).
[2]
Butt, A. R., Gniady, C., and Hu, Y. C. 2005. The performance impact of kernel prefetching on buffer cache replacement algorithms. In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’05). 157--168.
[3]
Cao, P., Felten, E. W., and Li, K. 1994. Application-controlled file caching policies. In Proceedings of the USENIX Summer Technical Conference (USTC’94).
[4]
Cao, P., Felten, E. W., Karlin, A. R., and Li, K. 1996. Implementation and performance of integrated application controlled file caching, prefetching, and disk scheduling. ACM Trans. Comput. Syst. 14, 4, 311--343.
[5]
Chang, F. and Gibson, G. A. 1999. Automatic i/o hint generation through speculative execution. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (OSDI’99).
[6]
Chen, X. and Zhang, X. 2003. A popularity-based prediction model for web prefetching. IEEE Comput. 36, 3, 63--70.
[7]
Diaz, P. and Cintra, M. 2009. Stream chaining: Exploiting multiple levels of correlation in data prefetching. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA’09).
[8]
Ding, X., Jiang, S., Chen, F., Davis, K., and Zhang, X. 2007. DiskSeen: Exploiting disk layout and access history to enhance i/o prefetch. In Proceedings of the USENIX Annual Technical Conference (USENIX’07).
[9]
Douceur, J. R. and Bolosky, W. J. 1999. A large-scale study of file-system contents. In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’99). 59--70.
[10]
Faser, K. and Chang, F. 2003. Operating system i/o speculation: How two invocations are faster than one. In Proceedings of the USENIX Annual Technical Conference (USENIX’03). 325--338.
[11]
Ganger, G. R. and Kaashoek, M. F. 1997. Embedded inodes and explicit grouping: Exploiting disk bandwidth for small files. In Proceedings of USENIX Annual Technical Conference (USENIX’97).
[12]
Gill, B. S. and Bathen, L. A. D. 2007. AMP: Adaptive multi-stream prefetching in a shared cache. In Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST’07).
[13]
Griffioen, J. and Appleton, R. 1994. Reducing file system latency using a predictive approach. In Proceedings of the USENIX Summer Technical Conference (USTC’94).
[14]
Huang, H., Hung, W., and Shin, K. G. 2005. FS2: Dynamic data replication in free disk space for improving disk performance and energy consumption. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP’05). 263--276.
[15]
Jiang, S., Ding, X., Chen, F., Tan, E., and Zhang, X. 2005. DULO: An effective buffer cache management scheme to exploit both temporal and spatial locality. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST’05).
[16]
Kroeger, T. M. and Long, D. D. E. 2001. Design and implementation of a predictive file prefetching algorithm. In Proceedings of the USENIX Annual Technical Conference (USENIX’01). 105--118.
[17]
Li, Z., Chen, Z., Srinivasan, S. M., and Zhou, Y. 2004. C-Miner: Mining block correlations in storage systems. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST’04). 173--186.
[18]
Liang, S., Jiang, S., and Zhang, X. 2007. STEP: Sequentiality and thrashing detection based prefetching to improve performance of networked storage servers. In Proceedings of 27th IEEE International Conference on Distributed Computing Systems (ICDCS’07).
[19]
LXR. 2013. Linux cross-reference. http://lxr.linux.no/.
[20]
Mckusick, M. K., Joy, W. N., Leffler, S. J., and Fabry, R. S. 1984. A fast file system for unix. ACM Trans. Comput. Syst. 2, 3, 181--197.
[21]
Mowry, T. C., Demke, A. K., and Krieger, O. 1996. Automatic compiler-inserted i/o prefetching for out-of-core applications. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI’96).
[22]
MPI-IO. 2013. MPI-2: Extensions to the message-passing interface. http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html.
[23]
Pai, R., Pulavarty, B., and Cao, M. 2004. Linux 2.6 performance improvement through readahead optimization. In Proceedings of the Linux Symposium.
[24]
Papathanasiou, A. E. and Scott, M. L. 2005. Aggressive prefetching: An idea whose time has come. In Proceedings of the 10th Workshop on Hot Topics in Operating Systems.
[25]
Patterson, R. H., Gibson, G. A., Ginting, E., Stodolsky, D., and Zelenka, J. 1995. Informed prefetching and caching. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP’95). 79--95.
[26]
Schindler, J. and Ganger, G. R. 2000. Automated disk drive characterization. In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’00). 112--113.
[27]
Schindler, J., Griffin, J. L., Lumb, C. R., and Ganger, G. R. 2002. Track-aligned extents: Matching access patterns to disk drive characteristics. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST’02).
[28]
Schlosser, S. W., Schindler, J., Papadomanolakis, S., Shao, M., Ailamaki, A., Faloutsos, C., and Ganger, G. R. 2005. On multidimensional data and modern disks. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST’05).
[29]
Schmuck, F. and Haskin, R. 2002. GPFS: A shared-disk file system for large computing clusters. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST’02).
[30]
Smith, A. J. 1978. Sequentiality and prefetching in database systems. ACM Trans. Database Syst. 3, 3, 223--247.
[31]
Tomkins, A., Patterson, R. H., and Gibson, G. 1997. Informed multi-process prefetching and caching. In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’97). 100--114.
[32]
Vogels, W. 1999. File system usage in windows nt 4.0. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP’99). 93--109.
[33]
WebStone. 2013. WebStone --- The benchmark for web servers. http://www.mindcraft.com/benchmarks/webstone/.
[34]
Xu, Y. and Jiang, S. 2011. A scheduling framework that makes any disk schedulers non-work-conserving solely based on request characteristics. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST’11).
[35]
Zhang, X., Davis, K., and Jiang, S. 2010. IOrchestrator: Improving the performance of multi-node i/o systems via inter-server coordination. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’10). 1--11.

Cited By

View all
  • (2025)AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access CorrelationsProceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3710848.3710856(142-155)Online publication date: 28-Feb-2025
  • (2025)An In-depth Analysis and Further Extensions of Pyramid CodesITM Web of Conferences10.1051/itmconf/2025730302273(03022)Online publication date: 17-Feb-2025
  • (2023)HRFP: Highly Relevant Frequent Patterns-Based Prefetching and Caching Algorithms for Distributed File SystemsElectronics10.3390/electronics1205118312:5(1183)Online publication date: 1-Mar-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Storage
ACM Transactions on Storage  Volume 9, Issue 3
August 2013
97 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/2501620
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 2013
Accepted: 01 April 2013
Revised: 01 April 2013
Received: 01 September 2012
Published in TOS Volume 9, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Prefetching
  2. buffer cache
  3. hard disk
  4. spatial locality

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)1
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2025)AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access CorrelationsProceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3710848.3710856(142-155)Online publication date: 28-Feb-2025
  • (2025)An In-depth Analysis and Further Extensions of Pyramid CodesITM Web of Conferences10.1051/itmconf/2025730302273(03022)Online publication date: 17-Feb-2025
  • (2023)HRFP: Highly Relevant Frequent Patterns-Based Prefetching and Caching Algorithms for Distributed File SystemsElectronics10.3390/electronics1205118312:5(1183)Online publication date: 1-Mar-2023
  • (2023)Spidermine: Low Overhead User-Level PrefetchingProceedings of the 38th ACM/SIGAPP Symposium on Applied Computing10.1145/3555776.3577754(1332-1341)Online publication date: 7-Jun-2023
  • (2023)Application and user-specific data prefetching and parallel read algorithms for distributed file systemsCluster Computing10.1007/s10586-023-04160-127:3(3593-3613)Online publication date: 28-Oct-2023
  • (2022)Pattern-Based Prefetching with Adaptive Cache Management Inside of Solid-State DrivesACM Transactions on Storage10.1145/347439318:1(1-25)Online publication date: 29-Jan-2022
  • (2022)BibliographyStorage Systems10.1016/B978-0-32-390796-5.00023-1(641-693)Online publication date: 2022
  • (2022)Disk drive data placement and schedulingStorage Systems10.1016/B978-0-32-390796-5.00012-7(197-222)Online publication date: 2022
  • (2021)Highly Concurrent Latency-tolerant Register Files for GPUsACM Transactions on Computer Systems10.1145/341997337:1-4(1-36)Online publication date: 4-Jan-2021
  • (2021)A self-tuning client-side metadata prefetching scheme for wide area network file systemsScience China Information Sciences10.1007/s11432-019-2833-165:3Online publication date: 22-Feb-2021
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media