DOI: 10.1145/3183713.3183740

AHEAD: Adaptable Data Hardening for On-the-Fly Hardware Error Detection during Database Query Processing

Published: 27 May 2018

ABSTRACT

It has long been known that hardware components are not perfect and that soft errors in the form of single bit flips happen all the time. Until now, these single bit flips have mainly been addressed in hardware using general-purpose protection techniques. However, recent studies have shown that all future hardware components will become less and less reliable overall and that multi-bit flips are occurring regularly rather than exceptionally. Additionally, hardware aging effects will lead to error models that change at run-time. Scaling hardware-based protection techniques to cover changing multi-bit flips is possible, but it introduces large performance, chip area, and power overheads, which will become unaffordable in the future. To tackle this, an emerging research direction employs protection techniques in higher software layers such as compilers or applications. The knowledge available at these layers can be used to specialize and adapt protection techniques efficiently. In this paper, we therefore propose AHEAD, a novel adaptable and on-the-fly hardware error detection approach for database systems. AHEAD provides configurable error detection in an end-to-end fashion and reduces the storage and computation overhead compared to other techniques at this level. Our approach uses an arithmetic error coding technique, which allows query processing to work entirely on hardened data. This in turn enables on-the-fly detection, during query processing, of (i) errors that modify data stored in memory or transferred over an interconnect and (ii) errors induced during computations. Our exhaustive evaluation clearly shows the benefits of the AHEAD approach.
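
The abstract does not spell out the coding scheme, so the following is only an illustrative sketch of one well-known arithmetic error code, AN coding, and not the paper's implementation. Each value is multiplied by a constant A, and a code word counts as valid only while it remains divisible by A. The constant `A = 641` and all function names (`harden`, `soften`, `is_valid`, `hardened_sum`, `flip_bit`) are assumptions made for this example.

```python
# Minimal AN-coding sketch. A = 641 is a hypothetical constant picked for
# illustration; it is not taken from the paper.
A = 641


def harden(value: int) -> int:
    """Encode a plain integer as a code word (a multiple of A)."""
    return value * A


def is_valid(code: int) -> bool:
    """A code word is valid only if it is still divisible by A."""
    return code % A == 0


def soften(code: int) -> int:
    """Decode a code word, raising if corruption is detected."""
    if not is_valid(code):
        raise ValueError(f"corrupted code word: {code}")
    return code // A


def hardened_sum(codes: list[int]) -> int:
    """Add code words directly; the result is the code word of the plain sum."""
    total = 0
    for code in codes:
        if not is_valid(code):  # detects corrupted input data
            raise ValueError(f"corrupted code word: {code}")
        total += code
    if not is_valid(total):  # would catch an error introduced during the additions
        raise ValueError("corrupted sum")
    return total


def flip_bit(code: int, bit: int) -> int:
    """Simulate a single bit flip at the given bit position."""
    return code ^ (1 << bit)


if __name__ == "__main__":
    column = [harden(v) for v in (5, 7, 30)]
    print(soften(hardened_sum(column)))      # plain sum recovered from hardened data
    print(is_valid(flip_bit(column[0], 3)))  # a flipped bit breaks divisibility by A
```

Because A·a + A·b = A·(a + b), the aggregation runs on hardened values without decoding them first. A single bit flip changes a code word by ±2^k, which is never a multiple of an odd A greater than 1, so this sketch detects every single-bit flip. How well multi-bit flips are detected depends on the choice of A, which the sketch does not analyze.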


Published in

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713

Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article

Acceptance Rates

SIGMOD '18 paper acceptance rate: 90 of 461 submissions, 20%. Overall acceptance rate: 785 of 4,003 submissions, 20%.
