DOI: 10.1145/1367497.1367535
research-article

Service-oriented data denormalization for scalable web applications

Published: 21 April 2008

ABSTRACT

Many techniques have been proposed to scale web applications. However, the data interdependencies between the database queries and transactions issued by the applications limit the efficiency of these techniques. We claim that major scalability improvements can be gained by restructuring the web application data into multiple independent data services with exclusive access to their private data store. While this restructuring does not provide performance gains by itself, the implied simplification of each database workload allows a much more efficient use of classical techniques. We illustrate the data denormalization process on three benchmark applications: TPC-W, RUBiS and RUBBoS. We deploy the resulting service-oriented implementation of TPC-W across an 85-node cluster and show that restructuring its data can provide at least an order of magnitude improvement in the maximum sustainable throughput compared to master-slave database replication, while preserving strong consistency and transactional properties.
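
To make the idea of independent data services concrete, the following is a minimal sketch and not the authors' implementation. It assumes two hypothetical services, `CustomerService` and `OrderService`, each holding a private in-memory SQLite store. The class names, table layouts, and the `order_summary` helper are illustrative assumptions and do not reproduce the TPC-W decomposition described in the paper.

```python
import sqlite3


class CustomerService:
    """Hypothetical data service that exclusively owns the customer table."""

    def __init__(self):
        # Private store: only this service holds a connection to it.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add_customer(self, name):
        cur = self._db.execute(
            "INSERT INTO customer (name) VALUES (?)", (name,)
        )
        self._db.commit()
        return cur.lastrowid

    def get_name(self, customer_id):
        row = self._db.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class OrderService:
    """Hypothetical data service that exclusively owns the orders table."""

    def __init__(self):
        # A separate private store; it shares nothing with CustomerService.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
        )

    def place_order(self, customer_id, total):
        cur = self._db.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (customer_id, total),
        )
        self._db.commit()
        return cur.lastrowid

    def orders_for(self, customer_id):
        return self._db.execute(
            "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id",
            (customer_id,),
        ).fetchall()


def order_summary(customers, orders, customer_id):
    """Compose two service calls in place of a cross-table JOIN."""
    return {
        "name": customers.get_name(customer_id),
        "orders": orders.orders_for(customer_id),
    }
```

Because neither service can query the other's tables, each one sees a simple, self-contained workload. In principle, each could then be replicated, cached, or partitioned on its own, which is the kind of benefit the abstract attributes to the restructuring.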


Published in

WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN: 9781605580852
DOI: 10.1145/1367497

Copyright © 2008 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
