ABSTRACT
Many techniques have been proposed to scale web applications. However, the data interdependencies between the database queries and transactions issued by the applications limit the efficiency of these techniques. We claim that major scalability improvements can be gained by restructuring the web application data into multiple independent data services, each with exclusive access to its private data store. While this restructuring does not provide performance gains by itself, the implied simplification of each database workload allows a much more efficient use of classical techniques. We illustrate the data denormalization process on three benchmark applications: TPC-W, RUBiS and RUBBoS. We deploy the resulting service-oriented implementation of TPC-W across an 85-node cluster and show that restructuring its data can provide at least an order of magnitude improvement in the maximum sustainable throughput compared to master-slave database replication, while preserving strong consistency and transactional properties.
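As a rough illustration of the restructuring described in the abstract, the sketch below splits a shop's data into two independent data services, each holding a private store that nothing else can reach. It is not the authors' implementation: the class names (`ItemService`, `OrderService`), the table layouts, and the use of in-memory SQLite are all assumptions made for the example. The point it shows is that a query which would have been a cross-table join in a monolithic database becomes application-level composition of service calls.

```python
import sqlite3


class ItemService:
    """Hypothetical data service that exclusively owns the item data."""

    def __init__(self):
        # Private store: no other service holds a handle to this connection.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, price REAL)"
        )

    def add_item(self, item_id, title, price):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO item VALUES (?, ?, ?)", (item_id, title, price)
            )

    def get_item(self, item_id):
        row = self._db.execute(
            "SELECT id, title, price FROM item WHERE id = ?", (item_id,)
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "title": row[1], "price": row[2]}


class OrderService:
    """Hypothetical data service that exclusively owns the order lines."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE order_line (order_id INTEGER, item_id INTEGER, qty INTEGER)"
        )

    def add_line(self, order_id, item_id, qty):
        with self._db:
            self._db.execute(
                "INSERT INTO order_line VALUES (?, ?, ?)", (order_id, item_id, qty)
            )

    def get_lines(self, order_id):
        rows = self._db.execute(
            "SELECT item_id, qty FROM order_line WHERE order_id = ? ORDER BY item_id",
            (order_id,),
        ).fetchall()
        return [{"item_id": r[0], "qty": r[1]} for r in rows]


def order_total(orders, items, order_id):
    """Compose two service calls in place of a cross-table SQL join."""
    total = 0.0
    for line in orders.get_lines(order_id):
        item = items.get_item(line["item_id"])
        total += item["price"] * line["qty"]
    return total
```

Because each service sees only simple single-table queries on its own data, each store could in principle be replicated, cached, or partitioned independently of the others, which is the effect the abstract attributes to the restructuring.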
Index Terms
Service-oriented data denormalization for scalable web applications