Easing the management of data-parallel systems via adaptation
Source ACM SIGOPS European Workshop archive
Proceedings of the 9th workshop on ACM SIGOPS European workshop: beyond the PC: new challenges for the operating system
Kolding, Denmark
Session 7: OS architecture II
Pages: 103 - 108  
Year of Publication: 2000
ISBN:1-23456-789-0
Authors
David Petrou  Carnegie Mellon University
Khalil Amiri  Carnegie Mellon University
Gregory R. Ganger  Carnegie Mellon University
Garth A. Gibson  Carnegie Mellon University
Sponsor
SIGOPS: ACM Special Interest Group on Operating Systems
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/566726.566750

ABSTRACT

In recent years we have seen an enormous growth in the size and prevalence of data processing workloads [Fayyad 1998, Gray 1997]. The picture that is becoming increasingly common is depicted in Figure 1. In it, organizations or resourceful individuals provide services via a set of loosely coupled workstation nodes. The service is usually some form of data mining, such as searching, filtering, or image recognition. Clients, which could be machines running web browsers, not only initiate requests but also partake in the processing, with the goal of reducing request turnaround. That is, when the servers are overloaded, clients with spare cycles take on some of the computational burden. Naturally, many aspects of such a system cannot be determined at design time. For example, exactly how much work a client should do depends on the computational resources available at the client and at the server cluster, the unused network bandwidth between them, and the workload demand. This position paper is concerned with this and other aspects that must be divined at run-time to provide high performance and availability in data-parallel systems.
What makes system tuning especially hard is that it is not possible to find the right knob settings once and for all. A system upgrade or component failure may change the appropriate degree of data-parallelism. Changes in usable bandwidth may call for a different partitioning of code between the client and the server cluster. Moreover, an application may go through distinct phases during its execution. We should checkpoint the application for fault-tolerance less often during those phases in which checkpointing takes longer. Finally, the system needs to allocate resources effectively to concurrent applications, which can start at any time and which benefit differently from having these resources.
In summary, we argue that in the future a significant fraction of computing will happen on architectures like Figure 1, and that, due to the architectures' inherent complexity, high availability and fast turnaround can only be realized by dynamically tuning a number of system parameters.
Our position is that this tuning should be provided automatically by the system. The contrasting, application-specific view contends that, to the extent possible, policies should be made by applications, since they can make more informed optimizations. However, this requires a great deal of sophistication from the programmer. Further, it requires programmer time, one of the scarcest resources in systems building today.
Toward our goal, we contribute a framework that is sufficiently rich to express a variety of interesting data-parallel applications, but which is also restricted enough that the system can tune itself. These applications are built atop the ABACUS migration system, whose object placement algorithms are extended to reason about how many nodes should participate in a data-parallel computation, how to split up application objects between a client and a server cluster, how often program state should be checkpointed, and the (sometimes conflicting) interactions between these questions. By automatically determining a number of critical parameters at run-time, we minimize the management costs that have in recent years given system administrators the howling fantods [Satyanarayanan 1999].
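The claim that checkpoints should be taken less often when they take longer can be illustrated with a small sketch. The abstract does not say how the system chooses a checkpoint interval, so the code below substitutes Young's well-known first-order approximation, interval = sqrt(2 * C * MTBF), where C is the cost of one checkpoint and MTBF is the mean time between failures. The function name, the phase names, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Return an approximate checkpoint interval in seconds.

    Uses Young's first-order approximation sqrt(2 * C * MTBF).
    This is an assumed stand-in, not the policy described in the paper.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical values: one failure per day on average, and two application
# phases whose checkpoints cost different amounts of time to write.
MTBF_S = 24 * 3600.0
phase_checkpoint_cost_s = {"scan": 5.0, "aggregate": 80.0}

intervals = {
    phase: young_checkpoint_interval(cost, MTBF_S)
    for phase, cost in phase_checkpoint_cost_s.items()
}

for phase, interval in intervals.items():
    print(f"{phase}: checkpoint roughly every {interval:.0f} s")
```

Under this approximation the interval grows with the square root of the checkpoint cost, so a phase whose checkpoints are 16 times more expensive is checkpointed about 4 times less often. A system that adapts at run-time would re-estimate the cost as the application moves between phases instead of fixing one interval in advance.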


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
[Amiri et al. 2000] Amiri, K., Petrou, D., Ganger, G. R., and Gibson, G. A. Dynamic function placement for data-intensive cluster computing. In Proceedings of the USENIX 2000 Annual Technical Conference, San Diego, CA, June 2000. Available at http://www.cs.cmu.edu/~dpetrou/research.html.
[Anderson 1992] Anderson, T. E. The case for application-specific operating systems. In Proceedings of the Third Workshop on Workstation Operating Systems, Key Biscayne, FL, April 1992.
[Carriero et al. 1993] Carriero, N., Gelernter, D., Kaminsky, D., and Westbrook, J. Adaptive parallelism with Piranha. Technical Report 954, Yale University Department of Computer Science, February 1993.
[Fayyad 1998] Fayyad, U. Taming the giants and the monsters: Mining large databases for nuggets of knowledge. Database Programming and Design, 11(3), March 1998.
[Gray 1997] Gray, J. Building petabyte databases and storage metrics, March 5, 1997. Talk given at Carnegie Mellon University, available from http://research.microsoft.com/~gray/.
[Satyanarayanan 1999] Satyanarayanan, M. Digest of Proceedings, Seventh IEEE Workshop on Hot Topics in Operating Systems, March 1999.
[Schneider 1983] Schneider, F. B. Fail-stop processors. In Proceedings of the IEEE Spring COMPCON, pp. 66-70, San Francisco, CA, March 1983.