Abstract
Perhaps surprisingly, no practical performance models exist for popular (and complex) client applications such as Adobe's Creative Suite, Microsoft's Office suite and Visual Studio, Mozilla, Halo 3, etc. There is currently no tool that automatically answers program developers', IT administrators' and end-users' simple what-if questions like "what happens to the performance of my favorite application X if I upgrade from Windows Vista to Windows 7?". This paper describes directions we are taking toward constructing practical, versatile performance models that address this problem.
Our approach follows two paths. The first path involves instrumenting applications to better export their state and associated metrics. This application-specific monitoring is always on, and interesting data is collected from real, "in-the-wild" deployments. The second path involves statistical modeling techniques. The models we are experimenting with require no modifications to the OS or applications beyond the above instrumentation, and no explicit a priori model of how an OS or application should behave. We are in the process of learning from models we have constructed for several Microsoft products, including the Office suite, Visual Studio and Media Player. This paper presents preliminary findings from a large user deployment (several hundred thousand user sessions) of these applications that show the coverage and limitations of such models.
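To illustrate the flavor of the second path, the sketch below fits a one-level, CART-style regression tree that predicts an operation's latency directly from collected session metrics, with no a priori model of application behavior. This is a hypothetical illustration, not the paper's implementation; the feature (document size) and the telemetry values are invented for the example.

```python
# Hypothetical sketch: a one-level CART-style regression tree that predicts
# an application operation's latency from instrumented session metrics.
# Feature names and data are invented for illustration only.

def best_split(xs, ys):
    """Find the threshold on a single feature minimizing total squared error."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    return best  # (error, threshold, left_mean, right_mean)

# Invented telemetry: (document size in MB, observed file-open latency in ms)
sizes = [1, 2, 3, 10, 12, 15]
latencies = [100, 110, 105, 400, 420, 410]

_, threshold, small_ms, large_ms = best_split(sizes, latencies)

def predict(size_mb):
    """Answer a what-if question: expected open latency for a given size."""
    return small_ms if size_mb <= threshold else large_ms
```

In practice the models would split on many features at once (OS version, hardware configuration, workload), but the principle is the same: the structure is learned entirely from field data rather than specified by hand.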
Early indications from this work point towards future modeling strategies based on large amounts of data collected in the field. We present our thoughts on what this could imply for the SIGMETRICS community.