Interaction cost and shotgun profiling

Authors: Brian A. Fields, Rastislav Bodik, Mark D. Hill, Chris J. Newburn

ACM Transactions on Architecture and Code Optimization (TACO), Volume 1, Issue 3

Pages 272 - 304

https://doi.org/10.1145/1022969.1022971

Published: 01 September 2004

Abstract

We observe that the challenges software optimizers and microarchitects face every day boil down to a single problem: bottleneck analysis. A bottleneck is any event or resource that contributes to execution time, such as a critical cache miss or window stall. Tasks such as tuning processors for energy efficiency and finding the right loads to prefetch all require measuring the performance costs of bottlenecks.

In the past, simple event counts were enough to find the important bottlenecks. Today, the parallelism of modern processors makes such analysis much more difficult, rendering traditional performance counters less useful. If two microarchitectural events (such as a fetch stall and a cache miss) occur in the same cycle, which event should we blame for the cycle? What cost should we assign to each event? In this paper, we introduce a new model for understanding event costs to facilitate processor design and optimization.

First, we observe that all instructions, hardware structures, and events in a machine can interact in only one of two ways (in parallel or serially). We quantify these interactions by defining interaction cost, which can be zero (independent, no interaction), positive (parallel), or negative (serial).

Second, we illustrate the value of using interaction costs in processor design and optimization. In a processor with a long pipeline, we show how to mitigate the negative performance effect of long-latency "critical" loops, such as the level-one cache access and issue-wakeup, by optimizing seemingly unrelated resources that interact with them.

Finally, we propose shotgun profiling, a class of hardware profiling infrastructures that are parallelism-aware, in contrast to traditional event counters. Our recommended design requires only modest extensions to current hardware counters, while enabling the construction of full-featured dependence graphs of the microexecution. With these dependence graphs, many types of analyses can be performed, including identifying critical instructions, finding slack, and computing costs and interaction costs.
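
The three signs of interaction cost can be made concrete with a toy model. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that execution time is the longest path from a "start" node to an "end" node in a small, hand-built dependence graph, and that the cost of a set of events is the time saved when all of their latencies are idealized to zero. It also assumes that icost(a, b) = cost({a, b}) - cost({a}) - cost({b}). The graphs, event names, and helpers (`exec_time`, `cost`, `icost`) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical toy model (assumption, not the paper's graph format):
# each named event is an edge (src, dst, latency). Execution time is the
# longest path from "start" to "end"; idealizing an event zeroes its latency.
Edges = dict[str, tuple[str, str, int]]


def exec_time(edges: Edges, idealized: frozenset[str] = frozenset()) -> int:
    """Length of the longest (critical) path from "start" to "end"."""
    preds: dict[str, list[tuple[str, int]]] = {}
    for name, (src, dst, lat) in edges.items():
        preds.setdefault(dst, []).append((src, 0 if name in idealized else lat))
        preds.setdefault(src, [])
    # TopologicalSorter expects a mapping of node -> its predecessors.
    graph = {node: {src for src, _ in ps} for node, ps in preds.items()}
    finish: dict[str, int] = {}
    for node in TopologicalSorter(graph).static_order():
        finish[node] = max((finish[src] + lat for src, lat in preds[node]), default=0)
    return finish["end"]


def cost(edges: Edges, events: set[str]) -> int:
    """Time saved by idealizing every event in `events` at once."""
    return exec_time(edges) - exec_time(edges, frozenset(events))


def icost(edges: Edges, a: str, b: str) -> int:
    """Assumed interaction cost: cost({a, b}) - cost({a}) - cost({b})."""
    return cost(edges, {a, b}) - cost(edges, {a}) - cost(edges, {b})


# Two overlapping misses: removing either alone saves nothing.
parallel = {"miss1": ("start", "end", 10), "miss2": ("start", "end", 10)}
# Two events on one path that competes with a latency-15 alternative path.
serial = {"miss": ("start", "mid", 10), "stall": ("mid", "end", 10),
          "other": ("start", "end", 15)}
# Two events alone on the critical path.
indep = {"miss": ("start", "mid", 10), "stall": ("mid", "end", 10)}

print(icost(parallel, "miss1", "miss2"))  # 10
print(icost(serial, "miss", "stall"))     # -5
print(icost(indep, "miss", "stall"))      # 0
```

The three graphs reproduce the three cases named in the abstract. The two overlapping misses interact in parallel, giving a positive value. The two events sharing a path that competes with a latency-15 alternative interact serially, giving a negative value. The two events that are alone on the critical path are independent, giving zero.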
