Computing Reviews

Fine-grain power breakdown of modern out-of-order cores and its implications on Skylake-based systems
Haj-Yihia J., Yasin A., Asher Y., Mendelson A. ACM Transactions on Architecture and Code Optimization 13(4): Article No. 56, 2016. Type: Article
Date Reviewed: 08/28/17

Extreme-scale data centers and supercomputers draw many megawatts of power. By today’s standards, drawing one megawatt costs roughly $1 million per year; hence, reducing power consumption is critical for these systems. Current and upcoming facilities are very efficient in terms of power usage effectiveness (PUE), the ratio of total facility power to the power delivered to the computing equipment; for example, Intel’s home-built data centers run at a PUE of 1.06 and Facebook’s centers run at 1.078. These PUE numbers tell us that most of the power drawn by such facilities goes to the computing equipment itself, that is, to application execution, and therefore developers need to shoulder some of the responsibility for achieving power efficiency as well.
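To make the PUE figures concrete: PUE is conventionally defined as total facility power divided by the power delivered to the IT equipment, so its reciprocal gives the share of facility power that reaches the computing hardware. The short sketch below applies that definition to the two values quoted above; the function name is illustrative only.

```python
def it_power_fraction(pue: float) -> float:
    """Share of total facility power that reaches the IT equipment (1 / PUE)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 / pue


# PUE values quoted in the review.
for name, pue in [("Intel", 1.06), ("Facebook", 1.078)]:
    frac = it_power_fraction(pue)
    print(f"{name}: PUE {pue} -> {frac:.1%} of facility power reaches IT equipment")
```

At these values, well over 90 percent of the facility power is consumed by the computing equipment, which is why the efficiency of the software running on it matters.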

Rotem et al. [1] show that detailed power modeling is necessary because it is not straightforward to gauge which of two strategies consumes less energy: (1) running a processor at a high frequency to complete the application faster and then putting the processor to sleep, or (2) running it at a low frequency for a longer time to execute the application. This paper develops a tool that provides a fine-grained breakdown of the power consumed by different processor and sub-processor domains on Intel Skylake systems.
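The trade-off between the two strategies can be illustrated with a toy energy model. This is only a sketch under stated assumptions, not the model used by Rotem et al. or by the paper under review: dynamic power is assumed to grow with the cube of frequency (voltage scaling with frequency), static power is assumed constant while the core is active, and every parameter value is hypothetical.

```python
def energy_joules(freq_ghz, work_gcycles, window_s, k_dyn, p_static_w, p_sleep_w):
    """Energy to finish a fixed amount of work within a fixed time window.

    Assumed toy model (all parameters hypothetical):
      active power = k_dyn * f^3 + p_static_w
      the core sleeps at p_sleep_w for the rest of the window.
    """
    busy_s = work_gcycles / freq_ghz
    if busy_s > window_s:
        raise ValueError("work does not fit in the window at this frequency")
    p_active_w = k_dyn * freq_ghz ** 3 + p_static_w
    return p_active_w * busy_s + p_sleep_w * (window_s - busy_s)


# Same workload, two hypothetical chips that differ only in static power.
common = dict(work_gcycles=10.0, window_s=10.0, k_dyn=1.0, p_sleep_w=0.5)
for p_static_w in (2.0, 30.0):
    fast = energy_joules(3.0, p_static_w=p_static_w, **common)  # race to sleep
    slow = energy_joules(1.0, p_static_w=p_static_w, **common)  # low frequency
    winner = "race-to-sleep" if fast < slow else "low-frequency"
    print(f"static={p_static_w} W: fast={fast:.1f} J, slow={slow:.1f} J -> {winner}")
```

Under these assumptions, the low-frequency run wins when static power is small and the race-to-sleep run wins when static power dominates, which is the kind of ambiguity that motivates a detailed power breakdown.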

Intel VTune helps identify performance bottlenecks by employing the top-down analysis method developed by Ahmad Yasin in 2014 [2]. Top-down analysis is built on the idea that studying performance counters in isolation is not as informative as studying them in groups. These (sub)groups of performance counters, that is, meta-performance counters, are useful in pinpointing whether the performance bottlenecks in the developer’s application are frontend-bound, backend-bound, or due to misspeculation. Similarly, the tool built by the authors helps classify whether the power consumption in an application is frontend-bound, backend-bound, or due to misspeculation. To build these meta-performance counters, the authors had to identify a weight for each of the performance counters that make up a meta-performance counter. To do so, they used a set of training microbenchmarks.
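The review does not say how the weights were derived from the microbenchmarks. One plausible approach, shown here purely as an assumption, is to model measured power as a weighted sum of counter rates and fit the weights by ordinary least squares. The counter ordering, the training rows, and the power values below are synthetic and hypothetical, and any constant idle-power term is omitted for brevity.

```python
def fit_weights(counter_rates, measured_power):
    """Least-squares weights w such that sum(w[i] * rate[i]) approximates power.

    Solves the normal equations (A^T A) w = A^T p by Gaussian elimination.
    This is an assumed fitting method, not the one confirmed by the paper.
    """
    n = len(counter_rates[0])
    ata = [[sum(r[i] * r[j] for r in counter_rates) for j in range(n)]
           for i in range(n)]
    atp = [sum(r[i] * p for r, p in zip(counter_rates, measured_power))
           for i in range(n)]
    aug = [ata[i] + [atp[i]] for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("training set cannot separate these counters")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    weights = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (aug[i][n] - tail) / aug[i][i]
    return weights


# Synthetic microbenchmarks: columns are hypothetical per-cycle event rates
# for frontend, backend, and misspeculation activity; power is in watts.
rates = [[1.0, 0.1, 0.0], [0.2, 1.0, 0.1], [0.3, 0.2, 1.0], [0.5, 0.5, 0.5]]
power = [2.5, 5.7, 4.6, 5.0]
weights = fit_weights(rates, power)
print([round(w, 3) for w in weights])
```

Each synthetic microbenchmark stresses mostly one kind of activity, which keeps the counters separable; a training set in which the counters always rise and fall together would leave the weights undetermined.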

Overall, the paper is very informative and nicely written. The experiments are substantial. However, as the authors admit, these counters were studied for only one core and one p-state, which is hardly representative of systems in the wild.


1) Rotem, E.; Naveh, A.; Ananthakrishnan, A.; Weissmann, E.; Rajwan, D. Power-management architecture of the Intel microarchitecture code-named Sandy Bridge. IEEE Micro 32, 2 (2012), 20–27.


2) Yasin, A. A top-down method for performance analysis and counters architecture. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE Computer Society, Piscataway, NJ, 2014, 35–44.

Reviewer:  Karthik Murthy Review #: CR145506 (1711-0731)
