Energy-Aware High Performance Computing (EnA-HPC)

Keynote Speech

Programming the Energy-Efficiency of High-Performance Computing Systems (Slides)

Dimitrios S. Nikolopoulos

Dimitrios S. Nikolopoulos is Professor in the School of Electronics, Electrical Engineering and Computer Science, at Queen's University of Belfast, where he holds the Chair in High Performance and Distributed Computing (HPDC) and is Director of Research in the HPDC Cluster. His research explores the architecture, programming, evaluation and optimisation of many-core computing systems, with recent emphasis placed on data-intensive systems and energy-efficient computing at scale. For his research, Professor Nikolopoulos has been awarded the NSF CAREER Award, the US DoE Early Career Principal Investigator Award, an IBM Faculty Award, a Marie Curie Fellowship, a Fellowship from HIPEAC, and seven best paper awards, including the SC'2000 Best Technical Paper Award, ACM SIGPLAN PPoPP'2007 Best Paper Award and IPDPS'2002 Best Paper Award. Professor Nikolopoulos teaches introductory and advanced courses on parallel computing, parallel programming, computer organisation, computer architecture, operating systems, and embedded systems programming. He is a Senior Member of the IEEE and a Senior Member of the ACM.

Scientific Sessions

Performance Estimation of High Performance Computing Systems with Energy Efficient Ethernet Technology (Link) (Slides)

Shinobu Miwa, Sho Aita and Hiroshi Nakamura

Energy Efficient Ethernet (EEE) is an Ethernet standard for lowering power consumption in commodity network devices. When the load on a link is low, EEE allows the link to enter a low-power mode and can therefore significantly reduce the power consumption of a network device. EEE is expected to be adopted in high performance computing (HPC) systems within a few years, but the performance impact of enabling EEE in HPC systems is still unknown. To encourage HPC system developers to adopt EEE, the performance of not-yet-existing HPC systems that would use the technology needs to be estimated. This paper presents a performance estimation method for EEE-supported HPC systems, based on a novel performance model we propose. Combined with a few profiles of HPC applications, the model allows us to anticipate the performance of systems not yet realized. The evaluation results show that the proposed model achieves significant accuracy and that EEE remains promising for HPC applications.

Energy-aware Design Space Exploration for GPGPUs (Link) (Slides)

Pascal Libuschewski, Dominic Siedhoff and Frank Weichert

This work presents a novel approach for automatically determining the most energy-efficient Graphics Processing Units (GPUs) with respect to given parallel computation problems.

Energy-Centric Dynamic Fan Control (Link) (Slides)

Benoît Pradelle, Nicolas Triquenaux, Jean Christophe Beyler and William Jalby

Fans are one of the cooling components whose power consumption can be reduced by precise, energy-aware control. Such energy-efficient fan controllers must carefully manage fan speeds in order to remain as close as possible to the optimal setting, simultaneously minimizing the power leakage due to heat and fan power consumption, while avoiding overheating. This paper presents a new Dynamic Fan Controller for Energy (DFaCE) for minimizing the energy consumption of several fans. DFaCE learns the optimal fan setting with no prior knowledge of the hardware setup or the governing physics, all the while performing useful work on the computer. Once the optimal fan setting is found, the system immediately applies it to minimize the power consumption of the computer at no cost. The system is also evaluated over a set of benchmarks and achieves savings of up to 46% in the cooling subsystem compared to common temperature-driven control strategies.
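The trade-off described in the abstract, fan power rising with speed while heat-induced leakage falls, can be illustrated with a minimal sketch. This is not DFaCE's learning procedure: it is a brute-force search over a hypothetical power model, and the function names, coefficients and speed range are all assumptions chosen only to produce an interior minimum.

```python
def total_power(fan_rpm, fan_coeff=1.0e-10, leak_base=30.0, leak_slope=2.4e4):
    """Hypothetical power model in watts (all coefficients are made up).

    Fan power is assumed to grow with the cube of the fan speed, while
    leakage power is assumed to fall as faster fans lower the temperature.
    """
    fan_power = fan_coeff * fan_rpm ** 3
    leakage_power = leak_base + leak_slope / fan_rpm
    return fan_power + leakage_power


def best_fan_speed(candidate_rpms, power_fn=total_power):
    """Return the candidate speed with the lowest combined power."""
    return min(candidate_rpms, key=power_fn)


# Candidate settings from 1000 to 6000 RPM in 500 RPM steps (assumed range).
best = best_fan_speed(range(1000, 6001, 500))
```

With these assumed coefficients the minimum lies at neither end of the range: running the fans too slowly costs leakage power, running them too fast costs fan power. A real controller would replace `total_power` with measurements taken on the machine.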

An Evaluation of CPU Frequency Transition Delays: Case Study on Intel Architectures (Link) (Slides)

Abdelhafid Mazouz, Alexandre Laurent, Benoît Pradelle and William Jalby

Dynamic Voltage and Frequency Scaling (DVFS) has emerged as one of the most important techniques to reduce energy consumption in computing systems. The main idea exploited by DVFS controllers is to reduce the CPU frequency in memory-bound phases, usually significantly reducing the energy consumption. However, depending on the CPU model, transitions between CPU frequencies may imply varying delays. Such delays are often optimistically ignored in DVFS controllers, whereas knowing them could improve the quality of frequency setting decisions. This article presents an experimental study on the measurement of frequency transition latencies. The measurement methodology is presented along with evaluations on three Intel machines, reflecting three distinct micro-architectures. Overall, we show for our experimental setup that, while changing CPU frequency upward leads to higher transition delays, changing it downward leads to smaller or similar transition delays across the set of available frequencies.
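To illustrate why transition delays matter for frequency setting decisions, the following minimal sketch compares the energy saved during a memory-bound phase with the energy spent on the two frequency transitions around it. It is an illustrative break-even model, not the paper's measurement methodology. It assumes the phase takes the same time at the lower frequency and that the CPU draws high-frequency power without making progress during a transition; the function name and all numbers are hypothetical.

```python
def worth_switching(phase_s, p_high_w, p_low_w, t_down_s, t_up_s):
    """Decide whether lowering the frequency for one phase saves energy.

    phase_s:   length of the memory-bound phase in seconds
    p_high_w:  power at the high frequency in watts
    p_low_w:   power at the low frequency in watts
    t_down_s:  delay of the downward transition in seconds
    t_up_s:    delay of the upward transition in seconds
    """
    # Energy saved by running the phase at the lower power level.
    saved_j = (p_high_w - p_low_w) * phase_s
    # Energy spent while the two transitions are in flight (assumed stall).
    overhead_j = p_high_w * (t_down_s + t_up_s)
    return saved_j > overhead_j


# Hypothetical values: 50 W vs 40 W, 50 us down and 100 us up.
long_phase = worth_switching(0.010, 50.0, 40.0, 50e-6, 100e-6)
short_phase = worth_switching(0.0005, 50.0, 40.0, 50e-6, 100e-6)
```

Under these assumptions a 10 ms phase justifies the switch while a 0.5 ms phase does not, which is the kind of decision a controller that ignores transition delays would get wrong.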

E-AMOM: An Energy-Aware Modeling and Optimization Methodology for Scientific Applications on Multicore Systems (Link) (Slides)

Charles Lively, Valerie Taylor, Xingfu Wu, Hung-Ching Chang, Chun-Yi Su, Kirk Cameron, Shirley Moore and Dan Terpstra

In this paper, we present the Energy-Aware Modeling and Optimization Methodology (E-AMOM) framework, which develops models of runtime and power consumption based upon performance counters and uses these models to identify energy-based optimizations for scientific applications. E-AMOM utilizes predictive models to employ run-time Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Concurrency Throttling (DCT) to reduce power consumption of the scientific applications, and uses cache optimizations to further reduce runtime and energy consumption of the applications. The models and optimization are done at the level of the kernels that comprise the application. Our models resulted in an average error rate of at most 6.79% for Hybrid MPI/OpenMP and MPI implementations of six scientific applications. With respect to optimizations, we were able to reduce the energy consumption by up to 21%, with a reduction in runtime by up to 14.15%, and a reduction in power consumption by up to 12.50%.
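The three "up to" figures above can be related through the identity energy = average power × runtime. The short sketch below is a reader's consistency check under that identity, not a computation from the paper; the function name is hypothetical.

```python
def energy_reduction(power_reduction, runtime_reduction):
    """Fractional energy saving implied by E = P * T.

    Both arguments are fractions (0.125 means 12.5%), with P taken
    as the average power over the run.
    """
    return 1.0 - (1.0 - power_reduction) * (1.0 - runtime_reduction)


# Combine the best reported power and runtime reductions.
combined = energy_reduction(0.1250, 0.1415)
```

Here `combined` is about 0.249, which is larger than the best reported energy reduction of 21%. Assuming the power figure is an average over the run, this suggests the maximum power and runtime reductions were not obtained in the same configuration.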

Mapping Fine-Grained Power Measurements to HPC Application Runtime Characteristics on IBM POWER7 (Link) (Slides)

Michael Knobloch, Maciej Foszczynski, Willi Homberg, Dirk Pleiter and Hans Böttiger

Optimization of energy consumption is a key issue for future HPC. Evaluation of energy consumption requires fine-grained power measurement. Additional useful information is obtained when performing these measurements at component level. In this paper we describe a setup that allows fine-grained power measurements at up to 1 ms resolution at component level on IBM POWER machines. We further developed a plugin for VampirTrace that allows us to correlate these power measurements with application performance characteristics, e.g. obtained by hardware performance counters. This environment allows us to generate both power and performance profiles. Such profiles provide valuable input to develop future strategies for improving workload-driven energy usage per performance. We show in comparison with power profiles of coarser granularity that these fine-grained measurements are necessary to capture the dynamics of power switching.

Automatic Detection of Power Bottlenecks in Parallel Scientific Applications (Link) (Slides)

Maria Barreda Vayá, Sandra Catalán Pallarés, Manuel F. Dolz Zaragozá, Rafael Mayo Gual and Enrique S. Quintana-Orti

In this paper we present an extension of the pmlib framework for power-performance analysis that permits a rapid and automatic detection of power sinks during the execution of concurrent scientific workloads. The extension is shaped in the form of a multithreaded Python module that offers high reliability and flexibility, rendering an overall inspection process that introduces low overhead. Additionally, we investigate the advantages and drawbacks arising from the use of power sensors introduced in the Intel Xeon “Sandy-Bridge” CPU versus a data acquisition system from National Instruments.

Integrating Performance Analysis and Energy Efficiency Optimizations in a Unified Environment (Link) (Slides)

Robert Schöne and Daniel Molka

Performance analysis tools have been available for decades. They help developers to speed up their applications and pinpoint bottlenecks in scalability. They are widespread, well understood, and sophisticated. Since the growing power consumption of HPC systems has become a major cost factor, support for energy efficiency evaluation has been added to various performance analysis tools. Furthermore, beneficial as well as detrimental effects of power saving strategies on energy efficiency are already well understood. However, appropriate tools to directly exploit the detected potentials are not yet available. We therefore present a library that reuses the highly sophisticated instrumentation mechanisms of VampirTrace to dynamically change hardware and software parameters that influence energy efficiency. We also present a library that wraps OpenMP runtimes of several x86_64 compilers in order to provide low-overhead instrumentation at the parallel region level. This enhances VampirTrace's abilities to handle OpenMP programs without the typically required recompilation.

The 4 Pillar Framework for Energy Efficient HPC Data Centers (Link) (Slides)

Torsten Wilde, Axel Auweter and Hayk Shoukourian

Improving energy efficiency has become a major research area not just for commercial data centers but also for high performance computing (HPC) data centers. While many approaches for reducing the energy consumption in data centers and HPC sites have been proposed and implemented, as of today, many research teams focused on improving the energy efficiency of data centers are working independently from others. The main reason is that there is no underlying framework that would allow them to relate their work to achievements made elsewhere. Also, without some frame of correlation, the produced results are either not easily applicable beyond their origin or it is not clear if, when, where, and for whom else they are actually useful. This paper introduces the "4 Pillar Framework for Energy Efficient HPC Data Centers" which can be used by HPC center managers to holistically evaluate their site, find specific focus areas, classify current research activities, and identify areas for further improvement and research. The 4 pillars are: 1. Building Infrastructure; 2. HPC Hardware; 3. HPC System Software; and 4. HPC Applications. While most HPC centers already implement optimizations within each of the pillars, optimization efforts crossing the pillar boundaries are still rare. The 4 Pillar framework, however, specifically encourages such cross-pillar optimization efforts. Besides introducing the framework itself, this paper also shows its applicability by mapping current research activities in the field of energy efficient HPC conducted at Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities to the framework as reference.

CoolEmAll - optimising cooling efficiency in data centers (Link) (Slides)

Eugen Volk, Daniel Rathgeb and Ariel Oleksiak

The ICT sector is responsible for around 2% of the global energy consumption, with data centres taking a large fraction of this consumption. In many current data centres the actual IT equipment uses only half of the total energy while the remaining part is required for cooling and air movement. This results in poor cooling efficiency and energy efficiency, leading to significant CO2 emissions. The EU project CoolEmAll takes a holistic approach to investigating how cooling, heat transfer, IT infrastructure, and application workloads influence the overall cooling and energy efficiency of data centres. In March 2013, CoolEmAll developed the first prototype of a Simulation, Visualisation and Decision support toolkit (SVD Toolkit), enabling assessment and user-driven optimisation of cooling efficiency in data centres by means of coupled workload and thermal-airflow simulation. In this paper we describe the architecture of the SVD Toolkit and present results enabling assessment and optimisation of cooling efficiency in data centres.

Industrial Sessions / Vendor Talks

Considerations and Opportunities for Energy Efficient High-Performance Computing (Slides)

Herbert Cornelius, Intel

In this session we will take a look at some practical steps to improve the energy and power efficiency of High-Performance Computing systems, from the datacenter to the application point of view. We will show some results of experiments regarding cooling solutions and the impact of power limiting on energy efficiency. The right choice of power envelope for applications can result in significant energy savings, while the most energy efficient set-up is not always necessarily the best for power/performance.

The future of energy aware memory (Slides)

Frank Koch, Samsung

While memory technology is becoming more difficult to advance and is facing some limitations, the requirements on performance, power and capacity of memory keep increasing drastically in the supercomputing area. In order to satisfy these requirements and overcome technological barriers, new approaches are inevitable. This presentation will introduce a new breakthrough to continue technology scaling, present several new concepts from a system-level architecture perspective, and address the advantages they could deliver.

Power consumption of clusters: Control and Optimization (Slides)

Luigi Brochard, Francois Thomas, IBM

After an introduction on the different sources of power consumption of a data center (power consumption, cooling and power loss), we’ll dig into the various components of the power consumption of a server: CPU, memory, IO, etc. We’ll show the relation between power consumption and performance and the trade-off between power, performance and energy which we can take advantage of. We then explain the LoadLeveler features which have been added to implement Energy Aware Scheduling on x86 systems to control and optimize the power and energy consumption of clusters, and their benefits based on LRZ and other customer applications and experience. In a final part, we present the roadmap for Energy Aware Scheduling features in IBM Platform LSF.

ColdCon: Hotwatercooling – Made in Germany (Slides)

Jörg Heydemüller, MEGWARE

Energy efficient HPC systems (Slides)

Jean-Pierre Panziera, Bull

Energy consumption of HPC systems has become a major concern of all supercomputing centers. As HPC systems got more powerful, they also got more energy hungry, which means rising operating expenses and investments in the datacenters. HPC system efficiency touches all components of the system (processors, memory, IO, …) and it requires efficient “free” cooling. Finally, to optimize global system performance, the runtime environment manages the different resources based on precise energy monitoring.