The First International Workshop on Resilience and/or Energy-aware techniques
for High-Performance Computing (RE-HPC)
To be held in conjunction with the International Green and Sustainable Computing Conference (IGSC), 2016
November 7-9, 2016
Hangzhou, China
Workshop Program (Nov. 7, Huagang Hall, Zhejiang Hotel):
Chair: Jean-Marc Pierson
[8:30am - 10:00am] Session 1: Energy Efficiency
[10:00am - 10:30am] Coffee Break
[10:30am - 12:00pm] Session 2: Resilience, Workload and Infrastructures
NEWS: Selected papers presented at the workshop will be invited for a special issue in the Elsevier journal Sustainable Computing: Informatics and Systems (SUSCOM).
Resilience and energy consumption have become two important concerns for high-performance computing (HPC) systems. With increasing core counts and technology miniaturization, today's large computing platforms (datacenters, clusters, supercomputers, etc.) are increasingly prone to failures. Faults are becoming the norm rather than the exception. Besides the classical fail-stop errors (such as hardware failures), soft errors (such as silent data corruptions, or SDCs) constitute another threat that can no longer be ignored by the HPC community.
Another concern is energy. Large computing centers are now among the largest consumers of energy, so measures must be taken to reduce their consumption. Energy is needed not only to power the individual cores but also to cool the system. In today's datacenters, a large proportion of energy is spent on cooling and thermal-related activities. It is anticipated that the power dissipated by communications and I/O transfers will also make up a much larger share of the overall power consumption. The relative cost of communication is expected to increase dramatically, in terms of both latency/overhead and consumed energy. Re-designing algorithms for HPC systems to ensure resilience and to reduce energy consumption will be crucial to achieving sustained performance.
The link between resilience and energy must also be carefully addressed. Better resilience often requires redundancy (replication and/or checkpointing, rollback and recovery), which consumes extra energy. Hot cores may lead to less resilient computing or increase the probability of individual failures. On the other hand, reducing energy consumption via voltage/frequency scaling techniques will increase the application running time, and hence the expected number of failures during execution.
This workshop will encompass a broad range of topics related to resilience and energy efficiency for HPC.
Its objective is to facilitate the exchange of valuable information and ideas among researchers and practitioners. Topics of interest include (but are not limited to):
- Fault-tolerant algorithms, tools, and protocols
- Checkpointing, replication, and recovery techniques
- Detection and prediction of soft errors and SDCs
- System reliability, testing, and verification
- Resilience models, algorithms, and simulations
- Energy-efficient scheduling and resource management
- Power-aware runtime systems
- Energy-efficient I/O, storage, and networking
- Thermal behavior modeling, control, and management
- Cooling-aware optimizations and evaluations
- Tradeoffs between performance, reliability, energy, and temperature
Author Information:
All papers should be submitted electronically (in PDF format) following the guidelines of the International Green and Sustainable Computing (IGSC) Conference (http://www.green-conf.org/call_papers.html). Authors should select the workshop on Resilience and/or Energy-aware techniques for High-Performance Computing (RE-HPC) when submitting their papers on EasyChair via the following link: https://easychair.org/conferences/?conf=rehpc16. All submitted manuscripts will be reviewed and evaluated on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the scope of the workshop. Papers presented at the workshop will be published in the official conference proceedings (through IEEE Digital Library) contingent on two conditions: (1) one author of each accepted paper must register for the conference when the final manuscript is submitted, and (2) one of the authors must present the paper in person at the workshop. Please note that each accepted workshop paper requires a full IGSC registration at the IEEE member or non-member rate (NOT the student rate). There is no separate workshop-only registration.
Please contact Hongyang Sun (hongyang.sun@ens-lyon.fr) with any questions about this workshop.
Important Dates:
Paper Submission: | |
Author Notification: | September 15, 2016 |
Camera-ready Paper: | October 1, 2016 |
Workshop Co-Chairs:
Program Committee:
- Guillaume Aupy, Vanderbilt University, USA
- Leonardo Bautista-Gomez, Barcelona Supercomputing Center, Spain
- Pascal Bouvry, University of Luxembourg, Luxembourg
- Georges Da Costa, IRIT, University of Toulouse, France
- Zhihui Du, Tsinghua University, China
- Amina Guermouche, The University of Tennessee, Knoxville, USA
- Sebastien Lafond, Åbo Akademi University and Turku Center for Computer Science, Finland
- Hermann de Meer, University of Passau, Germany
- Rami Melhem, University of Pittsburgh, USA
- Ariel Oleksiak, Poznan Supercomputing and Networking Center, Poland
- Dana Petcu, West University of Timisoara, Romania
- Enrique Quintana-Orti, HPCA, Universitat Jaume I, Spain
- Leonel Sousa, INESC, Portugal
- Patricia Stolf, IRIT, University of Toulouse, France