Workshop Program (Nov. 7, Huagang Hall, Zhejiang Hotel):

Chair: Jean-Marc Pierson

[8:30am - 10:00am] Session 1: Energy Efficiency

  • Energy Aware Software: Issues, Approaches and Challenges. Sébastien Lafond, Simon Holmbacka, Johan Lilius.
  • Host Management Policy for Energy Efficient Dynamic Allocation. Patricia Stolf, Damien Borgetto, Mael Aubert.
  • Fast and Effective Power Profiling of Program Execution Based on Phase Behaviors. Xiaobin Ma, Zhihui Du, Jason Liu.

[10:00am - 10:30am] Coffee Break

[10:30am - 12:00pm] Session 2: Resilience, Workload and Infrastructures

  • Monitoring Strategies for Scalable Dynamic Checkpointing. Swann Perarnau, Leonardo Bautista-Gomez.
  • Modeling and Generating Large-scale Google-like Workload. Georges Da Costa, Léo Grange, Inès De Courchelle.
  • Virtual Desktop Infrastructures: Architecture, Survey and Green Aspects Proof of Concept. Abdallah Ali Z. A. Ibrahim, Dzmitry Kliazovich, Pascal Bouvry, Ariel Oleksiak.

NEWS: Selected papers presented at the workshop will be invited to a special issue of the Elsevier journal Sustainable Computing: Informatics and Systems (SUSCOM).

Resilience and energy consumption have become two important concerns for high-performance computing (HPC) systems. With increasing core counts and technology miniaturization, today's large computing platforms (datacenters, clusters, supercomputers, etc.) are increasingly prone to failures. Faults are becoming the norm rather than the exception. Besides the classical fail-stop errors (such as hardware failures), soft errors (such as silent data corruptions, or SDCs) constitute another threat that can no longer be ignored by the HPC community.

Another concern is energy. Large computing centers are now among the largest consumers of energy, so measures must be taken to reduce their consumption. Energy is needed not only to power the individual cores but also to cool the system. In today's datacenters, a large proportion of energy is spent on cooling and thermal-related activities. The power dissipated by communications and I/O transfers is also expected to make up a much larger share of overall power consumption, and the relative cost of communication is expected to increase dramatically, in terms of both latency/overhead and consumed energy. Re-designing algorithms for HPC systems to ensure resilience and to reduce energy consumption will be crucial to achieving sustained performance.

The link between resilience and energy must also be carefully addressed. Better resilience often requires redundancy (replication and/or checkpointing, rollback and recovery), which consumes extra energy. Hot cores may lead to less resilient computing or increase the probability of individual failures. On the other hand, reducing energy consumption via voltage/frequency scaling techniques will increase the application running time, and hence the expected number of failures during execution.

This workshop will encompass a broad range of topics related to resilience and energy efficiency for HPC. Its objective is to facilitate the exchange of information and ideas among researchers and practitioners. Topics of interest include (but are not limited to):

