Algorithms for High-Performance Computing Platforms (2014-2015)

Lectures 9 and 10: Resilience

Lecture 11: Data locality

Final exam

The final exam for this lecture series is a bibliographical study of a research article chosen from the list below. Each student must choose an article and notify the teacher by email. Each student will then write a report presenting and commenting on the chosen article (4 to 8 pages) and present it in class (a 15-minute presentation followed by 10 minutes of questions). The report must be sent no later than Monday, January 5 (five).

Students may work in teams of two. In that case, the team writes a single report and gives a joint presentation: the team has 25 minutes to present, during which each student must speak for at least 10 minutes.

Article	Student(s)
Messages Scheduling for Parallel Data Redistribution between Clusters
On Scheduling Dags to Maximize Area	Alexandre Talon and Alice Pellet
Probabilistic Allocation of Tasks on Desktop Grids	Baptiste Jonglez	Not for a team of two students
Checkpointing algorithms and fault prediction
Quantitatively Modeling Application Resilience with the Data Vulnerability Factor
Fault-Tolerant Dynamic Task Graph Scheduling	Gabriela Paris and Antoine Pouille
Optimization of a Multilevel Checkpoint Model with Uncertain Execution Scales
A System Software Approach to Proactive Memory-Error Avoidance	Amir Wonjiga and Sebastian Scheibner
Multi-organization scheduling approximation algorithms	Aurore Alcolei
Fault-tolerant scheduling on parallel systems with non-memoryless failure distributions
Understanding Soft Error Resiliency of BlueGene/Q Compute Chip through Hardware Proton Irradiation and Software Fault Injection	Mihai Popescu
Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques	Julien Le Maire and Fabrice Mouhartem
Scheduling Parallel Task Graphs on (Almost) Homogeneous Multi-cluster Platforms	Pierre Macherel
Efficient Task Placement and Routing of Nearest Neighbor Exchanges in Dragonfly Networks
Fault Tolerance for Remote Memory Access Programming Models
A Coprocessor Sharing-Aware Scheduler for Xeon Phi-based Compute Clusters
Exploiting Geometric Partitioning in Task Mapping for Parallel Computers	Antoine Martinet
Communication-Aware Processor Allocation for Supercomputers	Benjamin Hadjibeyli

A few bibliographic search tools:

Frédéric Vivien

Last modified: Fri Dec 19 17:34:07 CET 2014

Lecture 1: Scheduling on Parallel Machines

Lecture 2: Problem relaxation - the divisible load theory

Lecture 3: Problem relaxation - steady-state optimization

Lecture 4: Online problems - assessing the quality of online algorithms

Lecture 5: Handling dynamicity and sources of uncertainties

Lecture 6: Work-stealing

Lectures 6 and 7: Resource allocation on clusters using virtual machines

Lecture 8: Computing with limited memory

Lectures 9 and 10: Resilience

Lecture 11: Data locality
