Algorithms for High-Performance Computing Platforms (2014-2015)

Lecture 1: Scheduling on Parallel Machines

Lecture 2: Problem relaxation - the divisible load theory

Lecture 3: Problem relaxation - steady-state optimization

Lecture 4: Online problems - assessing the quality of online algorithms

Lecture 5: Handling dynamicity and sources of uncertainties

Lecture 6: Work-stealing

Lectures 6 and 7: Resource allocation on clusters using virtual machines

Lecture 8: Computing with limited memory

Lectures 9 and 10: Resilience

Lecture 11: Data locality

Final exam

The final exam of this lecture series is a bibliographical study of a research article chosen from the list below. Each student must choose an article and notify the teacher by email. Each student must then write a report presenting and commenting on the chosen article (4 to 8 pages) and present it in class (a 15-minute presentation followed by 10 minutes of questions). Reports must be sent by Monday, January 5, at the latest.

Students may work in teams of two. In that case, the team writes a single report and gives a joint presentation of 25 minutes, during which each student must speak for at least 10 minutes.



Article | Student(s)
Messages Scheduling for Parallel Data Redistribution between Clusters | -
On Scheduling Dags to Maximize Area | Alexandre Talon and Alice Pellet
Probabilistic Allocation of Tasks on Desktop Grids | Baptiste Jonglez (not for a team of two students)
Checkpointing algorithms and fault prediction | -
Quantitatively Modeling Application Resilience with the Data Vulnerability Factor | -
Fault-Tolerant Dynamic Task Graph Scheduling | Gabriela Paris and Antoine Pouille
Optimization of a Multilevel Checkpoint Model with Uncertain Execution Scales | -
A System Software Approach to Proactive Memory-Error Avoidance | Amir Wonjiga and Sebastian Scheibner
Multi-organization scheduling approximation algorithms | Aurore Alcolei
Fault-tolerant scheduling on parallel systems with non-memoryless failure distributions | -
Understanding Soft Error Resiliency of BlueGene/Q Compute Chip through Hardware Proton Irradiation and Software Fault Injection | Mihai Popescu
Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques | Julien Le Maire and Fabrice Mouhartem
Scheduling Parallel Task Graphs on (Almost) Homogeneous Multi-cluster Platforms | Pierre Macherel
Efficient Task Placement and Routing of Nearest Neighbor Exchanges in Dragonfly Networks | -
Fault Tolerance for Remote Memory Access Programming Models | -
A Coprocessor Sharing-Aware Scheduler for Xeon Phi-based Compute Clusters | -
Exploiting Geometric Partitioning in Task Mapping for Parallel Computers | Antoine Martinet
Communication-Aware Processor Allocation for Supercomputers | Benjamin Hadjibeyli


Some bibliographic search tools:


Frédéric Vivien
Last modified: Fri Dec 19 17:34:07 CET 2014