Motivations and Principles
Recently, the advances in high performance networks and in communication library dedicated to distributed computing make it possible to use heterogeneous network of workstations instead of dedicated expensive supercomputers such as massively parallel calculators. However, even if it is now possible to obtain tremendous performances on a given network, it is difficult to keep these performances as soon as we gather several networks (in the case of Grid computing). This is mainly due to the heterogeneity of the obtained platform, and the effect of bottlenecks. Moreover, discovering the topology of the computing platform is often challenging, although it is crucial for the efficient deployment of grid applications. Considering the communications involved in such distributed applications, we can gather them into some macrocommunication schemes, such as broadcasts, scatters, alltoall or reduce operations. These macrocommunication schemes have often been studied with the goal of minimizing their makespan, i.e. the time elapsed between the emission of the first message by the source, and the last reception. But in many cases, the application has to perform a large number of instances of the same operation (for example if data parallelism is used). When dealing with such a series of macrocommunications, pipelining is mandatory to achieve good performance. The relevant objective becomes the optimization of the throughput, that is the average number of macrocommunications executed per timeunit.
Results
We have studied these collective communications in the context of throughput maximization, and thus designed scheduling strategies with optimal throughput (leading to asymptotically optimal schedules with respect to the classical makespan metric) for most of the classical collective communication primitives. Here is a summary of our work in this field:

We first studied the Scatter and Reduce operations, and developed a new framework for studying this problem, using a relaxation based on linear programming. This results in an article in APDCM'04. An extended journal version (in JPDC) will be soon available.

Then, we moved to the study of the most famous collective communication schemes: Broadcast and Multicast. As replication of the messages is allowed for these communications (and necessary to reach good performances), these problems are more challenging than the previous ones. However, using graph techniques, we are able to design the same kind of schedules (optimal with the respect of the throughput metric) for the broadcast problem (see the article in IPDPS'04, and the extended version in the IEEE TPDS journal). The complexity of Multicast operations turns out to be more difficult: we have shown that this is in fact an NPcomplete problem (see the article in ICPP'04).

All previous studies assume a bidirectional oneport model: at each timestep, each machine is available to perform at most one sending and one receiving communications. Although this model is widely accepted, we tried to adapt our results to the unidirectional oneport model, when a machine cannot perform simultaneously a sending and a receiving operations. Surprisingly, even the Scatter problem turns out to be much more complicated than in the bidirectional case, but we design an approach based on the ellipsoid method for solving big (non polynomial) linear problems (this technical method is available as a research report and in our IJHPCA paper ).

As previous approach for solving problems such as Broadcasting a series of messages may lead to a big control overhead (as we are using several concurrent broadcast trees), we have designed some heuristics to get good performances while keeping the control complexity low (that is using a single broadcast tree). We present this work using a large range of platform models in the article in IPDPS'05.

