The results presented in this thesis deal with the execution of applications on heterogeneous and distributed environments: computing grids. We study, end to end, the process that allows users to execute complex scientific applications. The contributions of this work are thus manifold. 1) Hierarchical middleware deployment: we first present an execution model for hierarchical middleware. Then, based on this model, we present several heuristics that automatically determine the shape of the hierarchy that best fits the users' needs, depending on the platform it is executed on. We evaluate the quality of the approach on a real platform using the DIET middleware. 2) Graph clustering: we propose a distributed and self-stabilizing algorithm for clustering weighted graphs. Clustering is based on a distance metric between nodes: within each cluster, every node is at distance at most k from a leader elected in that cluster. 3) Scheduling: we study the scheduling of independent tasks under resource usage limitations. We define linear programs to solve this problem in two cases: when all tasks arrive at the same time, and when release dates are considered. 4) Cosmological simulations: we studied the behavior of the applications required to run cosmological simulation workflows. Then, based on the DIET grid middleware, we implemented a complete infrastructure that allows non-expert users to easily submit cosmological simulations on a computing grid.
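To make the clustering property in contribution 2 concrete, here is a minimal sketch of what a valid k-clustering of a weighted graph looks like. It is a centralized, greedy illustration only, not the distributed self-stabilizing algorithm of the thesis. The function names, the adjacency-dictionary representation, the choice of the smallest unassigned identifier as leader, and the example graph are all assumptions made for illustration.

```python
import heapq


def distances_within(graph, source, k):
    """Dijkstra from `source`, keeping only nodes at weighted distance <= k."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd <= k and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def k_clustering(graph, k):
    """Greedy centralized sketch: map each node to a leader at distance <= k.

    Assumption: the smallest unassigned identifier becomes the next leader.
    """
    leader_of = {}
    for node in sorted(graph):
        if node in leader_of:
            continue  # already claimed by an earlier leader
        for member in distances_within(graph, node, k):
            leader_of.setdefault(member, node)
    return leader_of


# Hypothetical example graph: node -> {neighbor: edge weight}
graph = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 1, "d": 5},
    "c": {"a": 4, "b": 1},
    "d": {"b": 5, "e": 1},
    "e": {"d": 1},
}
clusters = k_clustering(graph, 2)
```

With k = 2, nodes a, b and c end up in the cluster led by a, while d and e form a second cluster led by d, because the edge of weight 5 between b and d exceeds the distance bound.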
The Grid computing model is now widely used. It gives users access to a large number of computing resources, but it does not offer the homogeneous environment available on supercomputers or clusters. Problems therefore arise: how to choose the resources on which to run the applications, and how to schedule them. Depending on these choices, the performance obtained can vary considerably (variations in communication cost, in allocated computing power, etc.). Because grids are very complex and dynamic environments, it becomes mandatory to propose tools that assist users in running their applications. These tools should handle the difficulty of choosing the resources, setting up the environment, installing applications, and running them according to a scheduling policy.
Application deployment on a heterogeneous distributed platform consists of selecting computing resources and then allocating them to the applications.
Nowadays, scientific applications are still mainly deployed manually: resource selection and allocation, followed by the installation and launching phases, are still left to the user. Some tools that simplify this process exist, but they remain incomplete: some are application specific; others are more generic and deal with directed acyclic graphs representing dependencies between tasks; a last category deals with sets of applications. These tools do not cover the same parts of the deployment process, and only a few propose automatic methods to map applications onto resources. It becomes mandatory to propose an automatic deployment process, from resource selection down to execution. Moreover, a deployment choice may be correct at a given time, but as Grids are very dynamic and error-prone environments, the problem of redeployment has to be taken into account. Redeployment means modifying the current deployment plan to make it fit the new parameters (loss of nodes, network capacity fluctuation, etc.). Grid modeling and statistical analysis may help inform the decisions taken while planning the deployment.
My main interest lies in the deployment of the DIET Grid middleware. DIET is a hierarchical middleware for high performance computing (HPC). It provides several features to run scientific applications: data management, fault tolerance, workflow management, and scheduling in various forms. The main problem when deploying DIET is to find a suitable shape for the hierarchy. One cannot design a generic hierarchy that fits any distributed heterogeneous platform (the problem of finding the optimal hierarchy in terms of throughput is NP-complete; it reduces to finding the best broadcast tree in a heterogeneous network).
Software deployment
Computational Grids
Self-stabilization
Cosmological simulations
LEGO, ANR-05-CIGC-11