ANR MapReduce (2010-2013)
MapReduce is a parallel programming paradigm used successfully by large Internet service providers to perform computations on massive amounts of data. After being strongly promoted by Google, it was also implemented by the open source community through the Hadoop project, which is maintained by the Apache Software Foundation and supported by Yahoo! and even by Google itself. The model is increasingly popular for rapidly implementing distributed data-intensive applications. The key strength of MapReduce is its inherently high degree of potential parallelism: map tasks process independent input splits, and reduce tasks process independent groups of keys.
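To make the source of this parallelism concrete, the following is a minimal, single-machine sketch of the classic word-count example. It is not Hadoop's API or the project's runtime; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a local thread pool stands in for distributed workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(split):
    # Emit an intermediate (word, 1) pair for every word in one input split.
    # Splits do not depend on each other, so map tasks can run in parallel.
    return [(word, 1) for word in split.split()]


def shuffle(pairs):
    # Group intermediate values by key so each key can be reduced independently.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values emitted for one key.
    return key, sum(values)


def word_count(splits, workers=4):
    # Hypothetical driver: a thread pool stands in for distributed workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_phase, splits)
        groups = shuffle(chain.from_iterable(mapped))
        return dict(pool.map(lambda kv: reduce_phase(*kv), groups.items()))
```

For example, `word_count(["a b a", "b c"])` returns `{'a': 2, 'b': 2, 'c': 1}`. The only synchronization point is the shuffle between the two phases; on a Desktop Grid, that data-movement step is where a data management layer becomes important.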
In this project, the goal of our work package is to provide a complete runtime environment for MapReduce applications on Desktop Grids. No such environment dedicated to Desktop Grids currently exists. We will rely on the BitDew middleware, developed by INRIA, a programmable environment for automatic and transparent data management on computational Desktop Grids. BitDew relies on a specific set of metadata to drive key data management operations (life cycle, distribution, placement, replication and fault tolerance) at a high level of abstraction. The BitDew runtime environment is a flexible distributed service architecture that integrates modular P2P components, such as DHTs for a distributed data catalog and collaborative transport protocols for data distribution, as well as asynchronous and reliable multi-protocol transfers.
FP7 EDGI (2010-2012)
EDGI will develop middleware that consolidates the results achieved in the EDGeS project on extending Service Grids (SG) with Desktop Grids (DG), in order to support EGI (European Grid Infrastructure) and NGI (National Grid Initiative) user communities that are heavy users of Distributed Computing Infrastructures (DCIs) and require an extremely large number of CPUs and cores. EDGI will go beyond existing DCIs, which are typically cluster Grids and supercomputer Grids, and extend them with public and institutional Desktop Grids and Clouds. EDGI will integrate software components of ARC, gLite, Unicore, BOINC, XWHEP, 3G Bridge, and Cloud middleware such as OpenNebula and Eucalyptus into SG→DG→Cloud platforms for service provision; as a result, EDGI will extend ARC, gLite and Unicore Grids with volunteer and institutional DG systems. EDGI will develop DG→Cloud bridge middleware to obtain additional, instantly available resources for DG systems when an application has QoS requirements that the resources available in the DG system cannot satisfy. EDGI will improve Desktop Grid middleware (BOINC and XWHEP) to handle QoS requirements, and the SG→DG bridge middleware to support data-intensive applications.
EDGI will deploy a production infrastructure that integrates ARC-, gLite- and Unicore-based Grids with Desktop Grids using the bridge middleware developed in EDGI. The production EDGI infrastructure will also enable dynamic, on-demand extension of the connected Desktop Grids with Cloud resources. EDGI users can thus benefit from the versatile and flexible ecosystem provided by EDGI. The EDGI production infrastructure will be offered as a service to EGI and NGI user communities. It will also serve as a demonstration for NGIs of how to extend their ecosystem with Desktop Grids and Clouds.
EDGI will establish a European Desktop Grid federation (working title “EuroCivis”) to coordinate DG-related activities in Europe, both to solve technical issues and to attract volunteer DG resource donors by disseminating the results of EDGI and EGI-related projects. EuroCivis and EDGI will work in close collaboration with EGI, EMI, NorduGrid, the Unicore Forum and interested NGIs.
Recently, a new vision of cloud computing has emerged in which the complexity of an IT infrastructure is completely hidden from its users. At the same time, cloud computing platforms provide massive scalability, 99.999% reliability, and fast performance at relatively low cost for complex applications and services. In this proposed collaboration, we investigate the use of cloud computing for large-scale, demanding applications and services running over unreliable resources. In particular, we target volunteered resources distributed over the Internet. The motivation is the immense collective power of volunteer resources (as evidenced by a volunteer computing system delivering 3.9 PetaFLOPS) and the near-zero amortized cost of using such resources. We will address three main challenges. First, we will develop statistical and predictive methods for ensuring that a group of N resources remains continuously available for a duration T. Second, because resource failures are inevitable in large-scale Internet-distributed systems, we will develop checkpointing methods and strategies based on virtual machines to mask failures. Third, we will apply our predictive methods to data management; in particular, we seek guarantees on data availability, durability and access performance for Internet-distributed storage. Finally, we will implement our research results in a system prototype and evaluate it with real applications.
ADT BitDew (2010-2012)
The BitDew ADT aims to develop GPL software for large-scale data distribution, management and processing. Originally developed for Desktop Grids, BitDew meets the community's growing need for a software toolkit for conducting experiments and building middleware for data-intensive applications. The ADT will: (i) provide a base of documentation and training material for the BitDew software, (ii) strengthen software quality and better meet user needs, and (iii) provide new features for managing Cloud Computing infrastructures and for complying with the standards of grids such as EGEE.
ANR DSLLAB (2005-2009)
DSLlab is a research project that aims to build and use an experimental platform for distributed systems running over DSL Internet connections. The objective is twofold: 1) provide accurate, customized measurements of availability, activity and performance in order to characterize and tune models of ADSL resources; 2) provide a validation and experimentation tool for new protocols and services, and for simulators and emulators of these systems. DSLlab consists of a set of low-power, low-noise computers spread across the ADSL network. These computers are used simultaneously as active probes to capture behavior traces and as operational nodes to launch experiments. We expect this experiment to yield a better understanding of ADSL behavior and accurate models for emulating and simulating these systems, which now represent a significant capacity in terms of storage and computing power.
FP6 Grid4All (2005-2008)
Our proposal targets the vision of a "democratic" Grid as a ubiquitous utility whereby domestic users, small organizations and SMEs can draw on resources on the Internet without having to individually invest in and manage computing and IT resources. This addresses the first objective of this year's call, which is to foster uptake and use in business and society. In their current state, service-oriented Grids and production Grids focus on unifying and sharing resources contributed and used by participating organizations in (almost) static virtual organizations, and do not address the diversity, scale and dynamicity of a pervasive Grid. This naturally leads us to the second objective: reducing the complexity of Grid-based systems, empowering individuals and organizations to create, provide access to and use a variety of services, anywhere, anytime, in a transparent and cost-effective way, and realizing the vision of a knowledge-based, ubiquitous utility.