SpeQuloS: Quality of Service for Best-Effort Distributed Computing Infrastructures
Participants: Simon Delamare, Gilles Fedak, Derrick Kondo, Oleg Lodygensky
The SpeQuloS software home page is at http://graal.ens-lyon.fr/~sdelamar/spequlos/
The aim of the project is to provide a middleware that implements Quality of Service for unreliable Distributed Computing Infrastructures (DCIs), e.g. Desktop Grids, by provisioning additional stable, on-demand resources (e.g. Clouds), within the scope of the European FP7 EDGI project.
The EDGI project
EDGI is an FP7 European project that follows the successful FP7 EDGeS project. Its goal is to build a Grid infrastructure composed of "Desktop Grids" such as BOINC or XtremWeb, where computing resources are provided by Internet volunteers; "Service Grids", where computing resources are provided by institutional Grids such as EGEE, gLite and Unicore; and "Cloud systems" such as OpenNebula and Eucalyptus, where resources are provided on demand. The goal of the EDGI project is to provide an infrastructure where Service Grids are extended with public and institutional Desktop Grids and Clouds. Our partners include the SZTAKI institute (Hungary), CIEMAT (Spain), Univ. Coimbra (Portugal), Univ. Cardiff (UK), Univ. Westminster (UK), AlmereGrid (NL), IN2P3 (FR) and more.
The main problem with the current infrastructure is that it cannot give any QoS support to users running their applications in the Desktop Grid (DG) part of the infrastructure. For example, a public DG system allows clients to take up to several weeks to return work-unit results. Although some EGEE applications (e.g. those of the fusion community) can tolerate such a long latency, most user communities want much lower latency.
The SpeQuloS middleware
INRIA leads the activities of Work Package JRA2 (QoS support for Desktop Grids) to solve this critical problem.
We define QoS concretely as a probabilistic guarantee of job makespan or throughput. Providing QoS features even in Service Grids is hard and has not yet been solved satisfactorily. It is even more difficult in an environment where there are no guaranteed resources. In DG systems, resources can leave the system at any time, for a long time or forever, even after taking several work-units with the promise of computing them. Two main approaches will be investigated within EDGI and deployed in the EDGI production infrastructure.
The first approach classifies the DG clients according to their historical behavior and allocates applications with QoS needs to the more trustable and faster clients. However, even in this case it can happen that some of the work-units are not completed in time.
The second approach is based on the extension of DG systems with Cloud resources. For such critical work-units, the SpeQuloS system is able to dynamically deploy fast and trustable clients from Clouds that are available to support the EDGI DG systems. It decides how many trusted clients and Cloud clients to assign to the QoS applications.
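The first approach above, classifying DG clients by their historical behavior, can be illustrated with a minimal Python sketch. It is not the SpeQuloS implementation: the ClientHistory record, the reliability measure (fraction of assigned work-units returned on time) and the two thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClientHistory:
    """Hypothetical per-client record built from past executions."""
    name: str
    assigned: int      # work-units handed to the client
    completed: int     # work-units returned on time
    mean_hours: float  # mean turnaround of completed work-units


def reliability(client):
    """Fraction of assigned work-units the client returned on time."""
    if client.assigned == 0:
        return 0.0  # no history: treat the client as untrusted
    return client.completed / client.assigned


def select_trusted(clients, min_reliability=0.9, max_hours=24.0):
    """Keep clients that are both reliable and fast enough.

    The thresholds are illustrative assumptions. The result is ordered
    from most to least reliable, with ties broken by turnaround time.
    """
    trusted = [
        c for c in clients
        if reliability(c) >= min_reliability and c.mean_hours <= max_hours
    ]
    return sorted(trusted, key=lambda c: (-reliability(c), c.mean_hours))
```

Work-units of applications with QoS needs would then be sent preferentially to the clients returned by select_trusted. As noted above, this alone cannot guarantee completion in time, which is what motivates the Cloud-based second approach.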
The SpeQuloS middleware is composed of four components: a QoS Info system, a QoS credit system, a QoS Oracle and a QoS Scheduler.
- The QoS Oracle has two responsibilities. First, the scheduling oracle determines feasible deadlines on each DG. A deadline is feasible if the DG can meet it with 95% confidence. To do this, the scheduling oracle takes into account the job's quality of service requirements, and the past performance of each Desktop Grid. Second, for jobs needing fast turnaround time, the oracle should determine when tasks of the job submitted to the Desktop Grid must be run on the dedicated Cloud, and give this recommendation to the QoS scheduler.
- The goal of the QoS scheduler is to provide guarantees that a series or batch of jobs submitted by SG users to a particular DG will be finished by a certain deadline with a given probability. To do so, the QoS scheduler extends DG systems with additional Cloud resources. In EDGI we will deal with two different Cloud system implementations (likely candidates are Eucalyptus and OpenNebula) in order to develop a generic solution that is not tightly coupled to one particular Cloud interface. Once jobs are submitted to the DG, the QoS scheduler monitors their execution. According to the user's QoS ability, the QoS scheduler adds computing resources if it detects that the DG will not be able to complete the jobs on time. Technically, this will be achieved by using the libcloud library.
- There is one more important aspect of QoS support, which is fairness. Obviously, every user will want to submit applications with QoS requests. If users can do so without limit, we cannot provide QoS to anyone. So we need a mechanism by which users can claim QoS resources according to their contributions to the resources of the EDGI infrastructure. The QoS credit system makes it possible to measure institutional contributions and to allocate QoS resources in proportion to the collected credits.
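Two of the rules described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SpeQuloS code: the oracle is assumed to estimate feasibility from a list of past makespans on the same DG using a nearest-rank percentile, and the credit system is assumed to split a pool of Cloud hours in direct proportion to credits. All function and parameter names are hypothetical.

```python
def feasible_deadline(past_makespans, confidence_pct=95):
    """Smallest deadline that at least confidence_pct % of past jobs met.

    Assumption: past makespans on a DG predict its future behavior.
    Uses the nearest-rank percentile with integer arithmetic.
    """
    if not past_makespans:
        raise ValueError("no history available for this Desktop Grid")
    ordered = sorted(past_makespans)
    # Ceiling of n * pct / 100, computed without floating point.
    rank = (len(ordered) * confidence_pct + 99) // 100
    return ordered[max(rank, 1) - 1]


def is_feasible(deadline, past_makespans, confidence_pct=95):
    """True if the DG historically met this deadline often enough."""
    return deadline >= feasible_deadline(past_makespans, confidence_pct)


def allocate_qos_resources(credits, total_cloud_hours):
    """Split a pool of Cloud hours in proportion to collected credits.

    credits maps an institution name to its credit balance.
    """
    total_credits = sum(credits.values())
    if total_credits <= 0:
        return {name: 0.0 for name in credits}
    return {
        name: total_cloud_hours * amount / total_credits
        for name, amount in credits.items()
    }
```

For example, with past makespans of 1 to 20 hours, feasible_deadline returns 19, so a 12-hour deadline would not be considered feasible on that DG alone and the job would be a candidate for Cloud support. With credits {"A": 30, "B": 10} and 8 Cloud hours, institution A would receive 6 hours and B would receive 2.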
EDGI is supported by the FP7 Capacities Programme under grant agreement nr RI-261556.