Arnold Rosenberg

IC-Optimal Schedules that Accommodate Heterogeneous Clients

Abstract:   Initial results are presented concerning the problem of clustering the tasks of computation-dags so as to accommodate the heterogeneity of remote clients in Internet-based computing. Earlier work has developed the underpinnings of IC-Scheduling theory, an algorithmic framework for scheduling computations having intertask dependencies for Internet-based computing (IC, for short). The goal of the schedules produced by the Theory is to render tasks eligible for execution at the maximum possible rate, with the dual aim of: (a) utilizing remote clients' computational resources well, by enhancing the likelihood of having work to allocate to an available client; (b) lessening the likelihood of a computation's stalling for lack of tasks that are eligible for execution. The current paper addresses the problem of enhancing IC-Scheduling theory so that it can accommodate the varying computational resources of different remote clients. The techniques developed here accomplish this goal by clustering a computation's tasks so as to render the computation multi-granular. After a few ad hoc examples, several task-clustering strategies are developed which exploit the structure of the computation being performed. These strategies range from one that works for any dag, to ones that build increasingly on the explicit structure of the dag being scheduled.

 slides (pdf)

Jon Weinberg

Symbiotic Space-Sharing: Mitigating Resource Contention on SMP Systems

Abstract:   Symbiotic space-sharing is a technique that can improve system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. In this talk, we explore the opportunity for symbiotic schedulers and discuss initial findings and prototype implementations. We show that such schedulers can improve throughput by 20\% over conventional space-sharing when resource bottlenecks are known. We then discuss approaches for exposing these bottlenecks to the scheduler, including through user-provided job script hints as well as fine-grained, memory signatures extracted from applications. Lastly, we discuss additional studies and components necessary to realize symbiotic scheduling in HPC environments.

 slides (ppt) slides (pdf)

Cynthia Lee

Studies of the User-Scheduler Relationship.

Abstract:   An important prerequisite to effective resource management is productive interaction between the user and scheduler. Unfortunately, relatively little scheduling research directly studies the users themselves. This talk will cover user survey-based studies we have conducted, which probe several aspects of the user-scheduler relationship, and propose solutions to three of the most vexing barriers between the two. First, users' monetary valuation of compute time and schedule turnaround time is examined in terms of a utility function. Second, responsiveness of the scheduler to users' varied valuations is optimized via a genetic algorithm heuristic, creating a controlled market for computation. Finally, the chronic problem of inaccurate user runtime requests, and its implications for scheduler performance, is examined, along with mitigation techniques.

 slides (ppt)

Emmanuel Jeannot

Handling collusion in desktop grid

Abstract:   By exploiting idle time on volunteer's machines, desktop grids provide a way to execute large sets of computational tasks while requiring negligible maintenance and extra cost. Although, it constitutes a valuable resource for low-budget projects, relying on external resources can be hazardous and ensuring the validity of the obtained results is mandatory. We will present a taxonomy of existing and possible risks by considering several aspects of workers faultiness (maliciousness, collusion, dynamicity). Then, we propose a general framework for scheduling tasks in order to obtain reliable results. Our final solution contains mechanisms for the scheduling and answer selection processes which use both a trust-based system that characterizes workers previous behaviors. Joint work with Louis-Claude Canon and Jon Weissman.

 slides (pdf)

Jack Dongarra

Multicore Scheduling and Programming

 slides (pdf)

Denis Trystram

Multi-users, multi-organizations, multi-objectives : a single perspective.

Abstract:   The new computing platforms, like grids composed of multiple clusters, are characterized by new features that force to revisit the scheduling problem. Classical scheduling algorithms are mostly centralized. This centralized character allows to design more efficient algorithms (consider for instance the PTAS of Shmoys for the basic P||cmax problem). However, the platforms are intrinsically distributed. Each cluster is under the administration of a separate organization that has its own rules. Moreover, the resources (processors) are shared between several users that have also their own selfish goals. In this talk, we will first discuss the limitations of the existing scheduling algorithms. Then, the concept of cooperation is introduced both between organizations and users. On the first hand, we propose an approximation algorithm for the global makespan without disturbing the local objectives. On the second hand, we study the problem of optimizing simultaneously the objectives of many users within the same organization. Inapproximability results show that we can not expect too much in such a case. Finally, the mixed case of multi-user competing for resources for several organizations is addressed.

 slides (ppt) slides (pdf)

Kunal Agrawal

Adaptive Scheduling with Parallelism Feedback

Abstract:   Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted processors. In this context, the number of processors allotted to a particular job may vary during the job's execution, and the task scheduler must adapt to these changes in processor resources. For overall system efficiency, the task scheduler should also provide parallelism feedback to the job scheduler to avoid the situation where a job is allotted processors that it cannot use productively. We present an adaptive task scheduler for multitasked jobs with dependencies that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our scheduler guarantees that a job completes near optimally while utilizing at least a constant fraction of the allotted processor cycles. Our scheduler can be applied to schedule data-parallel programs, such as those written in High Performance Fortran (HPF), *Lisp, C*, NESL, and ZPL. Our analysis models the job scheduler as the task scheduler's adversary, challenging the task scheduler to be robust to the system environment and the job scheduler's administrative policies. For example, the job scheduler can make available a huge number of processors exactly when the job has little use for them. To analyze the performance of our adaptive task scheduler under this stringent adversarial assumption, we introduce a new technique called ``trim analysis,'' which allows us to prove that our task scheduler performs poorly on at most a small number of time steps, exhibiting near-optimal behavior on the vast majority. To be precise, suppose that a job has work $T_1$ and critical-path length $T_infty$ and is running on a machine with $P$ processors. Using trim analysis, we prove that our scheduler completes the job in $O(T_1/Padj + T_infty + L lg P)$ time steps, where $L$ is the length of a scheduling quantum and $Padj$ denotes the $O(T_infty + L lg P)$-trimmed availability. This quantity is the average of the processor availability over all time steps excluding the $O(T_infty + L lg P)$ time steps with the highest processor availability. When $T_1/T_infty gg Padj$ (the job's parallelism dominates the $O(T_infty + L lg P)$-trimmed availability), the job achieves nearly perfect linear speedup. Conversely, when $T_1/T_infty ll Padj$, the asymptotic running time of the job is nearly the length of its critical path.

 slides (pdf)

Olivier Beaumont

Scheduling Divisible Workloads on Heterogeneous Platforms under Bounded Multi-Port Model

Abstract:   In this talk, we discuss complexity issues for scheduling divisible workloads on heterogeneous systems under the bounded multi-port model. To our best knowledge, this paper is the first attempt to consider divisible load scheduling under a realistic communication model, where the master node can communicate simultaneously to several slaves, provided that bandwidth constraints are not exceeded. We concentrate on one round distribution schemes, where a given node starts its processing only once all data has been received. Our main contributions are (i) the proof that processors start working immediately after receiving their work (ii) the study of the optimal schedule in the case of 2 processors and (iii) the proof that scheduling divisible load under the bounded multi-port model is NP-complete. This last result strongly differs from divisible load literature and represents the first NP-completeness result when latencies are not taken into account.

 slides (pdf)

Paulo Detti

Scheduling unreliable jobs on parallel machines

Abstract:   An allocation and sequencing problem is considered in which a set of independent jobs have to be processed by a set of machines. Each job is characterized by a certain success probability and a reward, which is obtained if the job is successfully carried out. When a job fails during processing, the machine is blocked and the jobs subsequently scheduled on that machine are blocked until the end of the unsupervised period. The problem is to assign and sequence the jobs on the machines so that the expected total reward is maximized. Connections with other problems, complexity and algorithmic results are presented. In particular, we propose a polyhedral characterization for the single machine case, a connection to other scheduling problems implying that the problem we consider is strongly NP-hard even with 2 machines, approximation results and an effective upper bound.

 slides (pdf)

Loris Marchal

Offline and online scheduling of concurrent bags-of-tasks on heterogeneous platforms

Abstract:   Scheduling problems are already difficult on traditional parallel machines. They become extremely challenging on heterogeneous clusters, even when embarrassingly parallel applications are considered. In this paper we deal with the problem of scheduling multiple applications, made of collections of independent and identical tasks, on a heterogeneous master-worker platform. The applications are submitted online, which means that there is no a priori (static) knowledge of the workload distribution at the beginning of the execution. The objective is to minimize the maximum stretch, i.e. the maximum ratio between the actual time an application has spent in the system and the time this application would have spent if executed alone. On the theoretical side, we design an optimal algorithm for the offline version of the problem (when all release dates and application characteristics are known before-hand). We also introduce several heuristics for the general case of online applications. On the practical side, we have conducted extensive simulations and MPI experiments, showing that we are able to deal with very large problem instances in a few seconds. Also, the solution that we compute totally outperforms classical heuristics from the literature, thereby fully assessing the usefulness of our approach.

 slides (pdf)

Bruno Gaujal

From Sturmian words to Balanced trees

Abstract:   Many Scheduling problems with frequency constraints are solved optimally by Sturmian words. Sturmian words are binary infinite words with many equivalent definitions: they are aperiodic sequences with complexity n+1, balanced sequences and mechanical words. For problems with a recursive nature, one rather need to consider Sturmian trees. The first natural definition of Sturmian trees is to consider planar trees of complexity n+1 but in doing so, one looses the other two properties. In this talk, we will consider non-ordered trees. In this case, we show that strongly balanced trees are exactly mechanical trees. Moreover we will see that they are also aperiodic trees of minimal complexity, so that the three equivalent definitions of Sturmian words are almost preserved in that case.

 slides (pdf)

Arnaud Legrand

Toward a Fully Decentralized Algorithm for Multiple Bag-of-tasks Application Scheduling on Grids

Abstract:   In this talk, we present a fully decentralized algorithm for fair resource sharing between multiple bag-of-tasks applications in a grid environment. This algorithm is inspired from related work on multi-path routing in communication network. An interesting feature of this algorithm is that it allows the choice of wide variety of fairness criteria and achieves both optimal path selection and flow control. In addition, this algorithm only requires local information at each slave computing tasks and at each buffer of the network links while minimal computation is done by the schedulers. A naive adaptation is unstable and inefficient though. Fortunately, a simple and effective scaling mechanism is sufficient to circumvent this issue. This scaling mechanism is motivated by a careful study of the subtle differences with the classical multi-path routing problem. We prove its efficiency through a detailed analysis of a simple simulation.

 slides (pdf)

Jean-Marc Nicod

Batch Scheduling for Identical Multi-Tasks Jobs on Heterogeneous Platforms

Abstract:   We consider the scheduling of one batch of identical jobs on a grid of heterogeneous computers. Each job of the batch is an application described by a Directed Acyclic Graph (DAG) of tasks without forks(in-tree) and executed independently from the others jobs. Each job is an instance of the same DAG and the tasks of the DAG are typed, i.e. there may be two identical tasks in one job and a task of one type can only be executed on the computers where the corresponding program is installed. The execution resources are distributed and each resource can carry out a set of task types, the installed programs, so several hosts may be able to carry out the same task. The objective function of our problem is to minimize the makespan of the batch execution. An example that matches these definitions is a workflow to process a set of data in several steps on a grid where the softwares installed on the hosts differs: each step could be a filter applied to one or more images and only some filters are available on each host. In this context, we consider the optimization of the schedule depending on the size, from small to large sized batches, on the characteristics of the batches, etc. Different scheduling techniques could be used to schedule such applications: off-line or online, flow oriented, etc. Classical off-line scheduling techniques for DAGs are heuristics, as there is no direct optimal solution to this problem.They just take one job into account so that multiple jobs will be scheduled as if they were just one application, without benefiting from the knowledge that there are several identical jobs nor of the structure of the jobs. Moreover, if the size of the batch is large,some heuristics may become very costly. On-line techniques are usually simple but they do not either take benefit of the identical structure of the jobs. Steady state oriented scheduling techniques provides an optimal solution in this case but for an infinite number of jobs. We compare three scheduling techniques: a simple on-line scheduler, a steady-state scheduling technique and a genetic heuristic based algorithm. The on-line algorithm just schedule the tasks as soon as they are free of their dependencies. The genetic based algorithm start from a list-scheduling solution and improve it by selecting the best individuals from the makespan of their schedule. The steady-state technique generates a periodic schedule from the result of a linear program, built with the constraints that characterize the problem. By using an experimental evaluation based on simulation, we show that the platform architecture and the DAG structure affect the performances of the schedule differently depending on the used algorithm. Then, we propose some adaptations to these scheduling algorithms to improve their performances on the execution of the batch.

 slides (pdf)

Anthony T. Chronopoulos

Game theory based load balanced job allocation in distributed systems

Abstract:   The main goal of our work is to provide fairness to the users and the users' jobs , i.e. all the users and their jobs should experience approximately equal expected response time (expected queuing delay + processing time + any communication time) or be charged approximately equal price for their execution independent of the computers allocated, and we will show that game theory provides a suitable framework for characterizing such schemes. For distributed systems in which all the jobs belong to a single user (single-class), we use a cooperative game to model the load balancing problem which takes the average system information into account (static load balancing). Our solution is based on the Nash Bargaining Solution which provides a Pareto optimal solution for the distributed system and is also a fair solution. We then extend the system model to include jobs from various users (multi-user/multi-class job distributed system) and include pricing to model a Grid system. We use the concept of Nash equilibrium as the solution of our non-cooperative games and derive distributed algorithms for computing it. We also derive a dynamic scheme tries to minimize the expected response time of the entire system and the other tries to minimize the expected response time of the individual users to provide a fair solution.

 slides (pdf)

Henri Casanova

Scheduling Mixed-Parallel Applications with Advance Reservations

Abstract:   In this talk we discuss the scheduling of mixed-parallel applications, which exhibit both task and data parallelism, in advance reservations settings. Both the problem of minimizing application turn-around time and that of meeting a deadline are studied. For each several scheduling algorithms are proposed, some of which borrow ideas from previously published work in non-reservation settings. Algorithms are compared in simulation over a wide range of application and reservation scenarios. The main finding is that schedules computed using the previously published CPA algorithm can be adapted to advance reservation settings, notably resulting in low resource consumption and thus high efficiency.

 slides (ppt)

Florina M. Ciorba

Enhancing self-scheduling algorithms via synchronization and weighting

Abstract:   Existing dynamic self-scheduling algorithms, used to schedule independent tasks on heterogeneous clusters, cannot handle tasks with dependencies because they lack the support for internode communication. In this talk a synchronization mechanism that provides inter-processor communication, is presented to compensate for the above mentioned deficiency. This mechanism enables self-scheduling algorithms to efficiently handle nested loops with dependencies. A weighting mechanism is presented also that significantly improves the performance of dynamic self-scheduling algorithms. Traditionally, the self-scheduling algorithms divide the total number of tasks into chunks and assign them to processors. The weighting mechanism adapts the chunk sizes to the computing power and current run-queue state of the processors. The synchronization and weighting mechanisms are orthogonal, in the sense that they can simultaneously be applied to loops with dependencies. Thus, they broaden the application spectrum of dynamic self-scheduling algorithms and improve their performance. Extensive tests confirm the efficiency of the synchronization and weighting mechanisms and the significant improvement of the synchronized-weighted versions of the algorithms over the synchronized-only versions.

 slides (pdf)

Kiminori Matsuzaki

An Efficient Implementation of Data-Parallel Skeletons on Multicore Processors

Abstract:   Recently, multicore processors are getting widely available. Developing programs that run efficiently on multicore processors is a hard task since the programs should run in parallel. Furthermore, cache memory shared among CPU cores makes the development much more difficult.

 slides (pdf)

Anne Benoit

Mapping skeleton workflows onto heterogeneous platforms to optimize throughput, latency and reliability

Abstract:   Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonist criteria should be optimized, such as throughput, latency, reliability (or a combination). Typical applications include digital image processing, where images are processed in steady-state mode. In this talk, we study the complexity of various multi-criteria mapping problems for pipeline graphs. For some NP-hard instances of the problem, polynomial bi-criteria heuristics are presented and their relative performance is evaluated through extensive simulations.

 slides (pdf)

Umit V. Catalyurek

Multi-Objective Scheduling of Streaming Workflows

Abstract:   Scheduling, in many application domains, involves the optimization of multiple performance metrics. In this talk, I will present a novel algorithm framework for the multi-objective scheduling of workflows that act on a stream of input data. Our framework focuses on the two performance metrics: latency and throughput, and minimizes the latency of workflows while satisfying strict throughput requirements. We leverage pipelined, task and data parallelism in a coordinated manner to meet these objectives and investigate the benefit of task duplication in alleviating communication overheads in the pipelined schedule for different workflow characteristics. Evaluation using synthetic benchmarks as well as those derived from real applications shows that our algorithm consistently produces lower-latency schedules and meets throughput requirements, even when previously proposed schemes fail.

 slides (ppt)

Larry Carter

Scheduling: Today and Tomorrow

 slides (ppt) slides (pdf)