Default Scheduling Strategy

The DIET scheduling subsystem is based on the notion that, for the sake of system efficacy and scalability, the work of determining the appropriate schedule for a parallel workload should be distributed across the computational platform. When a task in such a parallel workload is submitted to the system for processing, each Server Daemon (SeD) provides a performance estimate for that task, that is, a collection of data describing the capabilities of that particular server in the context of that particular client request. Each SeD passes its estimate to its parent agent, and agents sort the responses they receive so as to optimize certain performance criteria. Effectively, candidate SeDs are identified through a distributed scheduling algorithm based on pairwise comparisons between these performance estimates: upon receiving server responses from its children, each agent performs a local scheduling operation called server response aggregation. The end result of an agent's aggregation phase is a list of server responses (from the servers in the subtree rooted at that agent), sorted according to the aggregation method in effect. By default, the aggregation phase implements the following ordered sequence of tests:

  1. Least recently used: In the absence of application- and platform-specific performance data, the DIET scheduler attempts to probabilistically achieve load balance by assigning client requests according to how long ago each server last executed a job. Each server records a timestamp indicating the last time at which it was assigned a job for execution. Each time a request is received, the SeD computes the time elapsed since its last execution, and among the responses they receive, DIET agents prefer SeDs with a longer elapsed time.
  2. Random: If the SeD is unable to store timestamps, the DIET scheduler will choose randomly when comparing two otherwise equivalent SeD performance estimations.
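
The ordered tests above, applied at every agent of the hierarchy, can be sketched in Python as follows. This is only an illustration under stated assumptions, not DIET's implementation: the `Estimate` and `Node` classes, the `elapsed` field, and the `compare` and `aggregate` functions are all hypothetical names. The rule that a response carrying a timestamp ranks ahead of one without reflects the prioritization of responses that contain scheduling data.

```python
import random
from dataclasses import dataclass, field
from functools import cmp_to_key
from typing import List, Optional


@dataclass
class Estimate:
    """Hypothetical stand-in for one SeD's performance estimate."""
    sed_name: str
    # Seconds since this SeD was last assigned a job; None if it keeps no timestamp.
    elapsed: Optional[float] = None


@dataclass
class Node:
    """An agent (has children) or a SeD (has an estimate) in the hierarchy."""
    name: str
    children: List["Node"] = field(default_factory=list)
    estimate: Optional[Estimate] = None


def compare(a: Estimate, b: Estimate) -> int:
    """Pairwise test: negative if a ranks first, positive if b ranks first."""
    if a.elapsed is not None and b.elapsed is not None and a.elapsed != b.elapsed:
        return -1 if a.elapsed > b.elapsed else 1  # longer elapsed time wins
    if (a.elapsed is None) != (b.elapsed is None):
        return -1 if a.elapsed is not None else 1  # responses with data win
    return random.choice((-1, 1))                  # otherwise equivalent: random


def aggregate(node: Node) -> List[Estimate]:
    """Return the sorted responses for the subtree rooted at node."""
    if node.estimate is not None:  # a SeD answers with its own estimate
        return [node.estimate]
    responses = [e for child in node.children for e in aggregate(child)]
    return sorted(responses, key=cmp_to_key(compare))


# Hypothetical two-level hierarchy: sed3 keeps no timestamp.
tree = Node("root", [
    Node("agent1", [Node("sed1", estimate=Estimate("sed1", 5.0)),
                    Node("sed2", estimate=Estimate("sed2", 120.0))]),
    Node("agent2", [Node("sed3", estimate=Estimate("sed3"))]),
])
print([e.sed_name for e in aggregate(tree)])  # ['sed2', 'sed1', 'sed3']
```

Each agent sorts only the responses coming from its own subtree, so the ranking work is spread over the hierarchy rather than concentrated at the root. The random tie-break is reached only when neither of the earlier tests can separate two estimates.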

In principle, this scheduling policy prioritizes servers that are able to provide useful performance prediction information. This approach generally works well when all servers in a given DIET hierarchy are capable of making such estimations. However, on platforms composed of SeDs with varying capabilities, load imbalances may occur: since DIET systematically prioritizes server responses containing scheduling data, servers that do not respond with such performance data will never be chosen.
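
The imbalance described above can be reproduced with a toy simulation. The sketch below is an assumption-laden illustration rather than DIET code: `simulate`, its parameters, and the use of a request counter as a clock are all hypothetical. Servers that report an elapsed time are always ranked ahead of servers that report nothing, and the top-ranked server receives each request.

```python
from collections import Counter
from typing import List


def simulate(num_requests: int, with_data: List[str], without_data: List[str]) -> Counter:
    """Count how often each SeD is picked when responses with data rank first."""
    # A SeD that was never assigned a job reports an infinite elapsed time.
    last_assigned = {name: float("-inf") for name in with_data}
    picks: Counter = Counter()
    for now in range(num_requests):  # one request per time step
        # Each response is (has_data, elapsed, name).
        responses = [(True, now - last_assigned[n], n) for n in with_data]
        responses += [(False, 0.0, n) for n in without_data]
        # Data first, then the longest elapsed time.
        best = max(responses, key=lambda r: (r[0], r[1]))
        picks[best[2]] += 1
        if best[0]:
            last_assigned[best[2]] = now
    return picks


picks = simulate(10, with_data=["sed1", "sed2"], without_data=["sed3"])
print(picks["sed1"], picks["sed2"], picks["sed3"])  # 5 5 0
```

In this run the two servers that keep timestamps alternate, while the server that reports no data receives no requests at all, however long it has been idle.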

We have designed a plugin scheduler facility that enables the application developer to tailor DIET scheduling to the targeted application. This functionality gives the application developer the means to extend the notion of a performance estimation to include application-specific metrics, and to instruct DIET how to treat those data in the aggregation phase. We describe these interfaces in the following sections.

The DIET Team - Wed 29 Nov 2017 15:13:36 EST