The DIET scheduling subsystem is based on the notion that, for the sake of system efficacy and scalability, the work of determining the appropriate schedule for a parallel workload should be distributed across the computational platform. When a task in such a parallel workload is submitted to the system for processing, each Server Daemon (SeD) provides a performance estimate for that task, that is, a collection of data pertaining to the capabilities of a particular server in the context of a particular client request. These estimates are passed to the server's parent agent; agents then sort these responses in a manner that optimizes certain performance criteria. Effectively, candidate SeDs are identified through a distributed scheduling algorithm based on pairwise comparisons between these performance estimates: upon receiving server responses from its children, each agent performs a local scheduling operation called server response aggregation. The end result of an agent's aggregation phase is a list of server responses (from the servers in the subtree rooted at that agent), sorted according to the aggregation method in effect. By default, the aggregation phase implements the following ordered sequence of tests:
In principle, this scheduling policy prioritizes servers that are able to provide useful performance prediction information. In general, this approach works well when all servers in a given DIET hierarchy are capable of making such estimations. However, in platforms composed of SeDs with varying capabilities, load imbalances may occur: since DIET systematically prioritizes server responses containing scheduling data, servers that do not respond with such performance data will never be chosen.
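The interplay between per-agent aggregation and the imbalance described above can be illustrated with a minimal Python sketch. It is not DIET code: the `Response` class, the `compare` and `aggregate` functions, the `predicted_time` field, and the server names are all hypothetical, and the two-step comparison (responses carrying performance data first, then the shorter predicted time) is an assumed stand-in for the default sequence of tests.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import List, Optional


@dataclass
class Response:
    """Hypothetical server response; not a DIET data structure."""
    server: str
    # None models a SeD that returned no performance data.
    predicted_time: Optional[float] = None


def compare(a: Response, b: Response) -> int:
    """Pairwise test: negative if a ranks before b, positive if after, 0 if tied."""
    a_has = a.predicted_time is not None
    b_has = b.predicted_time is not None
    if a_has != b_has:
        # Assumed rule: a response with performance data always wins.
        return -1 if a_has else 1
    if a_has and b_has:
        # Both have data: the shorter predicted time ranks first.
        return (a.predicted_time > b.predicted_time) - (a.predicted_time < b.predicted_time)
    return 0  # Neither has data: keep arrival order (sorted() is stable).


def aggregate(child_lists: List[List[Response]]) -> List[Response]:
    """Local scheduling step of one agent: merge and sort the children's responses."""
    merged = [response for child in child_lists for response in child]
    return sorted(merged, key=cmp_to_key(compare))


# Two lower-level agents each aggregate the responses of two SeDs,
# then the top-level agent aggregates the two sorted lists.
agent_a = aggregate([[Response("sed1", 4.0)], [Response("sed2")]])
agent_b = aggregate([[Response("sed3", 2.5)], [Response("sed4")]])
top = aggregate([agent_a, agent_b])
ranking = [response.server for response in top]
print(ranking)
```

Under these assumptions, `sed2` and `sed4` rank behind every server that reports performance data, however lightly loaded they are, which is the starvation effect noted above.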
We have designed a plugin scheduler facility that enables the application developer to tailor DIET scheduling to the targeted application. This functionality gives the application developer the means to extend the notion of a performance estimation to include application-specific metrics, and to instruct DIET on how to treat those data in the aggregation phase. We describe these interfaces in the following sections.
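As a rough illustration of the two ideas in this paragraph, the Python sketch below models a performance estimate as a set of named metrics that an application may extend, together with an aggregation rule built from an ordered list of priorities. None of this is the DIET interface: `make_priority_aggregator`, the metric names `queued_jobs` and `free_mem_mb`, and the rule that a missing metric ranks last are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a response is (server name, metric name -> value).
Response = Tuple[str, Dict[str, float]]


def make_priority_aggregator(
    priorities: List[Tuple[str, bool]],
) -> Callable[[List[Response]], List[Response]]:
    """Build an aggregation function from (metric, maximize) pairs.

    Metrics are compared in the order given; later metrics only break ties.
    """

    def sort_key(response: Response) -> tuple:
        _, metrics = response
        key = []
        for name, maximize in priorities:
            if name in metrics:
                value = metrics[name]
                # Negate values to maximize so that ascending sort works for both.
                key.append((0, -value if maximize else value))
            else:
                # Assumed rule: a server lacking the metric ranks after those that have it.
                key.append((1, 0.0))
        return tuple(key)

    def aggregate(responses: List[Response]) -> List[Response]:
        return sorted(responses, key=sort_key)

    return aggregate


# Application-specific policy: fewest queued jobs first, then most free memory.
aggregate = make_priority_aggregator([("queued_jobs", False), ("free_mem_mb", True)])

responses = [
    ("sed1", {"queued_jobs": 3, "free_mem_mb": 2048}),
    ("sed2", {"queued_jobs": 1, "free_mem_mb": 512}),
    ("sed3", {"queued_jobs": 1, "free_mem_mb": 4096}),
    ("sed4", {}),  # reports no application-specific metrics
]
ordered = [name for name, _ in aggregate(responses)]
print(ordered)
```

Separating the metrics a server reports from the rule used to rank them is the design point here: the application chooses both, and the same ranking rule can then be applied by every agent in the hierarchy.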