A streaming application is characterized by its task graph, which is
executed many times,
in a pipeline fashion, on different data sets. The key performance criteria
are the throughput
and the response time. The throughput can be increased via replication
(different resources process
different data sets).
Volatile platforms involve resources whose characteristics evolve over
time, and (worse)
which may be subject to transient or unrecoverable failures. To deal
with variations in speed, bandwidth, and other resource parameters,
a stochastic model is needed, and robust algorithms are called for.
To cope with failures, (partial) replication or checkpointing strategies
must be used.
Note that in the context of replication for fault-tolerance, different
resources will execute the same
data sets, thereby decreasing the yield of the platform. Similarly,
checkpointing and failure recovery
consume time and resources that could otherwise do useful work.
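
The effect of the two uses of replication on throughput can be sketched in a few lines. This is only a toy model under strong assumptions that are not part of the description above: identical resources, no communication cost, no failures actually occurring, and each data set processed entirely by one resource. The function name `throughput` and its parameters are hypothetical and chosen for illustration.

```python
def throughput(num_resources: int, time_per_data_set: float, copies: int = 1) -> float:
    """Data sets completed per time unit in a toy model (assumed, not from the text).

    copies = 1 models replication for throughput: every resource works on a
    different data set. copies = k > 1 models replication for fault-tolerance:
    each data set is processed by k resources, so only num_resources // k
    distinct data sets progress at the same time.
    """
    if copies < 1 or copies > num_resources:
        raise ValueError("copies must be between 1 and num_resources")
    if time_per_data_set <= 0:
        raise ValueError("time_per_data_set must be positive")
    groups = num_resources // copies  # number of distinct data sets in flight
    return groups / time_per_data_set


if __name__ == "__main__":
    # Six identical resources, two time units per data set.
    for k in (1, 2, 3):
        print(f"copies={k}: throughput={throughput(6, 2.0, copies=k)}")
```

In this simplified setting, moving from one copy to k copies per data set divides the throughput by roughly k, which is the loss of yield mentioned above; a realistic model would also account for failures, heterogeneity, and checkpointing overhead.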
Finally, environmental and economic factors such as energy consumption and platform
price (or rental cost) must be traded off
against performance-oriented objectives.
Altogether, a challenging multi-criteria optimization problem must be
addressed. We will target simpler instances first,
with fewer optimization objectives and particular task
graph structures. An interesting direction is to
derive realistic models for varying resource parameters and for
assessing the robustness of the scheduling algorithms.