A data link is a connection between a processor output port and a processor input port, as in the example below:
<links>
  <link from="key" to="genParam:paramKey"/>
  <link from="genParam:paramFiles" to="docking:input"/>
  <link from="parameter" to="docking:param"/>
  <link from="docking:result" to="statisticaltest:values"/>
  <link from="statisticaltest:result" to="results"/>
</links>
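To make the structure of these links concrete, the following minimal Python sketch parses the example above and splits each endpoint into a processor name and a port name. It is only an illustration and not part of DIET: the helper names `split_endpoint` and `parse_links` are hypothetical, and treating an endpoint without a colon (such as `key` or `results`) as a workflow-level name rather than a processor port is an assumption.

```python
import xml.etree.ElementTree as ET

# The <links> example from the text, embedded as a string.
LINKS_XML = """
<links>
  <link from="key" to="genParam:paramKey"/>
  <link from="genParam:paramFiles" to="docking:input"/>
  <link from="parameter" to="docking:param"/>
  <link from="docking:result" to="statisticaltest:values"/>
  <link from="statisticaltest:result" to="results"/>
</links>
"""


def split_endpoint(endpoint):
    """Split 'processor:port' into (processor, port).

    Assumption: an endpoint without a colon is a workflow-level name,
    returned here as (None, name).
    """
    processor, sep, port = endpoint.partition(":")
    return (processor, port) if sep else (None, endpoint)


def parse_links(xml_text):
    """Return one ((from_proc, from_port), (to_proc, to_port)) pair per <link>."""
    root = ET.fromstring(xml_text)
    return [
        (split_endpoint(link.get("from")), split_endpoint(link.get("to")))
        for link in root.findall("link")
    ]


links = parse_links(LINKS_XML)
for source, target in links:
    print(source, "->", target)
```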
When a processor A (port A.out) is connected to a processor B (port B.in)
through a data link, one instance of A (one task) may trigger several instances
of B. Their number depends on two things: first, the data depths at both ends
of the link, and second, the iteration strategy chosen for the B.in port within
the B processor.
The data depths on both ends of the link determine the number of data
items received by the B.in port. Three cases are possible:
- 1 to 1: when depth(A.out) = depth(B.in), a data item produced
by A.out is sent as-is to B.in.
- 1 to N: when depth(A.out) > depth(B.in), a data item produced
by A.out is an array that is split into its elements when sent to
B. This produces several parallel instances (tasks) of the B
processor. This is equivalent to a foreach construct in usual
programming languages, but it is transparent for the user because the
workflow engine manages it.
- N to 1: when depth(A.out) < depth(B.in), several data items
produced by A.out (by different tasks) are grouped into an array
before being sent to B.in. This is the opposite of the previous
case. Note that this structure creates a synchronization barrier among the A
tasks, as they must all be completed before the B tasks can be launched.
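The three cases above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the DIET engine's implementation: the function names `split_to_depth` and `deliver` are hypothetical, data items are modelled as nested Python lists, depths are passed explicitly, and the N to 1 case only handles a depth difference of one level. The number of B tasks actually launched also depends on the iteration strategy of the B processor, which is not modelled here.

```python
def split_to_depth(item, item_depth, target_depth):
    """Split an item of depth item_depth into its sub-items of depth target_depth."""
    if item_depth <= target_depth:
        return [item]
    pieces = []
    for element in item:
        pieces.extend(split_to_depth(element, item_depth - 1, target_depth))
    return pieces


def deliver(outputs, out_depth, in_depth):
    """Return the data items received by B.in.

    outputs holds one entry per completed A task (the value written to A.out).
    """
    if out_depth == in_depth:
        # 1 to 1: each item is forwarded unchanged.
        return list(outputs)
    if out_depth > in_depth:
        # 1 to N: each array is split into its elements (implicit foreach).
        items = []
        for item in outputs:
            items.extend(split_to_depth(item, out_depth, in_depth))
        return items
    # N to 1: items from all A tasks are grouped into one array.
    # All A tasks must have finished before this list is complete,
    # which is the synchronization barrier mentioned in the text.
    # Assumption: in_depth == out_depth + 1.
    return [list(outputs)]


# 1 to 1: two A tasks each produce a scalar, B.in receives two items.
print(deliver(["a", "b"], 0, 0))
# 1 to N: one A task produces an array, B.in receives three items.
print(deliver([["a", "b", "c"]], 1, 0))
# N to 1: three A tasks produce scalars, B.in receives one array.
print(deliver(["a", "b", "c"], 0, 1))
```

With a single input port, the length of the returned list gives the number of items on which B tasks can be started: three in the 1 to N call and one in the N to 1 call.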
The DIET Team - Wed 29 Nov 2017 15:13:36 EST