Data links

A data link connects a processor output port to a processor input port, as in the example below:

<links>
    <link from="key" to="genParam:paramKey"/>
    <link from="genParam:paramFiles" to="docking:input"/>
    <link from="parameter" to="docking:param"/>
    <link from="docking:result" to="statisticaltest:values" />
    <link from="statisticaltest:result" to="results" />
</links>
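
To make the endpoint syntax concrete, here is a minimal Python sketch that reads the links above and splits each endpoint into a processor name and a port name. It is only an illustration, not part of the workflow engine: the helper names `split_endpoint` and `parse_links` are hypothetical, and treating an endpoint without a `processor:` prefix (such as `key` or `results`) as a workflow-level port is an assumption drawn from the example.

```python
import xml.etree.ElementTree as ET

# The link declarations from the example above.
LINKS_XML = """
<links>
    <link from="key" to="genParam:paramKey"/>
    <link from="genParam:paramFiles" to="docking:input"/>
    <link from="parameter" to="docking:param"/>
    <link from="docking:result" to="statisticaltest:values" />
    <link from="statisticaltest:result" to="results" />
</links>
"""


def split_endpoint(endpoint):
    """Split 'processor:port' into (processor, port).

    Assumption: an endpoint without a ':' has no processor, so the
    processor part is returned as None.
    """
    processor, sep, port = endpoint.partition(":")
    return (processor, port) if sep else (None, endpoint)


def parse_links(xml_text):
    """Return a list of ((from_proc, from_port), (to_proc, to_port)) pairs."""
    root = ET.fromstring(xml_text)
    return [
        (split_endpoint(link.get("from")), split_endpoint(link.get("to")))
        for link in root.findall("link")
    ]


links = parse_links(LINKS_XML)
for source, target in links:
    print(source, "->", target)
```
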

When a processor A (port A.out) is connected to a processor B (port B.in) through a data link, one instance of A (one task) may trigger several instances of B. How many depends on two things: first, the data depth at both ends of the link; second, the iteration strategy chosen for the B.in port within the B processor.

The data depths on both ends of the link determine the number of data items received by the B.in port. Three cases are possible:

1 to 1: when depth(A.out) = depth(B.in), a data item produced by A.out is sent as-is to B.in.
1 to N: when depth(A.out) > depth(B.in), a data item produced by A.out is an array that is split into its elements when sent to B. This produces several parallel instances (tasks) of the B processor. It is equivalent to a foreach structure in usual programming languages, but it is transparent to the user because the workflow engine manages it.
N to 1: when depth(A.out) < depth(B.in), several data items produced by A.out (by different tasks) are grouped into an array before being sent to B.in. This is the opposite of the previous case. Note that this structure creates a synchronization barrier among the A tasks, as they must all be completed before the B tasks can be launched.
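
The three cases can be sketched in a few lines of Python. This is a simplified model, not the engine's implementation: the function names `split` and `deliver` are hypothetical, data items are modelled as nested Python lists, the iteration strategy of B is ignored, and the N to 1 case is assumed to add exactly one level of grouping.

```python
def split(item, levels):
    """Split a nested list into the sub-items found `levels` levels down."""
    if levels == 0:
        return [item]
    return [leaf for element in item for leaf in split(element, levels - 1)]


def deliver(items_from_a, out_depth, in_depth):
    """Return the inputs received by B.in, one list entry per B task.

    items_from_a: data items produced on A.out, one per A task.
    out_depth, in_depth: declared depths of A.out and B.in.
    """
    if out_depth == in_depth:
        # 1 to 1: each item is forwarded unchanged.
        return list(items_from_a)
    if out_depth > in_depth:
        # 1 to N: each array is split, giving one B task per element.
        inputs = []
        for item in items_from_a:
            inputs.extend(split(item, out_depth - in_depth))
        return inputs
    # N to 1: all A tasks must finish first (synchronization barrier),
    # then their items are grouped into a single array for one B task.
    return [list(items_from_a)]


# One A task produces an array of depth 1; B.in expects depth 0.
print(deliver([[1, 2, 3]], out_depth=1, in_depth=0))

# Three A tasks each produce a scalar; B.in expects an array of depth 1.
print(deliver([1, 2, 3], out_depth=0, in_depth=1))
```

In this model the length of the returned list is the number of B tasks triggered by the link alone; a real workflow would further combine it with the other input ports of B according to the iteration strategy.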

The DIET Team - Wed 29 Nov 2017 15:13:36 EST