DAGDA is a new data manager for the DIET middleware that allows explicit or implicit data replication and advanced data management on the grid. It was designed to be backward compatible with previously developed DIET applications, which benefit transparently from data replication.
DAGDA introduces some new data management features in DIET:

To transfer data, DAGDA uses the pull model instead of the push model used by DTM: the data are not sent inside the profile from the source to the destination; instead, the destination downloads them from the source. Figure 1 presents how DAGDA manages the data transfers for a standard DIET call:
At each step of the submission, the transfers are always launched by the destination node. All transfer operations are performed transparently: users do not have to modify an application that uses data persistence in order to benefit from data replication.
Remark: After the call, the persistent output data obtained from other nodes are replicated on the SeD and will stay on it until they are erased by the user or by the data replacement algorithm.
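The direction of a pull-model transfer can be illustrated with a minimal sketch. It assumes hypothetical `Node` objects holding data in a local dictionary; `Node`, `serve()` and `pull()` are illustrative names, not part of the DIET or DAGDA API. The point is only that the destination initiates the download and keeps the resulting replica.

```python
# Minimal pull-model sketch. Node, serve() and pull() are hypothetical names,
# not the DIET/DAGDA API.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)  # data ID -> bytes held locally

    def serve(self, data_id: str) -> bytes:
        """Called by a remote node that wants to download data_id."""
        return self.store[data_id]

    def pull(self, data_id: str, source: "Node") -> bytes:
        """The destination initiates the transfer and keeps a local replica."""
        if data_id not in self.store:  # reuse an existing replica if present
            self.store[data_id] = source.serve(data_id)
        return self.store[data_id]


client = Node("client", {"matrix-A": b"\x01\x02\x03"})
sed = Node("SeD-1")
sed.pull("matrix-A", client)  # the SeD downloads the data from the client
print(sorted(sed.store))      # ['matrix-A']: the replica stays on the SeD
```

In a push model, the client would instead copy the data into the request it sends; here the client only answers requests.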

Each DAGDA data manager uses the same implementation, presented in Figure 2:
Currently, DAGDA offers three data replacement algorithms: Least Recently Used (LRU), Least Frequently Used (LFU) and First In First Out (FIFO):
These algorithms select the data to be removed only among the non-sticky data whose size is sufficient to make room for the new data.
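The selection rule above can be sketched as follows. This assumes each stored item carries a size, a sticky flag, an insertion time, a last-access time and an access count; the `Entry` record and `select_victim()` are hypothetical names, and "sufficient size" is interpreted here as "at least as large as the space needed". Only the filtering criteria and the three policy names come from the text.

```python
# Hypothetical sketch of victim selection for LRU, LFU and FIFO replacement.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    data_id: str
    size: int          # bytes
    sticky: bool       # sticky data are never evicted
    inserted: int      # logical time of insertion
    last_access: int   # logical time of last use
    access_count: int  # number of uses


def select_victim(entries: list[Entry], needed: int, policy: str) -> Optional[Entry]:
    """Pick one entry to evict, or None if no candidate qualifies."""
    # Only non-sticky data large enough to free the needed space qualify.
    candidates = [e for e in entries if not e.sticky and e.size >= needed]
    if not candidates:
        return None
    keys = {
        "LRU": lambda e: e.last_access,   # least recently used first
        "LFU": lambda e: e.access_count,  # least frequently used first
        "FIFO": lambda e: e.inserted,     # oldest insertion first
    }
    return min(candidates, key=keys[policy])


entries = [
    Entry("a", size=100, sticky=False, inserted=0, last_access=8, access_count=1),
    Entry("b", size=100, sticky=False, inserted=1, last_access=3, access_count=5),
    Entry("c", size=500, sticky=True, inserted=2, last_access=0, access_count=0),
    Entry("d", size=10, sticky=False, inserted=3, last_access=1, access_count=0),
]
for policy in ("LRU", "LFU", "FIFO"):
    print(policy, select_victim(entries, needed=50, policy=policy).data_id)
```

Entry "c" is never chosen because it is sticky, and "d" is skipped because it is too small to free 50 bytes.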
DAGDA is designed to be backward compatible with the previous DIET client-server applications. Moreover, DAGDA extends the standard DIET API with new data management functions. A DAGDA object is interfaced with DIET at each level of the DIET hierarchy.



All the data management functions of the API can be used synchronously or asynchronously. On the server and scheduler sides, transfers can also be launched asynchronously, without waiting for them to complete.
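The difference between the synchronous and asynchronous styles can be sketched with Python futures. `put_data()` and `put_data_async()` are hypothetical stand-ins, not DAGDA function names, and the sleep merely simulates a transfer. The synchronous call returns only once the data is stored; the asynchronous one returns a handle immediately, which can be waited on later or simply ignored.

```python
# Hypothetical sync/async data API sketch; names are illustrative, not DAGDA's.
import time
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def put_data(store: dict, data_id: str, value: bytes) -> str:
    """Synchronous version: returns once the data is stored."""
    time.sleep(0.01)  # stand-in for the network transfer
    store[data_id] = value
    return data_id


def put_data_async(store: dict, data_id: str, value: bytes) -> Future:
    """Asynchronous version: returns immediately with a handle to wait on."""
    return _pool.submit(put_data, store, data_id, value)


store: dict = {}
put_data(store, "x", b"1")                   # blocks until stored
pending = put_data_async(store, "y", b"2")   # returns at once
# ... other work can run here ...
pending.result()                             # wait only when the data is needed
```

Never calling `result()` corresponds to launching a transfer without waiting for it to complete.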
DAGDA allows users to replicate the data stored on the platform to improve performance. To select the nodes where the data should be replicated, the replication function takes a replication rule string as a parameter, constructed as follows:
These three parameters must be separated by the ":" delimiter character. For example, "hostname:*-??.lyon.grid5000.fr:noreplace" is a valid replication rule string, meaning that the data should be replicated on all nodes whose hostname matches *-??.lyon.grid5000.fr, without removing any data if the free space is insufficient.
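Parsing and applying such a rule can be sketched as below. Only the "hostname" target, the wildcard pattern and the "noreplace" keyword come from the example above; the alternative values "ID" and "replace", the field names, and `parse_rule()`/`select_nodes()` are assumptions for illustration. The sketch also assumes that `*` and `?` behave like shell wildcards, which Python's `fnmatch.fnmatchcase` implements.

```python
# Hypothetical replication-rule parser; "ID" and "replace" are assumed values.
from fnmatch import fnmatchcase
from typing import NamedTuple


class ReplicationRule(NamedTuple):
    target: str    # node attribute the pattern applies to, e.g. "hostname"
    pattern: str   # shell-style wildcard pattern (* and ?)
    replace: bool  # False for "noreplace": never evict data to make room


def parse_rule(rule: str) -> ReplicationRule:
    target, pattern, overflow = rule.split(":")  # exactly three fields
    if overflow not in ("replace", "noreplace"):
        raise ValueError(f"unknown overflow behaviour: {overflow!r}")
    return ReplicationRule(target, pattern, overflow == "replace")


def select_nodes(rule: ReplicationRule, nodes: dict[str, str]) -> list[str]:
    """nodes maps a node ID to its hostname; returns the IDs to replicate on."""
    if rule.target == "hostname":
        return [nid for nid, host in nodes.items() if fnmatchcase(host, rule.pattern)]
    if rule.target == "ID":
        return [nid for nid in nodes if fnmatchcase(nid, rule.pattern)]
    raise ValueError(f"unknown target: {rule.target!r}")


rule = parse_rule("hostname:*-??.lyon.grid5000.fr:noreplace")
nodes = {
    "sed1": "sagittaire-12.lyon.grid5000.fr",
    "sed2": "node-1.lyon.grid5000.fr",          # only one character after "-"
    "sed3": "paravent-10.rennes.grid5000.fr",   # different site
}
print(select_nodes(rule, nodes))
```

Only "sed1" matches: `??` requires exactly two characters between the dash and ".lyon".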

Several nodes frequently have access to a shared disk partition (for example, through an NFS server on a cluster). With DAGDA, a node can be configured to share the files it manages with all its child nodes. A typical use of this feature is a service executed on several SeDs of the same cluster that takes a large file as a read-only parameter. Figure 6 presents such file sharing between several nodes:
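The lookup a child node could perform can be sketched as follows, assuming the shared partition is mounted at the same path on every node. `Node`, `share_files` and `locate()` are hypothetical names. The child first checks its own files, then its parent's files if the parent is configured to share them; `None` means a real transfer would be needed.

```python
# Hypothetical sketch of file sharing between a parent node and its children.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    share_files: bool = False                  # share managed files with children
    files: dict = field(default_factory=dict)  # data ID -> path on disk

    def locate(self, data_id: str) -> Optional[str]:
        """Return a path usable without a transfer, or None if a copy is needed."""
        if data_id in self.files:
            return self.files[data_id]
        p = self.parent
        if p is not None and p.share_files and data_id in p.files:
            return p.files[data_id]  # same path on the shared partition
        return None


la = Node("LA", share_files=True, files={"big": "/nfs/data/big.dat"})
sed1 = Node("SeD-1", parent=la)
sed2 = Node("SeD-2", parent=la)
print(sed1.locate("big"), sed2.locate("big"))  # both read the same shared file
```

Both SeDs resolve the large read-only file to the same path, so it is stored once instead of being copied to each SeD.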
The user can configure the SeDs and Agents to store their data in a selected file. The client can then instruct all the nodes that allow it to save their current data status; these nodes can later be stopped and restarted, restoring their data status. This allows the user to stop DIET temporarily while preserving the data status of the chosen nodes across the restart.
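A save/restore cycle of this kind can be sketched in a few lines. The JSON format, the mapping from data IDs to local paths, and the `save_status()`/`restore_status()` names are all assumptions; the actual DAGDA file format is not described here.

```python
# Hypothetical save/restore of a node's data status; the JSON format is assumed.
import json
import tempfile
from pathlib import Path


def save_status(store: dict[str, str], path: Path) -> None:
    """Write the node's current data status to the selected file."""
    path.write_text(json.dumps(store, sort_keys=True))


def restore_status(path: Path) -> dict[str, str]:
    """Reload the data status on restart; start empty if nothing was saved."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as d:
    status_file = Path(d) / "sed1-status.json"
    save_status({"matrix-A": "/tmp/dagda/matrix-A"}, status_file)  # before stop
    restored = restore_status(status_file)                          # after restart
    missing = restore_status(Path(d) / "never-saved.json")
print(restored)
```

Starting from an empty status when no file exists keeps nodes that were never saved usable after a restart.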
Users can set several parameters limiting the resources DAGDA is allowed to use:
All these parameters are optional. If the maximum CORBA message size exceeds the maximum GIOP size allowed by the ORB, DAGDA uses the latter, avoiding the CORBA errors that could occur when using DTM. This default behaviour spares users from modifying the ORB configuration. DAGDA performs data transfers in several parts if necessary, so sending data larger than the maximum CORBA message size does not cause any error. Moreover, the measured overhead of this behaviour is negligible if the maximum message size is larger than a few kilobytes.
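The size capping and multi-part transfer described above can be sketched as follows. The function names are hypothetical, and falling back to the GIOP limit when the parameter is left unset is an assumption; the text only states that the GIOP limit wins when the configured size exceeds it.

```python
# Hypothetical sketch: cap the message size at the GIOP limit, then split data.
from typing import Iterator, Optional


def effective_msg_size(user_max: Optional[int], giop_max: int) -> int:
    """Cap the configured message size at the ORB's maximum GIOP size."""
    if user_max is None:  # assumption: unset parameter falls back to the ORB limit
        return giop_max
    return min(user_max, giop_max)


def chunks(data: bytes, max_msg: int) -> Iterator[bytes]:
    """Split data into successive parts no larger than max_msg bytes."""
    for offset in range(0, len(data), max_msg):
        yield data[offset:offset + max_msg]


payload = bytes(10_000)
size = effective_msg_size(user_max=8192, giop_max=4096)
parts = list(chunks(payload, size))
print(size, len(parts))  # 4096 3
```

With a 4096-byte limit, the 10000-byte payload travels as parts of 4096, 4096 and 1808 bytes; with larger limits the number of parts, and hence the per-message overhead, shrinks.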
Questions about DAGDA can be directed to Gael Le Mahec.