File and Batch Management of an HPC Infrastructure for EDF R&D

Project Leaders

Samuel Kortas EDF R&D SINETICS samuel.kortas@edf.fr
Gael Le Mahec Laboratoire de l'informatique du Parallelisme, (UMR CNRS - ENS Lyon - UCB Lyon 1 - INRIA 5668) gael.le.mahec@ens-lyon.fr

Since October 2007, a prototype access portal to the HPC resources of EDF R&D has been under test, with the aim of simplifying and unifying access to these resources. Through a web browser, users get a consolidated view of all the available computing resources and can perform simple operations: launching, cancelling and monitoring jobs, transferring files, and opening a connection window.

In 2009, this prototype was completely revised in order to obtain a production version of the portal. In particular, the new component-based architecture is intended to be more scalable and to interface closely with the SALOME, MarketLab and OpenTURNS platforms, as well as with the visualization portal VisuPortal.

As part of this industrialization, EDF R&D called on INRIA's expertise in middleware. The collaboration aimed to assess how well the DIET middleware, developed since 2001 by the INRIA GRAAL project-team, meets EDF's needs. The chosen strategy was to quickly deploy on the computing resources a prototype built on DIET and offering the same functionality as the existing access portal.

File management

The software stack presented here can be divided into three categories:

  • The DIET file manager daemon. It is a DIET server daemon (SeD) offering the file management services for the host on which it runs.
  • The client command-line interfaces. This is a set of DIET clients giving transparent access to the services offered by the file manager daemon.
  • A set of configuration utilities used to set up the software stack. These are mainly scripts that automate the configuration.

The following figure presents the elements of the DIET middleware together with the components of the file management system. The Master Agent (MA) is used to find a server offering the requested service. The Local Agents (LA) register a number of File Management daemons that provide services, and report the matching servers to the MA. Finally, the Client communicates directly with the File Management daemon hosting the desired service, using the reference provided by the MA.
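The lookup path described above can be illustrated with a toy model. This is only a sketch in Python and does not use the DIET API: the class names, the idea of keying a request on a (service, host) pair, and the "first match wins" policy are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileDaemon:
    """Toy stand-in for a File Management daemon (SeD) running on one host."""
    host: str
    services: frozenset[str]


@dataclass
class LocalAgent:
    """Toy stand-in for a Local Agent: remembers the daemons registered with it."""
    daemons: list[FileDaemon] = field(default_factory=list)

    def register(self, daemon: FileDaemon) -> None:
        self.daemons.append(daemon)

    def candidates(self, service: str, host: str) -> list[FileDaemon]:
        # Report the registered daemons that offer `service` on `host`.
        return [d for d in self.daemons if d.host == host and service in d.services]


@dataclass
class MasterAgent:
    """Toy stand-in for the Master Agent: asks each Local Agent for a server."""
    local_agents: list[LocalAgent] = field(default_factory=list)

    def lookup(self, service: str, host: str) -> FileDaemon:
        for agent in self.local_agents:
            found = agent.candidates(service, host)
            if found:
                return found[0]  # hypothetical policy: first match wins
        raise LookupError(f"no daemon offers {service!r} on {host!r}")


# Wire up one MA, one LA and two daemons, then resolve a request as a client would.
la = LocalAgent()
la.register(FileDaemon("bar", frozenset({"ls", "chmod"})))
la.register(FileDaemon("bar2", frozenset({"ls"})))
ma = MasterAgent([la])

daemon = ma.lookup("chmod", "bar")
print(daemon.host)
```

Once the reference has been returned, the client talks to the daemon directly; the agents only take part in the lookup.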

The software stack provides the following usual UNIX operations:

Unix command   Description                                       Corresponding DIET command
chgrp          Changing the group of a file or directory         diet-chgrp
chmod          Changing the permissions of a file or directory   diet-chmod
head           Displaying the first n lines of a file            diet-head
ls             Listing the content of a directory                diet-ls
mkdir          Creating a directory                              diet-mkdir
rm             Removing a file                                   diet-rm
rmdir          Removing a directory                              diet-rmdir
tail           Displaying the last n lines of a file             diet-tail

In addition to these eight commands, two more commands, diet-cp and diet-mv, copy and move files; they correspond to the usual cp and mv UNIX commands.
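The correspondence between UNIX and DIET command names can be written down as a lookup table. The Python sketch below is not part of the software stack; the helper name to_diet is hypothetical. It only renames the leading command and makes no attempt to translate options, which may differ between a UNIX command and its DIET counterpart.

```python
import shlex

# UNIX command name -> DIET command name, as listed in this document.
DIET_FILE_COMMANDS = {
    "chgrp": "diet-chgrp",
    "chmod": "diet-chmod",
    "head": "diet-head",
    "ls": "diet-ls",
    "mkdir": "diet-mkdir",
    "rm": "diet-rm",
    "rmdir": "diet-rmdir",
    "tail": "diet-tail",
    "cp": "diet-cp",
    "mv": "diet-mv",
}


def to_diet(unix_command_line: str) -> str:
    """Swap the leading UNIX command name for its DIET counterpart.

    Hypothetical helper: arguments are passed through unchanged.
    """
    argv = shlex.split(unix_command_line)
    if not argv:
        raise ValueError("empty command line")
    if argv[0] not in DIET_FILE_COMMANDS:
        raise ValueError(f"no DIET counterpart for {argv[0]!r}")
    argv[0] = DIET_FILE_COMMANDS[argv[0]]
    return shlex.join(argv)


print(to_diet("chmod 777 bar:foo"))
```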

Batch Management

The software stack of the DIET Batch Management can be divided into two categories:

  • The Batch Management services, which handle communication with the underlying batch system of the machine on which they are executed.
  • The client programs, i.e. command-line interfaces for using the services offered by the daemons, depending on the target machine.

The DIET Batch Management software stack allows the transparent execution of common batch operations.

Command (LoadLeveler/Torque)   Description                                              Corresponding DIET command
llsubmit/qsub                  Submission of a job                                      diet-submit
llq/qstat                      Display of the job list or of the batch scheduler load   diet-list
llcancel/qdel                  Cancellation of a job                                    diet-cancel

You can also get information about the current number of jobs in the Batch Scheduler, the current number of waiting jobs and the current number of running jobs.
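As an illustration, the three batch operations and the job-count queries can be assembled into argument lists programmatically. The Python sketch below is an assumption-laden helper, not part of DIET: the function name batch_argv and the operation and count labels are hypothetical, while the command names and the --host, --nball, --nbrun and --nbwait options are those shown in the usage examples of this document.

```python
# Operation label (hypothetical) -> DIET command name (from this document).
BATCH_COMMANDS = {
    "submit": "diet-submit",
    "list": "diet-list",
    "cancel": "diet-cancel",
}

# Count label (hypothetical) -> diet-list option (from the usage examples).
COUNT_FLAGS = {
    "all": "--nball",
    "running": "--nbrun",
    "waiting": "--nbwait",
}


def batch_argv(operation, host, *args, count=None):
    """Build the argument list for a DIET batch command on `host`."""
    argv = [BATCH_COMMANDS[operation], "--host", host]
    if count is not None:
        if operation != "list":
            raise ValueError("job counts are only available with the list operation")
        argv.append(COUNT_FLAGS[count])
    argv.extend(args)
    return argv


print(batch_argv("list", "foobar", count="running"))
print(batch_argv("cancel", "foo", "bar"))
```

The resulting lists can be handed to subprocess.run on a machine where the DIET clients are installed.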

License type:

CeCILL V2

Requirements:

  • DIET version:

    2.4

  • Language(s):

    C and C++

  • Compiler(s):

    Standard GCC compiler suite

  • Library(ies):

    No particular library needed

  • System(s):

    Linux, MacOS, BlueGene L, BlueGene P

  • Other specific needs:

    An SSH key pair must allow the user running DIET to switch to a specific user, so that each user's priorities, rights, etc. are preserved.

Usage example:

  • File Management:
    • How to change the group of a file called foo on the bar machine to the group baz?
      diet-chgrp baz bar:foo
    • How to change the mode of a file called foo on the bar machine to the mode 777?
      diet-chmod 777 bar:foo
    • How to copy a file called foo on the bar machine to a file called foo2 on the bar2 machine?
      diet-cp bar:foo bar2:foo2

    The command prints a transfer ID that you can use to query the status of the file transfer.

    • How to display the first 20 lines of a file called foo on the bar machine?
      diet-head --line 20 bar:foo
    • How to list all the files with a long display in the home directory on the bar machine?
      diet-ls -al bar:
    • How to create a directory called foo with mode 777 on the bar machine?
      diet-mkdir -m 777 bar:foo
    • How to move a file called foo on the bar machine to a file called foo2 on the bar2 machine?
      diet-mv bar:foo bar2:foo2

    You can then query the status of the transfer using the transfer ID printed on the screen.

    • How to remove a file called foo on the bar machine?
      diet-rm bar:foo
    • How to remove a directory called foo on the bar machine?
      diet-rmdir bar:foo
    • How to get the status of a file transfer whose transfer ID is foo?
      diet-status foo
    • How to display the last 20 lines of a file called foo on the bar machine?
      diet-tail --line 20 bar:foo
  • Batch Management:
    • How to cancel a job with jobID bar on the foo machine?
      diet-cancel --host foo bar
    • How to list the jobs on the foo machine?
      diet-list --host foo
    • How to get the total number of jobs on the bar machine?
      diet-list --host bar --nball
    • How to get the number of running jobs on the foobar machine?
      diet-list --host foobar --nbrun
    • How to get the number of waiting jobs on the foobar2 machine?
      diet-list --host foobar2 --nbwait
    • How to submit a job on the foo machine?
      diet-submit --host foo script.cmd
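The file examples above all address remote files with a host:path notation, such as bar:foo, where an empty path (bar:) designates the home directory. For scripts that drive the DIET clients, this notation can be handled with a small helper. The Python sketch below is hypothetical: the names RemotePath, parse_remote and copy_argv are not part of DIET, and the choice to reject a specification without a host is an assumption, since this document does not say how the clients treat one.

```python
from typing import NamedTuple


class RemotePath(NamedTuple):
    """A file location in the host:path notation used by the examples."""
    host: str
    path: str  # empty string: the home directory, as in "diet-ls -al bar:"

    def __str__(self) -> str:
        return f"{self.host}:{self.path}"


def parse_remote(spec: str) -> RemotePath:
    """Split "host:path" at the first colon (assumed grammar)."""
    host, sep, path = spec.partition(":")
    if not sep or not host:
        raise ValueError(f"expected host:path, got {spec!r}")
    return RemotePath(host, path)


def copy_argv(source: RemotePath, destination: RemotePath) -> list:
    """Build the argument list for a diet-cp call between two locations."""
    return ["diet-cp", str(source), str(destination)]


src = parse_remote("bar:foo")
dst = RemotePath("bar2", "foo2")
print(copy_argv(src, dst))
```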