MS2PH -- From structural mutations to the phenotypes of human diseases.

Project Leaders

O. Poch, G. Deleage and colleagues, IGBMC/IBCP, Strasbourg/Lyon, poch@igbmc.u-strasbg.fr

The aim of this project is twofold: to set up a grid for the descriptive analysis of proteins with known mutations, and to develop a predictive tool that makes the effects of these mutations in human diseases easier to understand. The project is part of the fundamental effort to understand the mechanisms that control protein function. Indeed, the way a protein folds in space determines its interactions with the cellular environment, including the way it interacts with therapeutic molecules.

The program analyzes a sequence through a "pipe" of several processing steps (blast, ballast, clustal, filtering, normd, ...). The pipe is managed by a TCL script.

This program uses large databases.

DIET allows users to submit a file containing thousands of proteins to the grid for study (for example, to update the MS2PH-db database). It then launches these computations in parallel on the Decrypthon grid.
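DIET's own submission format is not described here. As a hypothetical preprocessing step, assume the input is a standard multi-FASTA file and that each protein becomes one independent run of the pipe. Under those assumptions, the file can be split so that each sequence goes into its own file, and each of those files could then be passed as ${protein_file} to pipegrille.tcl. The function names, the file naming scheme (protein_00001.fasta, ...) and the 60-column line wrapping are illustrative choices, not part of MS2PH.

```python
from __future__ import annotations

from pathlib import Path


def split_fasta(text: str) -> list[tuple[str, str]]:
    """Parse multi-FASTA text into (header, sequence) pairs.

    Hypothetical helper: the header is the text after '>', and the
    sequence lines that follow are concatenated without whitespace.
    Lines appearing before the first header are ignored.
    """
    records: list[tuple[str, str]] = []
    header: str | None = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.replace(" ", ""))
    if header is not None:  # close the last record
        records.append((header, "".join(chunks)))
    return records


def write_one_per_file(records: list[tuple[str, str]], out_dir: str) -> list[Path]:
    """Write each record to its own FASTA file (hypothetical naming) and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for i, (header, seq) in enumerate(records, start=1):
        path = out / f"protein_{i:05d}.fasta"
        wrapped = "\n".join(seq[j:j + 60] for j in range(0, len(seq), 60))
        path.write_text(f">{header}\n{wrapped}\n")
        paths.append(path)
    return paths
```

One file per protein keeps every job independent. A failed or very long run (more than 48 hours in some cases) therefore does not hold back the rest of the batch.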

DIET runs parallel computations across the entire Decrypthon grid by interfacing with the local batch schedulers (LoadLeveler, OAR). The project can therefore use the resources of the "university grid", and those resources remain available to local users whenever the Decrypthon project is not using them. The DIET WebBoard makes the grid transparent to users.

License type:

Not distributed.

Requirements:

  • Language(s):

    The main script is written in TCL. The other programs it uses are written in C/C++.

  • Compiler(s):

    cc/gcc (linux) and xlc/xlC (AIX)

  • Library(ies):

    none

  • System(s):

    Linux, AIX

  • Memory:

    Up to 2 GB.

  • Disk:

    Up to 1 GB of temporary space.

  • Mean execution time:

    It depends on the input sequence and on the databases it is compared against: from several minutes to more than 48 hours.

  • Number of processors:
    • Minimum: 1
    • Maximum: 1
  • Use of a Database:

    Calls an external web database via cURL requests.
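Each run of the pipe uses exactly one processor and may take anywhere from minutes to more than 48 hours. Throughput therefore comes from running many independent runs at once, which DIET does on the grid through the local batch schedulers. The sketch is not DIET and does not use its API. It is a hypothetical, local illustration of the same task-farming model: a bounded pool runs independent single-processor jobs side by side. run_pipe uses only the minimal invocation from the usage example. farm_out and its parameters are invented names.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_pipe(protein_file: str) -> int:
    """Run one sequential pipe job and return its exit code.

    Uses only the minimal command shown in the usage example; any
    additional options would have to be appended to this list.
    """
    cmd = ["./pipegrille.tcl", "-batch", f"-File,file={protein_file}"]
    return subprocess.run(cmd).returncode


def farm_out(inputs: list[str], run_one: Callable[[str], int],
             max_jobs: int = 4) -> dict[str, int | str]:
    """Run independent jobs concurrently, at most max_jobs at a time.

    Hypothetical helper: each job itself is sequential (1 processor).
    A job that raises is recorded as an error string instead of
    aborting the whole batch.
    """
    results: dict[str, int | str] = {}
    # Threads are enough here: each thread just waits on its own process.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = {pool.submit(run_one, name): name for name in inputs}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:  # keep the batch going
                results[name] = f"error: {exc}"
    return results
```

Because a single run may need up to 2 GB of memory, max_jobs should be chosen from the memory available on the machine, not only from its core count. For example, farm_out(files, run_pipe, max_jobs=2) would run two pipes at a time.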

Usage example:

From the command line, running the pipe is simple:

 ./pipegrille.tcl -batch -File,file=${protein_file}

where ${protein_file} is the file containing the sequence to be processed. Several other options can also be passed, e.g.:

./pipegrille.tcl -batch -File,log=verbose -File,file=${protein_file} \
-Etapes,blast=${blast_step} \
-Etapes,cluspack=${cluspack_step} \
-Etapes,filter=${filter_step} \
-Etapes,normd1=${normd_dbclustal_step} \
-Etapes,normd2=${normd_rascal_step} \
-Etapes,normd3=${normd_leon_step} \
-Etapes,ballast=${ballast_step} \
-Etapes,leon=${leon_step} \
-Etapes,rascal=${rascal_step} \
-Etapes,clustal=${clustal_step} \
-Blast,e=${max_e_value_expect} \
-Blast,b=${number_of_alignments_to_show} \
-Blast,v=${number_of_descriptions_to_print} \
-Blast,m="${blast_m}" \
-Blast,d="${blast_databank}" \
-Blast,K=${number_of_best_region_to_keep} \
-Blast,f=${threshold} \
-Blast,g=${gapped_blast} \
-Blast,F=${filter_sequences} \
-Filter,expect=${max_e_value_expect_to_consider} \
-Filter,maxseq=${max_number_of_sequences_to_align} \
-Filter,length=${max_sequence_length} \
-Filter,method="${blast_filtering}" \
-Filter,add=${add_sequences_from_ballast} \
-Filter,fragment=${remove_fragments} \
-DbClustal,motifs=${use_motifs} \
-DbClustal,propagate=${propagate} \
-Etapes,macsims=${macsim_step} \
-Etapes,conservation=${conservation_step} \
-Project,project=${real_project}

All these parameters can also be generated automatically through the DIET WebBoard web interface.
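This document does not describe how the WebBoard assembles these options internally. As a hypothetical illustration of generating them programmatically, the helper renders a nested mapping into the "-Section,key=value" syntax used in the example above. It assumes that the relative order of options does not matter to pipegrille.tcl, apart from the leading -batch and -File,file= arguments shown in the minimal example. build_pipe_command is an invented name, and the option values passed to it are placeholders.

```python
from __future__ import annotations


def build_pipe_command(protein_file: str,
                       options: dict[str, dict[str, str]] | None = None,
                       script: str = "./pipegrille.tcl") -> list[str]:
    """Assemble a pipegrille.tcl argument list (hypothetical helper).

    Each entry options[section][key] = value becomes one argument
    "-section,key=value", in insertion order.
    """
    argv = [script, "-batch", f"-File,file={protein_file}"]
    for section, params in (options or {}).items():
        for key, value in params.items():
            argv.append(f"-{section},{key}={value}")
    return argv
```

Returning a list rather than a single string means the result can be passed directly to subprocess.run without going through a shell. Values containing spaces, such as the quoted -Blast,d="${blast_databank}" in the example, then need no extra quoting. For instance, build_pipe_command("query.fasta", {"Blast": {"e": "0.001"}}) gives ["./pipegrille.tcl", "-batch", "-File,file=query.fasta", "-Blast,e=0.001"], where 0.001 is only an illustrative value.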

Docking