Run an experiment

This tutorial explains how to use the SkyProto prototype to run experiments on Grid'5000.

At the end of this file, you will find a set of Python classes that will be useful for your experiments.

Prerequisites

You will need to have the prototype installed and ready to deploy. To do this, refer to the guide:

You will also need to install and configure execo and execo_g5k.

Experiment settings

Harbours configuration

The first step in launching an experiment is to define the Harbour configurations.

For example, in the case of an experiment on migrations, we can imagine a Python function of the form:

import yaml  # PyYAML

# root_project (the path to the project folder) is expected to be defined at module level.
def generate_yaml(nb_harbour, type_class, timer_migration, behaviours, rg, names_by_harbour, probability_crash):
    for i in range(nb_harbour):
        dico = {
            "capacity": 90,
            # One entry per agent hosted on harbour i
            "agents": {
                name: {
                    "data": name,
                    "rg": rg,
                    "actions": ",".join(behaviours),
                    "size": 3,
                    "class": type_class,
                    "timer_RandomMigrate": timer_migration,
                    "failureDetectionAfter": 20,
                    "stopSendingAfter": 60*30,
                    "messageDelayInBuffer": 14400,
                    "sleepForBeforeMigrate": 5,
                    "probabilityCrash": probability_crash
                }
                for name in names_by_harbour[i]
            },
            "name": f"Harbour{i}",
            "port": 8000 + i,
            "gui": "false"
        }
        # Harbour names start at 0, file names start at 1 (Harbour1.yml, Harbour2.yml, ...)
        with open(f"{root_project}/skd/deployment/Harbour{i+1}.yml", "w") as file:
            yaml.dump(dico, file)

This function takes several parameters:

  • nb_harbour: the number of configuration files to create
  • type_class: the agent class name
  • timer_migration: the migration rate
  • behaviours: the list of behaviours to assign to agents
  • rg: the number of replicas per family
  • names_by_harbour: a list whose ith element is the list of agent names to create on the ith harbour
  • probability_crash: the probability of an agent crashing
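
Because generate_yaml indexes names_by_harbour with every i in range(nb_harbour), that list must contain exactly one entry per harbour, even for harbours that start empty. The sketch below shows one way to build it. The assign_names helper is hypothetical and not part of the prototype; it assumes one agent on each of the first nb_filled harbours.

```python
def assign_names(agent_names, nb_harbour, nb_filled):
    """Put one agent on each of the first nb_filled harbours, leave the others empty.

    Hypothetical helper written for this tutorial, not part of the prototype.
    """
    if not 0 <= nb_filled <= nb_harbour:
        raise ValueError("nb_filled must be between 0 and nb_harbour")
    if len(agent_names) < nb_filled:
        raise ValueError("not enough agent names for the filled harbours")
    filled = [[agent_names[i]] for i in range(nb_filled)]
    # The comprehension creates a distinct empty list for each remaining harbour
    empty = [[] for _ in range(nb_harbour - nb_filled)]
    return filled + empty


names_by_harbour = assign_names(["LULU-0", "LILI-0"], nb_harbour=4, nb_filled=2)
print(names_by_harbour)  # [['LULU-0'], ['LILI-0'], [], []]
```

The result always has nb_harbour entries, so it can be passed directly as the names_by_harbour argument.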

Reserving hosts for harbours

The utility classes allow several sites to be used to run experiments.

So you need to define the list of sites you want to use, for example Lyon and Nancy.

To reserve your nodes, you have two options:

  • Reserve your nodes yourself: in this case, you need to define a Python dictionary mapping each site to a job. For example: dico = {'lyon': 123456789, 'nancy': 987654321}.
  • Use the reserve_nodes_on_sites function, which takes the list of sites and the total number of harbours as parameters. It makes the reservations automatically, trying to distribute the Harbours fairly across the sites, and returns a dictionary mapping each site to a job.

Warning: we strongly advise you to make sure that the reservation has started by using the wait_jobs_start function.
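
How reserve_nodes_on_sites balances the Harbours is not detailed here. As an illustration only, one plausible reading of "fairly" is that the per-site counts differ by at most one. The split_harbours function below is a hypothetical sketch of that idea, not the actual implementation, and it does not contact Grid'5000.

```python
def split_harbours(sites, nb_harbours):
    """Spread nb_harbours over the sites so that counts differ by at most one.

    Hypothetical illustration; the real reserve_nodes_on_sites may differ.
    """
    if not sites:
        raise ValueError("at least one site is required")
    base, extra = divmod(nb_harbours, len(sites))
    # The first `extra` sites each receive one additional harbour
    return {site: base + (1 if i < extra else 0) for i, site in enumerate(sites)}


print(split_harbours(["lyon", "nancy"], 11))  # {'lyon': 6, 'nancy': 5}
```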

Other functions may also be useful:

  • get_nodes_for_jobs: retrieves the list of nodes associated with your jobs
  • delete_nodes: ends your reservation early

Running the experiment

The utility classes assume that the following filename criteria are met:

  • Harbours configuration files are called: HarbourX.yml
  • The Java JAR files for the harbours (resp. the central point) are called platform.jar (resp. centralized.jar)

The architecture of the project folder must also be respected:

  • java_gradle_installer.sh: script that installs the correct versions of Gradle and Java
  • skd/:
    • jarManager: folder containing the agent classes used
    • libs: folder containing the Java library JARs used (the same as in the Git repository)
    • build/libs/:
      • centralized.jar: the Java executable for the central point
      • platform.jar: the Java executable for the Harbours

All these conventions can be modified directly in the code.
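
Since a missing file is only detected once the nodes are reserved, it can help to check the layout beforehand. The following is a minimal sketch using only the paths listed above; missing_paths and EXPECTED_PATHS are hypothetical names, not part of the utility classes.

```python
from pathlib import Path

# Paths relative to the project folder, taken from the layout described above
EXPECTED_PATHS = [
    "java_gradle_installer.sh",
    "skd/jarManager",
    "skd/libs",
    "skd/build/libs/centralized.jar",
    "skd/build/libs/platform.jar",
]


def missing_paths(root_project):
    """Return the expected files or folders that do not exist under root_project."""
    root = Path(root_project)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


# Report anything missing before reserving nodes
for rel in missing_paths("../../.."):
    print(f"missing: {rel}")
```

An empty result means the folder matches the conventions; if you change the conventions in the code, the list needs to be updated accordingly.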

Then, to launch the experiment, all you have to do is create an instance of the ExperimentSkydata class. The constructor takes several parameters as input:

  • jobs: a dictionary associating each site with a job
  • nb_run: the number of times the experiment should be run
  • root_project: the path to the project folder
  • working_dir: the folder in which to work on hosts
  • prefix_yaml: the folder where the Harbours configuration files are located
  • duration: the duration of a run
  • output_directory: the folder in which log files are stored
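
With seven parameters, it is easy to swap two of them in a positional call. One option is to group them in a small container that checks the obvious constraints first. The ExperimentParams class below is a hypothetical sketch, not part of the utility classes; its fields follow the order of the list above, and the validation rules are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentParams:
    """Hypothetical container mirroring the constructor parameters listed above."""
    jobs: dict
    nb_run: int
    root_project: str
    working_dir: str
    prefix_yaml: str
    duration: int
    output_directory: str

    def __post_init__(self):
        # Assumed sanity checks; adapt them to your experiment
        if not self.jobs:
            raise ValueError("jobs must map at least one site to a job")
        if self.nb_run < 1:
            raise ValueError("nb_run must be at least 1")
        if self.duration <= 0:
            raise ValueError("duration must be positive")

    def as_args(self):
        """Return the values in declaration order, ready for a positional call."""
        return tuple(getattr(self, f.name) for f in fields(self))


params = ExperimentParams(
    jobs={"lyon": 123456789, "nancy": 987654321},
    nb_run=10,
    root_project="../../..",
    working_dir="/tmp/migration",
    prefix_yaml="/skd/deployment",
    duration=60 * 5,
    output_directory="results/",
)
print(params.as_args()[1])  # 10
```

Assuming the constructor takes its parameters positionally in the order listed, the instance could then be created with ExperimentSkydata(*params.as_args()).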

Example of experiment

In this part, we assume we want to do an experiment on the migration process.

We use the same YAML template as discussed before:

import yaml  # PyYAML

# root_project (the path to the project folder) is expected to be defined at module level.
def generate_yaml(nb_harbour, type_class, timer_migration, behaviours, rg, names_by_harbour, probability_crash):
    for i in range(nb_harbour):
        dico = {
            "capacity": 90,
            # One entry per agent hosted on harbour i
            "agents": {
                name: {
                    "data": name,
                    "rg": rg,
                    "actions": ",".join(behaviours),
                    "size": 3,
                    "class": type_class,
                    "timer_RandomMigrate": timer_migration,
                    "failureDetectionAfter": 20,
                    "stopSendingAfter": 60*30,
                    "messageDelayInBuffer": 14400,
                    "sleepForBeforeMigrate": 5,
                    "probabilityCrash": probability_crash
                }
                for name in names_by_harbour[i]
            },
            "name": f"Harbour{i}",
            "port": 8000 + i,
            "gui": "false"
        }
        # Harbour names start at 0, file names start at 1 (Harbour1.yml, Harbour2.yml, ...)
        with open(f"{root_project}/skd/deployment/Harbour{i+1}.yml", "w") as file:
            yaml.dump(dico, file)

We want to vary the number of Harbours created, the RG and the crash probability. To do that, we use the following main program:

import os
from time import strftime

# reserve_nodes_on_sites, wait_jobs_start, delete_nodes and ExperimentSkydata
# come from the utility classes given at the end of this file.

root_project = "../../.."
working_dir = "/tmp/migration"

def generate_names(nb):
    # Build nb distinct agent names of the form "<NAME>-<index>", e.g. "LULU-0"
    return [name + "-" + str(i) for name in ["LULU", "LILI", "TITI", "TOTO", "BOB", "SAM"] for i in range(nb//6 + 1)][:nb]

behaviours = ["migration.RandomMigrate", "replication.ReplicateToReachRGAggregate", "evals.Logger", "presentation.PresentFamily", "evals.FaultSimulator"]

# One results folder per launch, named after the current date and time
timestamp = strftime("%x_%X").replace('/', '-')
print(timestamp)
os.makedirs(f"{root_project}/g5k/results/migration/{timestamp}/")

for type_algo in ["algorithm.migration.LeaderBased"]:
    for rg in [15]:
        for nb_harbour in [10]:
            for nb_filled in [1]:
                for probability_crash in [0.0]:
                    agents = generate_names(nb_harbour+1)
                    # One agent on each of the first nb_filled harbours, none on the others
                    names_by_harbour = [[agents[i]] for i in range(nb_filled)] + [[] for _ in range(nb_harbour-nb_filled)]
                    output_directory = f"{root_project}/g5k/results/migration/{timestamp}/{type_algo}/{rg}-{nb_harbour}-{nb_filled}-{probability_crash}/"
                    os.makedirs(output_directory)
                    sites = ["lyon", "nancy"]
                    print(f"{rg=} {nb_harbour=} {nb_filled=}")
                    # Reserve the nodes and wait until the reservation has started
                    jobs = reserve_nodes_on_sites(sites, nb_harbour + 1)
                    wait_jobs_start(jobs)
                    tmp_behaviours = behaviours + [type_algo]
                    generate_yaml(nb_harbour, "SKD", 10000, tmp_behaviours, rg, names_by_harbour, probability_crash)
                    # 10 runs of 60*5 each, logs stored in output_directory
                    scriptMigration = ExperimentSkydata(jobs, 10, root_project, working_dir, "/skd/deployment", 60*5, output_directory)
                    scriptMigration.execute()
                    # Release the reservation once all runs are done
                    delete_nodes(jobs)
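
The five nested loops enumerate every combination of the parameter values. As the lists grow, the same enumeration can be written more compactly with itertools.product. The sketch below only prints the configurations and their output folder labels; the run_label helper is hypothetical, and the reservation and execution steps are left out.

```python
from itertools import product

# Values to explore; these reuse the single values from the program above
type_algos = ["algorithm.migration.LeaderBased"]
rgs = [15]
nb_harbours = [10]
nb_filleds = [1]
probabilities = [0.0]


def run_label(rg, nb_harbour, nb_filled, probability_crash):
    """Build the last component of the output directory, as in the program above."""
    return f"{rg}-{nb_harbour}-{nb_filled}-{probability_crash}"


# product yields the same combinations, in the same order, as the nested loops
for type_algo, rg, nb_harbour, nb_filled, probability_crash in product(
    type_algos, rgs, nb_harbours, nb_filleds, probabilities
):
    print(type_algo, run_label(rg, nb_harbour, nb_filled, probability_crash))
    # prints: algorithm.migration.LeaderBased 15-10-1-0.0
```

Adding a value to any list multiplies the number of runs, and each run makes its own reservation, so it is worth counting the combinations before launching.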