Run an experiment¶
This tutorial explains how to use the SkyProto prototype to run experiments on Grid'5000.
At the end of this file, you will find a set of Python classes that will be useful for your experiments.
Prerequisites¶
You will need to have the prototype installed and ready to deploy. To do this, refer to the guide:
You will also need to install and configure execo and execo_g5k.
Experiment settings¶
Harbours configuration¶
The first step in launching an experiment is to define the Harbours configurations.
For example, in the case of an experiment on migrations, we can imagine a Python function of the form:
```python
import yaml

def generate_yaml(nb_harbour, type_class, timer_migration, behaviours, rg, names_by_harbour, probability_crash):
    for i in range(nb_harbour):
        dico = {
            "capacity": 90,
            "agents": {
                f"{name}": {
                    "data": name,
                    "rg": rg,
                    "actions": f"{','.join(behaviours)}",
                    "size": 3,
                    "class": type_class,
                    "timer_RandomMigrate": timer_migration,
                    "failureDetectionAfter": 20,
                    "stopSendingAfter": 60 * 30,
                    "messageDelayInBuffer": 14400,
                    "sleepForBeforeMigrate": 5,
                    "probabilityCrash": probability_crash,
                }
                for name in names_by_harbour[i]
            },
            "name": f"Harbour{i}",
            "port": 8000 + i,
            "gui": "false",
        }
        # root_project is a global defined by the calling script.
        with open(f"{root_project}/skd/deployment/Harbour{i+1}.yml", "w") as file:
            yaml.dump(dico, file)
```
This function takes several parameters:

- `nb_harbour`: the number of configuration files to create
- `type_class`: the agent class name
- `timer_migration`: the migration rate
- `behaviours`: the list of behaviours to assign to agents
- `rg`: the number of replicas per family
- `names_by_harbour`: a list whose ith element contains the list of agent names to create on the ith harbour
- `probability_crash`: the probability of an agent crashing
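To make the file format concrete, here is the kind of dictionary the function builds for a single harbour, with hypothetical values (one agent, index 0), round-tripped through PyYAML as `generate_yaml` would when writing a `HarbourX.yml` file:

```python
import yaml

# Hypothetical values for a single harbour (index 0) with one agent.
dico = {
    "capacity": 90,
    "agents": {
        "LULU-0": {
            "data": "LULU-0",
            "rg": 3,
            "actions": "migration.RandomMigrate,evals.Logger",
            "size": 3,
            "class": "SKD",
            "timer_RandomMigrate": 10000,
            "failureDetectionAfter": 20,
            "stopSendingAfter": 60 * 30,
            "messageDelayInBuffer": 14400,
            "sleepForBeforeMigrate": 5,
            "probabilityCrash": 0.0,
        }
    },
    "name": "Harbour0",
    "port": 8000,
    "gui": "false",
}

# Dump to YAML text (as generate_yaml does to a file) and re-load it
# to check the configuration survives the round trip unchanged.
text = yaml.dump(dico)
assert yaml.safe_load(text) == dico
print(text)
```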
Reserving hosts for the harbours¶
The utility classes allow several sites to be used to run experiments.
You therefore need to define the list of sites you want to use, for example Lyon and Nancy.
To reserve your nodes, you have two options:

- Reserve the nodes yourself. In this case, define a Python dictionary mapping each site to a job ID, for example `dico = {"lyon": 123456789, "nancy": 987654321}`.
- Use the `reserve_nodes_on_sites` function, which takes the list of sites and the total number of harbours as parameters. It makes the reservations automatically, trying to distribute the harbours fairly across sites, and returns a dictionary associating each site with a job.

Warning: we strongly advise you to make sure the reservation has actually started by calling the `wait_jobs_start` function.
Other functions may also be useful:

- `get_nodes_for_jobs`: retrieves the list of nodes associated with your jobs
- `delete_nodes`: ends your reservation early
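The reservation logic itself lives in the utility classes, but the "fair distribution" performed by `reserve_nodes_on_sites` can be pictured as a round-robin split of the harbours over the sites. The sketch below is hypothetical (the real function also submits the Grid'5000 jobs); it only illustrates how the harbour count could be divided:

```python
def split_harbours_fairly(sites, nb_harbours):
    """Round-robin split: assign each harbour index to a site in turn.

    Hypothetical helper illustrating the fair distribution mentioned
    above; the real reserve_nodes_on_sites also creates the jobs.
    """
    counts = {site: 0 for site in sites}
    for i in range(nb_harbours):
        counts[sites[i % len(sites)]] += 1
    return counts

# 11 nodes (10 harbours + 1 central point) over two sites:
print(split_harbours_fairly(["lyon", "nancy"], 11))
# {'lyon': 6, 'nancy': 5}
```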
Running the experiment¶
The utility classes assume that the following file naming conventions are met:

- Harbour configuration files are called `HarbourX.yml`
- The Java file for the harbours is called `platform.jar`, and the one for the central point `centralized.jar`
The project folder layout must also be respected:

- `java_gradle_installer.sh`: script installing the correct versions of Gradle and Java
- `skd/`:
    - `jarManager`: folder containing the agent classes used
    - `libs`: folder containing the Java library jars used (the same as on the Git repository)
    - `build/libs/`:
        - `centralized.jar`: the Java executable for the central point
        - `platform.jar`: the Java executable for the harbours
All these conventions can be modified directly in the code.
Then, to launch the experiment, all you have to do is create an instance of the `ExperimentSkydata` class. The constructor takes several parameters:

- `jobs`: a dictionary associating each site with a job
- `nb_run`: the number of times the experiment should be run
- `root_project`: the path to the project folder
- `working_dir`: the folder in which to work on the hosts
- `prefix_yaml`: the folder where the Harbour configuration files are located
- `duration`: the duration of a run
- `output_directory`: the folder in which log files are stored
Example experiment¶
In this part, we assume we want to do an experiment on the migration process.
We use the same YAML template as discussed before:
```python
def generate_yaml(nb_harbour, type_class, timer_migration, behaviours, rg, names_by_harbour, probability_crash):
    for i in range(nb_harbour):
        dico = {
            "capacity": 90,
            "agents": {
                f"{name}": {
                    "data": name,
                    "rg": rg,
                    "actions": f"{','.join(behaviours)}",
                    "size": 3,
                    "class": type_class,
                    "timer_RandomMigrate": timer_migration,
                    "failureDetectionAfter": 20,
                    "stopSendingAfter": 60 * 30,
                    "messageDelayInBuffer": 14400,
                    "sleepForBeforeMigrate": 5,
                    "probabilityCrash": probability_crash,
                }
                for name in names_by_harbour[i]
            },
            "name": f"Harbour{i}",
            "port": 8000 + i,
            "gui": "false",
        }
        with open(f"{root_project}/skd/deployment/Harbour{i+1}.yml", "w") as file:
            yaml.dump(dico, file)
```
We want to vary the number of Harbours created, the RG, and the crash probability. To do so, we use the following main program:
```python
import os
from time import strftime

import yaml  # used by generate_yaml above

root_project = "../../.."
working_dir = "/tmp/migration"

def generate_names(nb):
    return [name + "-" + str(i)
            for name in ["LULU", "LILI", "TITI", "TOTO", "BOB", "SAM"]
            for i in range(nb // 6 + 1)][:nb]

behaviours = ["migration.RandomMigrate", "replication.ReplicateToReachRGAggregate",
              "evals.Logger", "presentation.PresentFamily", "evals.FaultSimulator"]

time = strftime("%x_%X").replace('/', '-')
print(time)
os.makedirs(f"{root_project}/g5k/results/migration/{time}/")

for type_algo in ["algorithm.migration.LeaderBased"]:
    for rg in [15]:
        for nb_harbour in [10]:
            for nb_filled in [1]:
                for probability_crash in [0.0]:
                    agents = generate_names(nb_harbour + 1)
                    names_by_harbour = [[agents[i]] for i in range(nb_filled)] + [[]] * (nb_harbour - nb_filled)
                    output_directory = f"{root_project}/g5k/results/migration/{time}/{type_algo}/{rg}-{nb_harbour}-{nb_filled}-{probability_crash}/"
                    os.makedirs(output_directory)
                    sites = ["lyon", "nancy"]
                    print(f"{rg=} {nb_harbour=} {nb_filled=}")
                    jobs = reserve_nodes_on_sites(sites, nb_harbour + 1)
                    wait_jobs_start(jobs)
                    tmp_behaviours = behaviours + [type_algo]
                    generate_yaml(nb_harbour, "SKD", 10000, tmp_behaviours, rg, names_by_harbour, probability_crash)
                    scriptMigration = ExperimentSkydata(jobs, 10, root_project, working_dir, f"/skd/deployment", 60 * 5, output_directory)
                    scriptMigration.execute()
                    delete_nodes(jobs)
```
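For reference, the `generate_names` helper used above cycles through six base names and suffixes each with an index; note that each base name is exhausted with all its indices before moving to the next. A quick standalone check of its behaviour:

```python
def generate_names(nb):
    # Same helper as in the main program: six base names, each suffixed
    # with 0 .. nb//6, then truncated to exactly nb names.
    return [name + "-" + str(i)
            for name in ["LULU", "LILI", "TITI", "TOTO", "BOB", "SAM"]
            for i in range(nb // 6 + 1)][:nb]

names = generate_names(7)
print(names)
# ['LULU-0', 'LULU-1', 'LILI-0', 'LILI-1', 'TITI-0', 'TITI-1', 'TOTO-0']
```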