Run an experiment¶
This tutorial explains how to use the SkyProto prototype to run experiments on Grid'5000.
At the end of this file, you will find a set of Python classes that will be useful for your experiments.
Prerequisites¶
You will need to have the prototype installed and ready to deploy. To do this, refer to the guide:
You will also need to install and configure execo and execo_g5k.
Experiment settings¶
Harbours configuration¶
The first step in launching an experiment is to define the Harbours configurations.
For example, in the case of an experiment on migrations, we can imagine a Python function of the form:
```python
import yaml

def generate_yaml(nb_harbour, type_class, timer_migration, behaviours, rg, names_by_harbour, probability_crash):
    for i in range(nb_harbour):
        dico = {
            "capacity": 90,
            "agents": {
                f"{name}": {
                    "data": name,
                    "rg": rg,
                    "actions": f"{','.join(behaviours)}",
                    "size": 3,
                    "class": type_class,
                    "timer_RandomMigrate": timer_migration,
                    "failureDetectionAfter": 20,
                    "stopSendingAfter": 60 * 30,
                    "messageDelayInBuffer": 14400,
                    "sleepForBeforeMigrate": 5,
                    "probabilityCrash": probability_crash,
                }
                for name in names_by_harbour[i]
            },
            "name": f"Harbour{i}",
            "port": 8000 + i,
            "gui": "false",
        }
        # root_project is a global defined by the calling script.
        with open(f"{root_project}/skd/deployment/Harbour{i+1}.yml", "w") as file:
            yaml.dump(dico, file)
```
This function takes several parameters:

- `nb_harbour`: the number of configuration files to create
- `type_class`: the agent class name
- `timer_migration`: the migration rate
- `behaviours`: the list of behaviours to assign to agents
- `rg`: the number of replicas per family
- `names_by_harbour`: a list whose ith element contains the list of agent names to create on the ith harbour
- `probability_crash`: the probability of an agent crashing
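To make the file format concrete, here is the kind of dictionary the function builds for a single harbour, with hypothetical values (one agent, index 0), round-tripped through PyYAML as `generate_yaml` would when writing a `HarbourX.yml` file:

```python
import yaml

# Hypothetical values for a single harbour (index 0) with one agent.
dico = {
    "capacity": 90,
    "agents": {
        "LULU-0": {
            "data": "LULU-0",
            "rg": 3,
            "actions": "migration.RandomMigrate,evals.Logger",
            "size": 3,
            "class": "SKD",
            "timer_RandomMigrate": 10000,
            "failureDetectionAfter": 20,
            "stopSendingAfter": 60 * 30,
            "messageDelayInBuffer": 14400,
            "sleepForBeforeMigrate": 5,
            "probabilityCrash": 0.0,
        }
    },
    "name": "Harbour0",
    "port": 8000,
    "gui": "false",
}

# Dump to YAML text (as generate_yaml does to a file) and re-load it
# to check the configuration survives the round trip unchanged.
text = yaml.dump(dico)
assert yaml.safe_load(text) == dico
print(text)
```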
Reserving hosts for the harbours¶
The utility classes allow several sites to be used to run experiments.
You therefore need to define the list of sites you want to use, for example Lyon and Nancy.
To reserve your nodes, you have two options:

- Reserve the nodes yourself. In this case, define a Python dictionary mapping each site to a job ID, for example `dico = {"lyon": 123456789, "nancy": 987654321}`.
- Use the `reserve_nodes_on_sites` function, which takes the list of sites and the total number of harbours as parameters. It makes the reservations automatically, trying to distribute the harbours fairly across sites, and returns a dictionary associating each site with a job.

Warning: we strongly advise you to make sure the reservation has actually started by calling the `wait_jobs_start` function.
Other functions may also be useful:

- `get_nodes_for_jobs`: retrieves the list of nodes associated with your jobs
- `delete_nodes`: ends your reservation early
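The reservation logic itself lives in the utility classes, but the "fair distribution" performed by `reserve_nodes_on_sites` can be pictured as a round-robin split of the harbours over the sites. The sketch below is hypothetical (the real function also submits the Grid'5000 jobs); it only illustrates how the harbour count could be divided:

```python
def split_harbours_fairly(sites, nb_harbours):
    """Round-robin split: assign each harbour index to a site in turn.

    Hypothetical helper illustrating the fair distribution mentioned
    above; the real reserve_nodes_on_sites also creates the jobs.
    """
    counts = {site: 0 for site in sites}
    for i in range(nb_harbours):
        counts[sites[i % len(sites)]] += 1
    return counts

# 11 nodes (10 harbours + 1 central point) over two sites:
print(split_harbours_fairly(["lyon", "nancy"], 11))
# {'lyon': 6, 'nancy': 5}
```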
Running the experiment¶
The utility classes assume that the following file naming conventions are met:

- Harbour configuration files are called `HarbourX.yml`
- The Java file for the harbours is called `platform.jar`, and the one for the central point `centralized.jar`
The project folder layout must also be respected:

- `java_gradle_installer.sh`: script installing the correct versions of Gradle and Java
- `skd/`:
    - `jarManager`: folder containing the agent classes used
    - `libs`: folder containing the Java library jars used (the same as on the Git repository)
    - `build/libs/`:
        - `centralized.jar`: the Java executable for the central point
        - `platform.jar`: the Java executable for the harbours
All these conventions can be modified directly in the code.
Then, to launch the experiment, all you have to do is create an instance of the `ExperimentSkydata` class. The constructor takes several parameters:

- `jobs`: a dictionary associating each site with a job
- `nb_run`: the number of times the experiment should be run
- `root_project`: the path to the project folder
- `working_dir`: the folder in which to work on the hosts
- `prefix_yaml`: the folder where the Harbour configuration files are located
- `duration`: the duration of a run
- `output_directory`: the folder in which log files are stored
Example experiment¶
In this part, we assume we want to do an experiment on the migration process.
We use the same YAML template as discussed before:
```python
def generate_yaml(nb_harbour, type_class, timer_migration, behaviours, rg, names_by_harbour, probability_crash):
    for i in range(nb_harbour):
        dico = {
            "capacity": 90,
            "agents": {
                f"{name}": {
                    "data": name,
                    "rg": rg,
                    "actions": f"{','.join(behaviours)}",
                    "size": 3,
                    "class": type_class,
                    "timer_RandomMigrate": timer_migration,
                    "failureDetectionAfter": 20,
                    "stopSendingAfter": 60 * 30,
                    "messageDelayInBuffer": 14400,
                    "sleepForBeforeMigrate": 5,
                    "probabilityCrash": probability_crash,
                }
                for name in names_by_harbour[i]
            },
            "name": f"Harbour{i}",
            "port": 8000 + i,
            "gui": "false",
        }
        with open(f"{root_project}/skd/deployment/Harbour{i+1}.yml", "w") as file:
            yaml.dump(dico, file)
```
We want to vary the number of Harbours created, the RG, and the crash probability. To do so, we use the following main program:
```python
import os
from time import strftime

import yaml  # used by generate_yaml above

root_project = "../../.."
working_dir = "/tmp/migration"

def generate_names(nb):
    return [name + "-" + str(i)
            for name in ["LULU", "LILI", "TITI", "TOTO", "BOB", "SAM"]
            for i in range(nb // 6 + 1)][:nb]

behaviours = ["migration.RandomMigrate", "replication.ReplicateToReachRGAggregate",
              "evals.Logger", "presentation.PresentFamily", "evals.FaultSimulator"]

time = strftime("%x_%X").replace('/', '-')
print(time)
os.makedirs(f"{root_project}/g5k/results/migration/{time}/")

for type_algo in ["algorithm.migration.LeaderBased"]:
    for rg in [15]:
        for nb_harbour in [10]:
            for nb_filled in [1]:
                for probability_crash in [0.0]:
                    agents = generate_names(nb_harbour + 1)
                    names_by_harbour = [[agents[i]] for i in range(nb_filled)] + [[]] * (nb_harbour - nb_filled)
                    output_directory = f"{root_project}/g5k/results/migration/{time}/{type_algo}/{rg}-{nb_harbour}-{nb_filled}-{probability_crash}/"
                    os.makedirs(output_directory)
                    sites = ["lyon", "nancy"]
                    print(f"{rg=} {nb_harbour=} {nb_filled=}")
                    jobs = reserve_nodes_on_sites(sites, nb_harbour + 1)
                    wait_jobs_start(jobs)
                    tmp_behaviours = behaviours + [type_algo]
                    generate_yaml(nb_harbour, "SKD", 10000, tmp_behaviours, rg, names_by_harbour, probability_crash)
                    scriptMigration = ExperimentSkydata(jobs, 10, root_project, working_dir, f"/skd/deployment", 60 * 5, output_directory)
                    scriptMigration.execute()
                    delete_nodes(jobs)
```
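For reference, the `generate_names` helper used above cycles through six base names and suffixes each with an index; note that each base name is exhausted with all its indices before moving to the next. A quick standalone check of its behaviour:

```python
def generate_names(nb):
    # Same helper as in the main program: six base names, each suffixed
    # with 0 .. nb//6, then truncated to exactly nb names.
    return [name + "-" + str(i)
            for name in ["LULU", "LILI", "TITI", "TOTO", "BOB", "SAM"]
            for i in range(nb // 6 + 1)][:nb]

names = generate_names(7)
print(names)
# ['LULU-0', 'LULU-1', 'LILI-0', 'LILI-1', 'TITI-0', 'TITI-1', 'TOTO-0']
```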