Kwollect monitoring on Grid’5000
In this tutorial, we show how to use e2clab with the kwollect monitoring service deployed on Grid’5000 (see the kwollect documentation).
Kwollect is particularly useful because it gives access to many metrics that are already available on the hosts, such as:
wattmetre power consumption
metrics from the Board Management Controller (bmc)
metrics from Prometheus node exporter
etc
You can find all the metrics available on each Grid’5000 cluster in the kwollect documentation.
In this short example, we will show how to:
Fetch some of those metrics for your experiments
Analyze them
Push your own custom metrics to the monitoring stack
Experiment Artifacts
The artifacts repository contains the E2Clab configuration files:
$ git clone https://gitlab.inria.fr/E2Clab/examples/monitoring-kwollect.git
$ cd monitoring-kwollect
The structure of the experimental setup looks like this:
monitoring-kwollect/ # SCENARIO_DIR
├── artifacts/ # ARTIFACTS_DIR
│ └── push_metrics.sh
├── .e2c_env
├── layers_services.yaml
├── networks.yaml
├── workflow.yaml
├── workflow_env.yaml
├── analysis.py
├── requirements.txt
├── README.md
└── ...
Defining the Experimental Environment
Experiment definition
Notice that the experiment artifacts contain a .e2c_env file:
E2C_DEBUG=false
E2C_SCENARIO_DIR="./"
E2C_ARTIFACTS_DIR="./artifacts/"
This defines environment variables that will be passed to the e2clab CLI.
As described in the documentation (E2Clab command-line interface), environment variables can be defined for the SCENARIO_DIR and ARTIFACTS_DIR arguments as well as for other CLI arguments and options.
In this example, we define E2C_SCENARIO_DIR and E2C_ARTIFACTS_DIR, and whether we want the debug logs. This way, we do not have to pass the SCENARIO_DIR and ARTIFACTS_DIR arguments every time we launch a command.
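To make the file format concrete, here is a minimal Python sketch of how such KEY=VALUE lines can be read into a dictionary of settings. It is an illustration only: parse_env_lines is a hypothetical helper, not part of E2Clab, and E2Clab's own loading logic may differ.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict, ignoring blanks and comments.

    Hypothetical helper for illustration; not E2Clab's own loader.
    """
    env = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


# The three lines of the .e2c_env file shown above.
example = [
    'E2C_DEBUG=false',
    'E2C_SCENARIO_DIR="./"',
    'E2C_ARTIFACTS_DIR="./artifacts/"',
]
settings = parse_env_lines(example)
print(settings)
```

Each key keeps its E2C_ prefix, which is what ties it to the matching CLI argument or option.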
Payload
This simple payload, found in the artifacts directory, is a bash script that exports a my_e2clab_metric metric to the kwollect monitoring stack.
#!/bin/bash

for val in {1..10}; do
  echo "kwollect_custom{_metric_id=\"my_e2clab_metric\"} $val" >/var/lib/prometheus/node-exporter/kwollect.prom
  sleep 15
done
Note
If you change the name of the metric from my_e2clab_metric to any other name, make sure that the new name matches the regular expression in the monitor option of the layers_services.yaml file, otherwise it will not be collected.
To know more about custom metrics and the timestamp at which they are recorded, check the Pushing custom metrics section in the kwollect documentation.
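If your experiment code is written in Python, the same line format can be produced without a shell script. The sketch below mirrors push_metrics.sh under that assumption; the function names are hypothetical and only the line format and file path come from the script above.

```python
import time


def format_custom_metric(metric_id, value):
    """Return one line in the same text format as push_metrics.sh."""
    return f'kwollect_custom{{_metric_id="{metric_id}"}} {value}\n'


def write_custom_metric(path, metric_id, value):
    """Overwrite `path` with the latest value, like `>` in the bash script."""
    with open(path, "w") as handle:
        handle.write(format_custom_metric(metric_id, value))


def push_values(path, metric_id, values, interval_s=15):
    """Write each value in turn, waiting `interval_s` seconds after each."""
    for value in values:
        write_custom_metric(path, metric_id, value)
        time.sleep(interval_s)
```

Calling push_values("/var/lib/prometheus/node-exporter/kwollect.prom", "my_e2clab_metric", range(1, 11)) on a node would reproduce the behaviour of the bash payload.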
Layers & Services Configuration
This example layers_services.yaml file defines our experiment’s deployment.
We make a reservation on the paradoxe cluster and define a new section called kwollect (whose schema you can find in Schema).
Within the kwollect section we define two things:
- The metrics we want to pull from kwollect, in the metrics option:
  - wattmetre_power_watt
  - bmc_node_power_watt
  - Our custom my_e2clab_metric, which we are going to push during the experiment
- The timeframe we want to pull the metrics from, defined by the step option:
  - We pull the metrics recorded during the time it took to run the launch part of the workflow
Learn more about those options in the documentation section dedicated to Monitoring Grid’5000 using Kwollect metrics.
Another important detail: the kwollect documentation mentions that not all metrics are recorded from the nodes by default.
To make sure that the bmc_node_power_watt metric and our custom my_e2clab_metric are collected by the kwollect monitoring stack, we have to activate them with the monitor option in the g5k section of the environment.
environment:
  job_name: e2clab-kwollect
  walltime: 01:00:00
  g5k:
    job_type: ["allow_classic_ssh"]
    cluster: paradoxe
    # Activating all metrics containing 'power' and 'metric'
    monitor: ".*metric*.|.*power*."
kwollect:
  metrics:
    # wattmetre_power_watt is activated by default
    - wattmetre_power_watt
    - bmc_node_power_watt
    - my_e2clab_metric
    # If you want to pull all metrics possible:
    # - all
  step: "launch"
layers:
- name: cloud
  services:
  - name: Server
    # You can specify which services to pull metrics from with the "k_monitor" role
    # If "kwollect" is defined but none are set, it will pull from all Grid'5000 nodes by default
    # roles: ["k_monitor"]
    cluster: "paradoxe"
- name: fog
  services:
  - name: Producer
    cluster: "paradoxe"
    # roles: ["k_monitor"]
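Before renaming a custom metric, it can help to check the new name against the monitor pattern locally. The sketch below does this with Python's re module. It assumes "contains"-style matching (re.search), as the comment in the configuration suggests; kwollect's exact matching rules may differ, and is_monitored is a hypothetical helper.

```python
import re

# Pattern copied from the `monitor` option of the configuration above.
MONITOR_PATTERN = r".*metric*.|.*power*."


def is_monitored(metric_name, pattern=MONITOR_PATTERN):
    """Return True if `pattern` matches somewhere in `metric_name`.

    Assumption: search semantics; kwollect may apply the pattern differently.
    """
    return re.search(pattern, metric_name) is not None


for name in ("my_e2clab_metric", "bmc_node_power_watt", "my_counter"):
    print(name, is_monitored(name))
```

Note that in this pattern `metric*.` reads as "metri", then any number of "c", then one more character, so it is looser than a literal "metric" substring test. A name such as my_counter matches neither alternative and would need a different pattern.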
Network Configuration
No network emulation needed:
networks:
Workflow Configuration
In this simple monitoring demonstration, we just run some stress tests on the hosts with the stress command.
We also run the push_metrics.sh script in the background on the fog host to demonstrate how to publish and collect custom metrics with kwollect.
In this case, push_metrics.sh just sets my_e2clab_metric to a new value every 15 seconds.
This functionality may be useful if you need to monitor some values in real time, or if you want to fetch those metrics at the same time as the rest of the data from the kwollect API.
The {{ env_time }} variable refers to the application configurations defined in the workflow_env.yaml file.
With the long application configuration, each stress command lasts 60s; with the short configuration, 30s.
To know more about how workflow.yaml and workflow_env.yaml work together, you can read the following documentation: workflow_env.yaml and application configuration.
- hosts: cloud.*
  launch:
    # Running a stress command
    - shell: "stress -c 8 --timeout {{ env_time }} && sleep 10 && stress -c 8 --timeout {{ env_time }}"

- hosts: fog.*
  prepare:
    - copy:
        src: "{{ working_dir }}/push_metrics.sh"
        dest: "/opt/push_metrics.sh"
        mode: preserve
  launch:
    # Pushing our custom metrics in the background
    - shell: "/opt/push_metrics.sh"
      async: 3600
      poll: 0
    # Running a stress command
    - shell: "stress -c 8 --timeout {{ env_time }} && sleep 10 && stress -c 8 --timeout {{ env_time }}"
Contents of workflow_env.yaml:
short:
  time: "30s"

long:
  time: "60s"
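These settings determine how much of the custom metric falls inside the launch window. The sketch below works through the arithmetic under simplifying assumptions: it ignores start-up overhead and kwollect's own sampling, and it takes the launch window to be the duration of the stress command line. The helper names are hypothetical.

```python
def parse_seconds(duration):
    """Convert a string such as "30s" into a number of seconds."""
    if not duration.endswith("s"):
        raise ValueError(f"expected a value like '30s', got {duration!r}")
    return int(duration[:-1])


def launch_duration(env_time, pause_s=10):
    """Approximate length of `stress ... && sleep 10 && stress ...`."""
    return 2 * parse_seconds(env_time) + pause_s


def values_written(window_s, interval_s=15, total=10):
    """Number of values push_metrics.sh has written after `window_s` seconds.

    The script writes at t = 0, 15, 30, ... and stops after `total` values.
    """
    return min(total, window_s // interval_s + 1)


for name, env_time in {"short": "30s", "long": "60s"}.items():
    window = launch_duration(env_time)
    print(name, window, values_written(window))
```

Under these assumptions the short configuration gives a window of about 70 seconds (roughly 5 of the 10 values) and the long configuration about 130 seconds (roughly 9 values), so neither run is expected to cover the full 150 seconds of push_metrics.sh.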
Running & Verifying Experiment Execution
Run the experiment
Use the command below to run this example:
$ e2clab deploy --app_conf short,long
The command runs the whole workflow for both configurations, long and short.
Check metrics in real-time
During the experiment’s execution, you can access the kwollect monitoring dashboard for the rennes site at https://api.grid5000.fr/stable/sites/rennes/metrics/dashboard and enter the Job ID corresponding to the deployment of the experiment on Grid’5000.
You can find that job ID in the logs while running the experiment, or in the e2clab.log file.
[E2C,KWOLLECT] Access kwollect metric dashboard for job <JOB ID>: https://api.grid5000.fr/stable/sites/rennes/metrics/dashboard
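If you prefer to retrieve the job ID programmatically, a small script can pull it out of the log text. This is a sketch that assumes the log line has the shape shown above and that the job ID is a plain integer; find_dashboard_info is a hypothetical helper and the actual log format may carry extra prefixes.

```python
import re

# Assumed shape of the log line, based on the example above.
DASHBOARD_LINE = re.compile(
    r"Access kwollect metric dashboard for job (?P<job_id>\d+): (?P<url>\S+)"
)


def find_dashboard_info(log_text):
    """Return (job_id, url) from the first matching log line, or None."""
    match = DASHBOARD_LINE.search(log_text)
    if match is None:
        return None
    return match.group("job_id"), match.group("url")
```

For example, find_dashboard_info(open("e2clab.log").read()) would return the job ID as a string together with the dashboard URL, or None if no such line is present.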

Example of visualization on the kwollect dashboard interface
Analyze experiment metrics
At the end of the deployment, the metrics that you requested will be fetched from the kwollect API and saved in a csv file in your result directory, which should look like:
YYYYMMDD-hhmmss/
├── long/
│ ├── monitoring-data/
│ │ └── kwollect-data/
│ └── workflow-validate.out
├── short/
│ └── ...
├── e2clab.err
├── e2clab.log
├── layers_services-validate.yaml
└── workflow-validate.out
You will get output data for both the long
and short
configurations of the experiment.
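To locate the pulled data for every configuration at once, you can walk the result directory. The sketch below relies only on the directory layout shown above; kwollect_files is a hypothetical helper and makes no assumption about the names or format of the files it finds.

```python
from pathlib import Path


def kwollect_files(result_dir):
    """Map each configuration name to the files in its kwollect-data dir."""
    found = {}
    pattern = "*/monitoring-data/kwollect-data"
    for data_dir in sorted(Path(result_dir).glob(pattern)):
        # <result_dir>/<config>/monitoring-data/kwollect-data
        config_name = data_dir.parent.parent.name
        found[config_name] = sorted(
            path for path in data_dir.iterdir() if path.is_file()
        )
    return found
```

Calling kwollect_files("YYYYMMDD-hhmmss") with your actual result directory should return one entry for long and one for short.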
Power consumption
We provide a simple python script to visualize the data that was pulled from the kwollect API.
Note
It is best to run the following commands inside a python virtual environment.
(venv)$ pip install -r requirements.txt
(venv)$ python analysis.py YYYYMMDD-hhmmss/long/monitoring-data/kwollect-data/<site>.yaml

Example of power consumption monitoring on Grid’5000 nodes
We can clearly see the rise in power consumption caused by the stress
command.
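Beyond plotting, a common next step is to turn power samples into an energy figure. The sketch below is not part of the provided analysis.py; it shows one way to do it with the trapezoidal rule, assuming you have already extracted (timestamp in seconds, power in watts) pairs for a single node and metric. The sample values are made up.

```python
def energy_joules(samples):
    """Integrate power over time with the trapezoidal rule.

    `samples` is a list of (timestamp_in_seconds, power_in_watts) pairs.
    """
    ordered = sorted(samples)
    total = 0.0
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        total += (p0 + p1) / 2 * (t1 - t0)
    return total


# Made-up samples: 100 W at idle, then 200 W under load.
samples = [(0, 100.0), (10, 100.0), (20, 200.0), (30, 200.0)]
print(energy_joules(samples))
```

Comparing this figure between the short and long runs gives a rough idea of the energy cost of the extra stress time, subject to the measurement caveats of each power metric.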
Note
To know more about the caveats of power monitoring on Grid’5000, check the following link: https://www.grid5000.fr/w/Unmaintained:Power_Monitoring_Devices#measurement_artifacts_and_pitfalls
Custom metric
We can also check that our custom metric my_e2clab_metric has been captured by kwollect, first by looking at the online dashboard and entering my_e2clab_metric in the Metric name selection:

We can also check that it was indeed pulled from the API and saved in our experiment results:
$ cat YYYYMMDD-hhmmss/long/monitoring-data/kwollect-data/<site>.yaml | grep "my_e2clab_metric"
Note
To know more about custom metrics and the timestamp at which they are recorded, check the Pushing custom metrics section in the kwollect documentation.
Free the computing resources
Once you are done, you may kill your job on the Grid’5000 platform using the following command:
$ e2clab destroy