Monitoring
There are three main ways to monitor computing resources during experiment execution:
Dstat
TIG stack: Telegraf/InfluxDB/Grafana
TPG stack: Telegraf/Prometheus/Grafana
To enable monitoring in E2Clab, users have to configure the layers_services.yaml file as follows:
Define the monitoring by adding the monitoring attribute (more details in the next sections).
Add roles: [monitoring] on each Service the user wants to monitor.
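As a rough illustration, the two steps above might combine into a layers_services.yaml fragment like the following. This is a sketch rather than a complete configuration: the layer name (cloud), the Service name (Server), the quantity, and the per-Service environment key are assumptions made for the example, and the environment section is reduced to a single G5K cluster.

```yaml
environment:
  g5k:
    cluster: paravance
    job_type: ["allow_classic_ssh"]
# Step 1: define the monitoring (Dstat is used here as the simplest option)
monitoring:
  type: dstat
layers:
  - name: cloud            # hypothetical layer name
    services:
      - name: Server       # hypothetical Service name
        environment: g5k   # assumed per-Service key selecting the testbed
        quantity: 1
        # Step 2: mark this Service as one to be monitored
        roles: [monitoring]
```

In this sketch, a Service declared without roles: [monitoring] would not be monitored.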
In addition, you can monitor energy consumption:
Monitoring profile in FIT IoT LAB
To enable monitoring of FIT IoT LAB nodes in E2Clab, users have to configure the layers_services.yaml file as follows:
Define the monitoring profile by adding the monitoring_iotlab attribute (for more details, refer to the section Set up a monitoring profile in FIT IoT LAB).
1. Set up Dstat
G5K, FIT IoT LAB, or Chameleon Cloud
Setting up Dstat is very simple (see the example below).
monitoring:
  type: dstat
2. Set up TIG stack: Telegraf/InfluxDB/Grafana
The TIG stack requires a monitoring provider: a dedicated machine hosting InfluxDB and Grafana. To visualize the monitoring data in Grafana, follow the instructions in the layers_services-validate.yaml file (located in the experiment directory).
Once deployed, the monitoring provider is available at, for example, http://paravance-10.rennes.grid5000.fr:3000. You can access it from your local machine through an SSH tunnel, as follows: ssh -NL 3000:localhost:3000 paravance-10.rennes.grid5000.fr. Use admin as both the username and the password.
G5K
monitoring:
  type: tig
  provider: g5k
  # you can use `cluster` or `servers` to deploy the monitoring provider
  cluster: paravance
  servers: ["paravance-10.rennes.grid5000.fr"]
  # if `private`, a new network is created for the monitoring traffic.
  # if `private`, it requires at least 2 NICs in the machine.
  network: shared or private
  # whether the monitoring provider uses an IPv4 or IPv6 network
  ipv: 4 or 6
  # you can provide a config file (must be in `artifacts_dir`) for the telegraf agents.
  agent_conf: telegraf.conf.j2
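The template above lists alternatives (shared or private, 4 or 6) rather than literal values. A filled-in variant might look like the sketch below. It assumes that servers alone is enough to place the provider, as the comment in the template suggests, and that the chosen machine has at least two NICs for the private network.

```yaml
monitoring:
  type: tig
  provider: g5k
  # place the provider on a specific machine (assumed alternative to `cluster`)
  servers: ["paravance-10.rennes.grid5000.fr"]
  # dedicated network for the monitoring traffic; needs at least 2 NICs
  network: private
  ipv: 4
```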
Chameleon Cloud
monitoring:
  type: tig
  provider: chameleoncloud
  cluster: compute_cascadelake
G5K + FIT IoT LAB
For G5K + FIT IoT LAB, a firewall rule is needed. The reconfigurable firewall API resource URLs are of the form https://api.grid5000.fr/stable/sites/<site>/firewall/<jobid>, where <site> and <jobid> are the Grid’5000 site and the OAR job number for which openings are requested. For instance: https://api.grid5000.fr/stable/sites/rennes/firewall/1961803.
In the example below, we open a firewall rule for the monitoring_service (the monitoring provider) on port 8086 (InfluxDB). It allows the telegraf agents on FIT IoT LAB nodes to send their data to the monitoring service on G5K.
environment:
  g5k:
    cluster: paravance
    job_type: ["allow_classic_ssh"]
    firewall_rules:
      - services: ["monitoring_service"]
        ports: [8086]
  iotlab:
    cluster: grenoble
monitoring:
  type: tig
  provider: g5k
  cluster: paravance
  network: shared
  ipv: 6
3. Set up TPG stack: Telegraf/Prometheus/Grafana
G5K
monitoring:
  type: tpg
  provider: g5k
  # you can use `cluster` or `servers` to deploy the monitoring provider
  cluster: paravance
  servers: ["paravance-10.rennes.grid5000.fr"]
  # if `private`, a new network is created for the monitoring traffic.
  # if `private`, it requires at least 2 NICs in the machine.
  network: shared or private
  # whether the monitoring provider uses an IPv4 or IPv6 network
  ipv: 4 or 6
Chameleon Cloud
monitoring:
  type: tpg
  provider: chameleoncloud
  cluster: compute_cascadelake
G5K + FIT IoT LAB
Prometheus uses a pull model to scrape metrics from the telegraf agents, so no firewall rule is needed in this case: IPv6 connections from Grid’5000 to IoT-LAB are allowed (the reverse is not true unless you open the firewall port, as presented earlier).
monitoring:
  type: tpg
  provider: g5k
  cluster: paravance
  network: shared
  ipv: 6
4. Set up a monitoring profile in FIT IoT LAB (energy consumption)
Next, we show how to set up a monitoring profile to monitor the current, voltage, and power of FIT IoT LAB nodes (in this case, a8 and rpi3 nodes). You can manage the monitoring profiles in the dashboard through this link: https://www.iot-lab.info/testbed/resources/monitoring.
monitoring_iotlab:
  profiles:
    - name: test_capture_a8
      archi: a8 # ['a8', 'custom']
      current: True # [True, False]
      power: True # [True, False]
      voltage: True # [True, False]
      period: 8244 # [140, 204, 332, 588, 1100, 2116, 4156, 8244]
      average: 4 # [1, 4, 16, 64, 128, 256, 512, 1024]
    - name: test_capture_rpi
      archi: custom
      current: True
      power: True
      voltage: True
      period: 8244
      average: 4
5. Starting monitoring and saving captured data
For Dstat, TIG, and TPG, monitoring starts during the launch step of the workflow.yaml file, or when the user executes the following command:
$ e2clab workflow /path/to/scenario/ launch
This allows monitoring data to be captured before each Service starts.
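For orientation, a launch step in workflow.yaml might look like the sketch below. The host pattern, the task, and the script path are hypothetical and only illustrate where the launch step sits; check the exact task syntax against the workflow documentation.

```yaml
# hypothetical workflow.yaml entry for a Service named Server in a layer named cloud
- hosts: cloud.server.*
  launch:
    # hypothetical task that starts the Service; monitoring is started
    # during this step, before the Service itself
    - shell: python3 /opt/server.py
```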
Energy monitoring in FIT IoT LAB starts after reservation.
All monitoring data is saved in the /path/to/scenario_dir/monitoring-data/ directory. It is saved in the finalize step with the following command:
$ e2clab finalize /path/to/scenario_dir/
Besides saving the monitoring data, this command also stops the monitoring services and agents started on each machine.
Try some examples
We provide a few tutorials: