Monitoring

E2Clab offers three main ways to monitor computing resources during experiment execution:

  • Dstat

  • TIG stack: Telegraf/InfluxDB/Grafana

  • TPG stack: Telegraf/Prometheus/Grafana

To enable monitoring in E2Clab, users have to configure the layers_services.yaml file as follows:

  • Define the monitoring by adding the monitoring attribute (more details in the next sections).

  • Add roles: [monitoring] on each Service the user wants to monitor.
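
As a minimal sketch of these two steps, the fragment below combines a monitoring attribute with one Service tagged for monitoring. Only the monitoring attribute and roles: [monitoring] come from the steps above. The layers/services layout, the layer and Service names (cloud, Server), and the quantity value are illustrative assumptions, so check them against your own layers_services.yaml.

```yaml
# layers_services.yaml (illustrative sketch, not a complete configuration)
environment:
  g5k:
    cluster: paravance
# Step 1: define the monitoring (here, the simplest type).
monitoring:
  type: dstat
# Step 2: tag each Service to be monitored.
# The `layers`/`services` structure and the names below are assumptions.
layers:
- name: cloud            # hypothetical layer name
  services:
  - name: Server         # hypothetical Service name
    quantity: 1
    roles: [monitoring]  # this Service will be monitored
```

Services without roles: [monitoring] are left out of the monitoring setup.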

In addition, you can monitor energy consumption:

  • Monitoring profile in FIT IoT LAB

To enable monitoring of FIT IoT LAB nodes in E2Clab, users have to configure the layers_services.yaml file as follows:

  • Define the monitoring profile by adding the monitoring_iotlab attribute (for details, see the section "Set up a monitoring profile in FIT IoT LAB").

1. Set up Dstat

G5K, FIT IoT LAB, or Chameleon Cloud

Setting up Dstat only requires declaring the monitoring type (see example below).

monitoring:
  type: dstat

2. Set up TIG stack: Telegraf/InfluxDB/Grafana

The TIG stack requires a monitoring provider: a dedicated machine hosting InfluxDB and Grafana. To visualize the monitoring data in Grafana, follow the instructions in the layers_services-validate.yaml file, located in the experiment directory.

Once deployed, the monitoring provider is available on port 3000 of its host, for example http://paravance-10.rennes.grid5000.fr:3000. To access it from your local machine, open an SSH tunnel with ssh -NL 3000:localhost:3000 paravance-10.rennes.grid5000.fr and then browse to http://localhost:3000. Log in with admin as both the username and the password.

G5K

monitoring:
  type: tig
  provider: g5k
  # you can use `cluster` or `servers` to deploy the monitoring provider
  cluster: paravance
  servers: ["paravance-10.rennes.grid5000.fr"]
  # if `private`, a new network is created for the monitoring traffic.
  # if `private`, it requires at least 2 NICs in the machine.
  network: shared  # or private
  # whether the monitoring provider uses an IPv4 or IPv6 network
  ipv: 4  # or 6
  # you can provide a config file (must be in `artifacts_dir`) for the telegraf agents.
  agent_conf: telegraf.conf.j2
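
The comments above describe alternatives rather than a single configuration. As one possible combination, the sketch below pins the provider to a specific server and isolates the monitoring traffic on a private IPv4 network. It uses only keys shown above; the chosen values are illustrative assumptions, not a recommended setup.

```yaml
monitoring:
  type: tig
  provider: g5k
  # pin the monitoring provider to one server instead of picking from a cluster
  servers: ["paravance-10.rennes.grid5000.fr"]
  # dedicated network for the monitoring traffic (needs at least 2 NICs)
  network: private
  ipv: 4
```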

Chameleon Cloud

monitoring:
  type: tig
  provider: chameleoncloud
  cluster: compute_cascadelake

G5K + FIT IoT LAB

Combining G5K with FIT IoT LAB requires a firewall rule. Resource URLs of the Grid’5000 reconfigurable firewall API have the form https://api.grid5000.fr/stable/sites/<site>/firewall/<jobid>, where <site> is the Grid’5000 site and <jobid> is the OAR job number for which openings are requested. For instance: https://api.grid5000.fr/stable/sites/rennes/firewall/1961803.

In the example below, we open a firewall rule for the monitoring_service (the monitoring provider) on port 8086 (InfluxDB). It allows the telegraf agents on FIT IoT LAB nodes to send their data to the monitoring service on G5K.

environment:
  g5k:
    cluster: paravance
    job_type: ["allow_classic_ssh"]
    firewall_rules:
      - services: ["monitoring_service"]
        ports: [8086]
  iotlab:
    cluster: grenoble
monitoring:
  type: tig
  provider: g5k
  cluster: paravance
  network: shared
  ipv: 6

3. Set up TPG stack: Telegraf/Prometheus/Grafana

G5K

monitoring:
  type: tpg
  provider: g5k
  # you can use `cluster` or `servers` to deploy the monitoring provider
  cluster: paravance
  servers: ["paravance-10.rennes.grid5000.fr"]
  # if `private`, a new network is created for the monitoring traffic.
  # if `private`, it requires at least 2 NICs in the machine.
  network: shared  # or private
  # whether the monitoring provider uses an IPv4 or IPv6 network
  ipv: 4  # or 6

Chameleon Cloud

monitoring:
  type: tpg
  provider: chameleoncloud
  cluster: compute_cascadelake

G5K + FIT IoT LAB

Prometheus uses a pull model to scrape metrics from the telegraf agents, so no firewall rule is needed here: IPv6 connections from Grid’5000 to IoT-LAB are allowed. The reverse direction is not allowed unless you open a firewall port, as presented earlier for the TIG stack.

monitoring:
  type: tpg
  provider: g5k
  cluster: paravance
  network: shared
  ipv: 6

4. Set up a monitoring profile in FIT IoT LAB (energy consumption)

Next, we show how to set up a monitoring profile to monitor the current, voltage, and power of FIT IoT LAB nodes (in this case, a8 and rpi3 nodes). You can manage monitoring profiles in the FIT IoT LAB dashboard at https://www.iot-lab.info/testbed/resources/monitoring.

monitoring_iotlab:
  profiles:
    - name: test_capture_a8
      archi: a8               # ['a8', 'custom']
      current: True           # [True, False]
      power: True             # [True, False]
      voltage: True           # [True, False]
      period: 8244            # [140, 204, 332, 588, 1100, 2116, 4156, 8244]
      average: 4              # [1, 4, 16, 64, 128, 256, 512, 1024]
    - name: test_capture_rpi
      archi: custom
      current: True
      power: True
      voltage: True
      period: 8244
      average: 4

5. Starting monitoring and saving captured data

For Dstat, TIG, and TPG, the monitoring is started during the launch step of the workflow.yaml file or when the user executes the following command:

$ e2clab workflow /path/to/scenario/ launch

Because monitoring starts at this step, data is captured from before each Service starts.

Energy monitoring in FIT IoT LAB starts after reservation.

All monitoring data is saved to the /path/to/scenario_dir/monitoring-data/ directory during the finalize step, which is run with the following command:

$ e2clab finalize /path/to/scenario_dir/

Besides saving the monitoring data, the finalize step also stops the monitoring services and agents started on each machine.

Try some examples

We provide a few tutorials: