Monitoring 

There are three main ways to monitor computing resources during experiment execution:

Dstat
TIG stack: Telegraf/InfluxDB/Grafana
TPG stack: Telegraf/Prometheus/Grafana

To enable monitoring in E2Clab, users have to configure the layers_services.yaml file as follows:

Define the monitoring by adding the monitoring attribute (details in the following sections).
Add roles: [monitoring] to each Service you want to monitor.
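The two steps above can be checked before deploying. The following Python sketch assumes the layers_services.yaml file has already been parsed into a dictionary (for example with a YAML library) and that Services are listed under `layers`, then `services`, each with optional `roles`. This key layout and the helper name `monitored_services` are illustrative assumptions, not part of the E2Clab API.

```python
from typing import Any


def monitored_services(config: dict[str, Any]) -> list[str]:
    """Return the names of Services whose roles include `monitoring`.

    Raises ValueError if a Service asks to be monitored but the
    top-level `monitoring` attribute is missing.
    """
    names = [
        service["name"]
        for layer in config.get("layers", [])
        for service in layer.get("services", [])
        if "monitoring" in service.get("roles", [])
    ]
    if names and "monitoring" not in config:
        raise ValueError("Services use roles: [monitoring] but no `monitoring` attribute is defined")
    return names


# Hypothetical parsed configuration (key layout assumed for illustration).
example = {
    "monitoring": {"type": "dstat"},
    "layers": [
        {"name": "cloud", "services": [{"name": "Server", "roles": ["monitoring"]}]},
        {"name": "edge", "services": [{"name": "Producer"}]},
    ],
}

print(monitored_services(example))  # ['Server']
```

Only the Server Service is reported, because Producer has no `monitoring` role.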

In addition, you can monitor the energy consumption of FIT IoT LAB nodes.

Monitoring profile in FIT IoT LAB

To enable energy monitoring of FIT IoT LAB nodes in E2Clab, configure the layers_services.yaml file as follows:

Define the monitoring profile by adding the monitoring_iotlab attribute (for details, see Section 4, Set up a monitoring profile in FIT IoT LAB).

1. Set up Dstat 

G5K, FIT IoT LAB, or Chameleon Cloud

Setting up Dstat is simple (see the example below).

monitoring:
  type: dstat

2. Set up TIG stack: Telegraf/InfluxDB/Grafana 

The TIG stack requires a monitoring provider: a dedicated machine that hosts InfluxDB and Grafana. To visualize the monitoring data in Grafana, follow the instructions in the layers_services-validate.yaml file (located in the experiment directory).

Once deployed, Grafana is served by the monitoring provider on port 3000, for example at http://paravance-10.rennes.grid5000.fr:3000. To access it from your local machine, forward the port over SSH:

$ ssh -NL 3000:localhost:3000 paravance-10.rennes.grid5000.fr

Then open http://localhost:3000 and log in with admin as both the username and the password.
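To script the SSH tunnel and check that Grafana answers, a minimal Python sketch could look like this. `tunnel_command` and `grafana_health` are hypothetical helpers; the health check assumes Grafana's `/api/health` HTTP endpoint and a tunnel already listening on local port 3000.

```python
import json
import urllib.request


def tunnel_command(host: str, local_port: int = 3000, remote_port: int = 3000) -> list[str]:
    """Build the SSH port-forwarding command used to reach Grafana.

    -N: do not execute a remote command; -L: forward local_port on this
    machine to localhost:remote_port as seen from `host`.
    """
    return ["ssh", "-NL", f"{local_port}:localhost:{remote_port}", host]


def grafana_health(base_url: str = "http://localhost:3000") -> dict:
    """Query Grafana's /api/health endpoint (assumes the tunnel is up)."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=5) as response:
        return json.load(response)


# Example host from the text above; replace it with your monitoring provider.
print(" ".join(tunnel_command("paravance-10.rennes.grid5000.fr")))
# ssh -NL 3000:localhost:3000 paravance-10.rennes.grid5000.fr
```

For example, `subprocess.Popen(tunnel_command(host))` starts the tunnel in the background; once it is up, `grafana_health()` can be polled, and calling `terminate()` on the returned process closes the tunnel.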

G5K

monitoring:
  type: tig
  provider: g5k
  # deploy the monitoring provider on a `cluster` or on specific `servers`
  cluster: paravance
  # or, instead of `cluster`:
  # servers: ["paravance-10.rennes.grid5000.fr"]
  # `shared` or `private`; `private` creates a new network for the monitoring
  # traffic and requires at least 2 NICs on the machine
  network: shared
  # whether the monitoring provider uses an IPv4 (4) or IPv6 (6) network
  ipv: 4
  # optionally, a config file (must be in `artifacts_dir`) for the Telegraf agents
  agent_conf: telegraf.conf.j2
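The comments in the example above describe the accepted values for each key. The Python sketch below checks a parsed `monitoring` block against those values. The sets of allowed values are taken from this page only and `check_monitoring` is a hypothetical helper, so E2Clab's own schema validation remains authoritative.

```python
from typing import Any

# Allowed values as documented on this page (assumption: E2Clab's own
# schema is authoritative and may differ).
TYPES = {"dstat", "tig", "tpg"}
PROVIDERS = {"g5k", "chameleoncloud"}
NETWORKS = {"shared", "private"}
IP_VERSIONS = {4, 6}


def check_monitoring(block: dict[str, Any]) -> list[str]:
    """Return the problems found in a parsed `monitoring` block (empty if none)."""
    problems = []
    kind = block.get("type")
    if kind not in TYPES:
        problems.append(f"unknown type: {kind!r}")
    if kind in ("tig", "tpg"):  # these stacks need a monitoring provider
        if block.get("provider") not in PROVIDERS:
            problems.append(f"unknown provider: {block.get('provider')!r}")
        if "cluster" not in block and "servers" not in block:
            problems.append("set `cluster` or `servers` for the monitoring provider")
        if "network" in block and block["network"] not in NETWORKS:
            problems.append(f"network must be shared or private, got {block['network']!r}")
        if "ipv" in block and block["ipv"] not in IP_VERSIONS:
            problems.append(f"ipv must be 4 or 6, got {block['ipv']!r}")
    return problems


tig_g5k = {"type": "tig", "provider": "g5k", "cluster": "paravance",
           "network": "shared", "ipv": 4, "agent_conf": "telegraf.conf.j2"}
print(check_monitoring(tig_g5k))  # []
# A literal "shared or private" value is reported as a problem.
print(check_monitoring({**tig_g5k, "network": "shared or private"}))
```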

Chameleon Cloud

monitoring:
  type: tig
  provider: chameleoncloud
  cluster: compute_cascadelake

G5K + FIT IoT LAB

For G5K + FIT IoT LAB, a firewall rule is needed. Grid’5000 reconfigurable firewall API resource URLs have the form https://api.grid5000.fr/stable/sites/<site>/firewall/<jobid>, where <site> is the Grid’5000 site and <jobid> is the OAR job number for which the openings are requested. For instance: https://api.grid5000.fr/stable/sites/rennes/firewall/1961803.
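Building such a resource URL is plain string formatting. A minimal Python sketch, where `firewall_url` is a hypothetical helper that only reproduces the URL form given above (it does not call the API):

```python
def firewall_url(site: str, job_id: int) -> str:
    """Build a Grid'5000 reconfigurable firewall API resource URL.

    site: the Grid'5000 site (e.g. "rennes"); job_id: the OAR job number.
    """
    if not site or job_id <= 0:
        raise ValueError("site must be non-empty and job_id positive")
    return f"https://api.grid5000.fr/stable/sites/{site}/firewall/{job_id}"


print(firewall_url("rennes", 1961803))
# https://api.grid5000.fr/stable/sites/rennes/firewall/1961803
```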

In the example below, we add a firewall rule that opens port 8086 (InfluxDB) on the monitoring_service (the monitoring provider). This allows the Telegraf agents on FIT IoT LAB nodes to send their data to the monitoring service on G5K.

environment:
  g5k:
    cluster: paravance
    job_type: ["allow_classic_ssh"]
    firewall_rules:
      - services: ["monitoring_service"]
        ports: [8086]
  iotlab:
    cluster: grenoble
monitoring:
  type: tig
  provider: g5k
  cluster: paravance
  network: shared
  ipv: 6

3. Set up TPG stack: Telegraf/Prometheus/Grafana 

G5K

monitoring:
  type: tpg
  provider: g5k
  # deploy the monitoring provider on a `cluster` or on specific `servers`
  cluster: paravance
  # or, instead of `cluster`:
  # servers: ["paravance-10.rennes.grid5000.fr"]
  # `shared` or `private`; `private` creates a new network for the monitoring
  # traffic and requires at least 2 NICs on the machine
  network: shared
  # whether the monitoring provider uses an IPv4 (4) or IPv6 (6) network
  ipv: 4

Chameleon Cloud

monitoring:
  type: tpg
  provider: chameleoncloud
  cluster: compute_cascadelake

G5K + FIT IoT LAB

Prometheus uses a pull model: it scrapes metrics from the Telegraf agents, so connections are opened from Grid’5000 towards FIT IoT LAB. In this case, no firewall rule is needed, because IPv6 connections from Grid’5000 to FIT IoT LAB are allowed (the reverse direction is not, unless you open a firewall port as shown in the TIG example above).

monitoring:
  type: tpg
  provider: g5k
  cluster: paravance
  network: shared
  ipv: 6
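The difference between the two G5K + FIT IoT LAB setups comes down to which side opens the connection. This Python sketch encodes the directions described above (TIG: agents push to InfluxDB on port 8086; TPG: Prometheus pulls from the agents). `Flow`, `FLOWS`, and `needs_firewall_rule` are illustrative names, and the rule only covers the Grid’5000 / FIT IoT LAB case described on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    """Which testbed opens the monitoring connection, and to which port."""
    initiator: str
    target: str
    port: Optional[int]  # destination port, when the text gives one


# Directions described on this page for a provider hosted on G5K.
FLOWS = {
    # TIG: Telegraf agents on FIT IoT LAB push their data to InfluxDB on G5K.
    "tig": Flow(initiator="iotlab", target="g5k", port=8086),
    # TPG: Prometheus on G5K pulls (scrapes) metrics from the Telegraf agents.
    "tpg": Flow(initiator="g5k", target="iotlab", port=None),
}


def needs_firewall_rule(stack: str) -> bool:
    """G5K -> FIT IoT LAB over IPv6 is allowed; FIT IoT LAB -> G5K needs an opening."""
    return FLOWS[stack].target == "g5k"


for stack, flow in FLOWS.items():
    print(stack, flow.initiator, "->", flow.target,
          "firewall rule needed:", needs_firewall_rule(stack))
```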

4. Set up a monitoring profile in FIT IoT LAB (energy consumption)

Next, we show how to set up monitoring profiles that capture the current, voltage, and power of FIT IoT LAB nodes (in this case, a8 and rpi3 nodes). You can also manage monitoring profiles in the FIT IoT LAB dashboard: https://www.iot-lab.info/testbed/resources/monitoring.

monitoring_iotlab:
  profiles:
    - name: test_capture_a8
      archi: a8               # ['a8', 'custom']
      current: True           # [True, False]
      power: True             # [True, False]
      voltage: True           # [True, False]
      period: 8244            # [140, 204, 332, 588, 1100, 2116, 4156, 8244]
      average: 4              # [1, 4, 16, 64, 128, 256, 512, 1024]
    - name: test_capture_rpi
      archi: custom
      current: True
      power: True
      voltage: True
      period: 8244
      average: 4
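Before deploying, you could check each profile against the allowed values listed in the comments of the example above. A minimal Python sketch, assuming each profile has been parsed into a dictionary; `check_profile` is a hypothetical helper, and FIT IoT LAB remains the authority on valid settings.

```python
from typing import Any

# Allowed values copied from the comments in the example above.
ARCHIS = {"a8", "custom"}
PERIODS = {140, 204, 332, 588, 1100, 2116, 4156, 8244}
AVERAGES = {1, 4, 16, 64, 128, 256, 512, 1024}
MEASURES = ("current", "power", "voltage")


def check_profile(profile: dict[str, Any]) -> list[str]:
    """Return the problems found in one monitoring_iotlab profile (empty if none)."""
    problems = []
    if not profile.get("name"):
        problems.append("missing name")
    if profile.get("archi") not in ARCHIS:
        problems.append(f"archi must be one of {sorted(ARCHIS)}")
    for measure in MEASURES:
        # Only type-check measures that are present.
        if measure in profile and not isinstance(profile[measure], bool):
            problems.append(f"{measure} must be True or False")
    if profile.get("period") not in PERIODS:
        problems.append(f"period must be one of {sorted(PERIODS)}")
    if profile.get("average") not in AVERAGES:
        problems.append(f"average must be one of {sorted(AVERAGES)}")
    return problems


profiles = [
    {"name": "test_capture_a8", "archi": "a8", "current": True, "power": True,
     "voltage": True, "period": 8244, "average": 4},
    {"name": "test_capture_rpi", "archi": "custom", "current": True, "power": True,
     "voltage": True, "period": 8244, "average": 4},
]
for p in profiles:
    print(p["name"], check_profile(p))  # both print an empty list
```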

5. Starting monitoring and saving captured data 

For Dstat, TIG, and TPG, monitoring is started during the launch step of the workflow.yaml file, for instance when the user executes the following command:

$ e2clab workflow /path/to/scenario_dir/ launch

This allows monitoring data to be captured before each Service starts.

Energy monitoring in FIT IoT LAB starts once the nodes are reserved.

All monitoring data is saved in the /path/to/scenario_dir/monitoring-data/ directory during the finalize step, which runs with the following command:

$ e2clab finalize /path/to/scenario_dir/

In addition to saving the monitoring data, this step stops the monitoring services and agents started on each machine.
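The launch and finalize steps can also be scripted. The Python sketch below wraps the two commands quoted above with `subprocess` and lists the files saved under monitoring-data/. The helper names are hypothetical, and since the layout inside monitoring-data/ is not described here, the listing simply walks the directory recursively.

```python
import subprocess
from pathlib import Path


def e2clab_commands(scenario_dir: str) -> dict[str, list[str]]:
    """The two commands quoted on this page, as argument lists."""
    return {
        "launch": ["e2clab", "workflow", scenario_dir, "launch"],
        "finalize": ["e2clab", "finalize", scenario_dir],
    }


def run_step(step: str, scenario_dir: str) -> None:
    """Run `launch` or `finalize`; raises CalledProcessError on a non-zero exit."""
    subprocess.run(e2clab_commands(scenario_dir)[step], check=True)


def saved_monitoring_files(scenario_dir: str) -> list[str]:
    """List files under <scenario_dir>/monitoring-data/, relative to that directory."""
    root = Path(scenario_dir) / "monitoring-data"
    if not root.is_dir():
        return []  # nothing saved yet (finalize not run, or wrong path)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
```

For example, call `run_step("launch", "/path/to/scenario_dir/")`, run the experiment, then call `run_step("finalize", "/path/to/scenario_dir/")` and inspect `saved_monitoring_files("/path/to/scenario_dir/")`.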

Try some examples 

We provide a few tutorials:
