**********
Monitoring
**********

.. contents::
   :depth: 2

There are three main ways to monitor computing resources during experiment execution:

- Dstat
- TIG stack: Telegraf/InfluxDB/Grafana
- TPG stack: Telegraf/Prometheus/Grafana

To enable monitoring in E2Clab, users have to configure the ``layers_services.yaml`` file as follows:

- Define the monitoring by adding the ``monitoring`` attribute (more details in the next sections).
- Add ``roles: [monitoring]`` on each ``Service`` the user wants to monitor.

In addition, you can monitor energy consumption:

- Monitoring profile in FIT IoT LAB

To enable monitoring of FIT IoT LAB nodes in E2Clab, users have to configure the ``layers_services.yaml`` file as follows:

- Define the monitoring profile by adding the ``monitoring_iotlab`` attribute (for more details refer to Section **Set up a monitoring profile in FIT IoT LAB**).

1. Set up Dstat
===============

G5K, FIT IoT LAB, or Chameleon Cloud
-------------------------------------

Setting up Dstat is simple (see the example below).

.. code-block:: yaml
   :linenos:

   monitoring:
     type: dstat

2. Set up TIG stack: Telegraf/InfluxDB/Grafana
==============================================

The TIG stack requires a ``monitoring provider``: a dedicated machine hosting ``InfluxDB`` and ``Grafana``.
To visualize the monitoring data in Grafana, follow the instructions in the ``layers_services-validate.yaml`` file (located in the experiment directory).
Once deployed, the ``monitoring provider`` is available at an address such as ``http://paravance-10.rennes.grid5000.fr:3000``.
You can access it from your local machine as follows: ``ssh -NL 3000:localhost:3000 paravance-10.rennes.grid5000.fr``.
You can use ``admin`` as both the username and the password.

G5K
---

.. code-block:: yaml
   :linenos:

   monitoring:
     type: tig
     provider: g5k
     # you can use `cluster` or `servers` to deploy the monitoring provider
     cluster: paravance
     servers: ["paravance-10.rennes.grid5000.fr"]
     # if `private`, a new network is created for the monitoring traffic.
     # if `private`, it requires at least 2 NICs in the machine.
     network: shared  # `shared` or `private`
     # whether the monitoring provider uses an IPv4 or IPv6 network
     ipv: 4  # 4 or 6
     # you can provide a config file (must be in `artifacts_dir`) for the telegraf agents.
     agent_conf: telegraf.conf.j2

Chameleon Cloud
---------------

.. code-block:: yaml
   :linenos:

   monitoring:
     type: tig
     provider: chameleoncloud
     cluster: compute_cascadelake

G5K + FIT IoT LAB
-----------------

For G5K + FIT IoT LAB, a firewall rule is needed. The reconfigurable Firewall API resource URLs are of the form
``https://api.grid5000.fr/stable/sites/<site>/firewall/<jobid>``, where ``<site>`` and ``<jobid>`` are the Grid'5000 site
and the OAR job number for which one requests openings. For instance:
``https://api.grid5000.fr/stable/sites/rennes/firewall/1961803``.

In the example below, we open a firewall rule for the ``monitoring_service`` (the monitoring provider) on
port ``8086`` (InfluxDB). It allows the telegraf agents on FIT IoT LAB nodes to send their data to the
monitoring service on G5K.

.. code-block:: yaml
   :linenos:

   environment:
     g5k:
       cluster: paravance
       job_type: ["allow_classic_ssh"]
       firewall_rules:
         - services: ["monitoring_service"]
           ports: [8086]
     iotlab:
       cluster: grenoble
   monitoring:
     type: tig
     provider: g5k
     cluster: paravance
     network: shared
     ipv: 6

3. Set up TPG stack: Telegraf/Prometheus/Grafana
================================================

G5K
---

.. code-block:: yaml
   :linenos:

   monitoring:
     type: tpg
     provider: g5k
     # you can use `cluster` or `servers` to deploy the monitoring provider
     cluster: paravance
     servers: ["paravance-10.rennes.grid5000.fr"]
     # if `private`, a new network is created for the monitoring traffic.
     # if `private`, it requires at least 2 NICs in the machine.
     network: shared  # `shared` or `private`
     # whether the monitoring provider uses an IPv4 or IPv6 network
     ipv: 4  # 4 or 6

Chameleon Cloud
---------------

.. code-block:: yaml
   :linenos:

   monitoring:
     type: tpg
     provider: chameleoncloud
     cluster: compute_cascadelake

G5K + FIT IoT LAB
-----------------

Prometheus uses a pull model to scrape metrics from the telegraf agents, so no firewall rule is needed in this case.
IPv6 connections from Grid'5000 to IoT-LAB are allowed (the reverse is not true unless you open the firewall
port, as presented earlier).

.. code-block:: yaml
   :linenos:

   monitoring:
     type: tpg
     provider: g5k
     cluster: paravance
     network: shared
     ipv: 6

4. Set up a monitoring profile in FIT IoT LAB (energy consumption)
==================================================================

Next, we show how to set up a monitoring profile to monitor ``current``, ``voltage``, and ``power`` of FIT IoT LAB nodes
(in this case, ``a8`` and ``rpi3`` nodes). You can manage the monitoring profiles in the dashboard through this link:
``https://www.iot-lab.info/testbed/resources/monitoring``.

.. code-block:: yaml
   :linenos:

   monitoring_iotlab:
     profiles:
       - name: test_capture_a8
         archi: a8  # ['a8', 'custom']
         current: True  # [True, False]
         power: True  # [True, False]
         voltage: True  # [True, False]
         period: 8244  # [140, 204, 332, 588, 1100, 2116, 4156, 8244]
         average: 4  # [1, 4, 16, 64, 128, 256, 512, 1024]
       - name: test_capture_rpi
         archi: custom
         current: True
         power: True
         voltage: True
         period: 8244
         average: 4

5. Starting monitoring and saving captured data
===============================================

For **Dstat, TIG, and TPG**, the monitoring is started during the ``launch`` step of the ``workflow.yaml`` file or when the
user executes the following command:

.. code-block:: bash

   $ e2clab workflow /path/to/scenario/ launch

This makes it possible to capture monitoring data before the start of each ``Service``.

**Energy monitoring in FIT IoT LAB** starts after reservation.

All the monitoring data is saved in the ``/path/to/scenario_dir/monitoring-data/`` directory. It is saved in the ``finalize``
step with the following command:

.. code-block:: bash

   $ e2clab finalize /path/to/scenario_dir/

Besides saving the monitoring data, it also stops the monitoring services and agents started on each machine.

Try some examples
=================

We provide a few tutorials:

- `TPG stack on G5K + FIT IoT LAB <../examples/monitoring-tpg-g5k-iotlab.html>`_
- `Dstat on G5K + FIT IoT LAB (energy consumption) <../examples/monitoring-energy-dstat-g5k-iotlab.html>`_
- `TIG stack on G5K <../examples/monitoring-tig-g5k.html>`_
- `TIG stack on Chameleon <../examples/monitoring-tig-chameleon.html>`_
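
For quick orientation alongside these tutorials, the following is a minimal sketch of how the ``monitoring`` attribute
and ``roles: [monitoring]`` might fit together in a single ``layers_services.yaml`` file. Only the ``monitoring`` block,
the ``roles: [monitoring]`` entry, and the ``g5k`` settings are taken from this page. The ``job_name`` and ``walltime``
values, the ``layers``/``services``/``quantity`` keys, and the layer and service names are illustrative assumptions, so
check them against the E2Clab configuration reference before use.

.. code-block:: yaml

   environment:
     job_name: monitoring-example   # assumption: illustrative value
     walltime: "00:30:00"           # assumption: illustrative value
     g5k:
       cluster: paravance
       job_type: ["allow_classic_ssh"]
   monitoring:
     type: dstat
   layers:                          # assumption: layer/service layout is hypothetical
     - name: cloud
       services:
         - name: Server             # hypothetical service, monitored
           quantity: 1
           roles: [monitoring]
     - name: edge
       services:
         - name: Client             # hypothetical service, not monitored
           quantity: 1

In this sketch only ``Server`` carries ``roles: [monitoring]``, so under the rules described on this page it would be
the only service monitored; replacing ``type: dstat`` with a ``tig`` or ``tpg`` block from the earlier sections should
leave the rest of the file unchanged.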