******************************************
Provenance Capture: Edge-to-Cloud workflow
******************************************

In this tutorial, we show how to capture provenance data of a toy application (e.g., AI model training)
executed on the Edge-to-Cloud continuum (G5K and FIT IoT LAB testbeds).

**The goal** is to show how provenance data capture can **help users to answer research questions** like:

- What are the model hyperparameters that obtained an accuracy value above 90%?

In this example **you will learn how to**:

- Enable provenance data capture in E2Clab (Edge-to-Cloud: G5K and FIT IoT LAB testbeds)
- Create a dataflow specification
- Instrument the application code to decide what to capture
- Query the database to **answer the research questions**


Experiment Artifacts
====================

.. code-block:: bash

    $ cd ~/git/
    $ git clone https://gitlab.inria.fr/E2Clab/examples/provenance-tutorial

In this repository you will find:

- the **E2Clab configuration files** such as ``layers_services.yaml``, ``network.yaml``, and ``workflow.yaml``.
- the **my-dataflow-specification.py** and the toy application **user-application.py**.


Defining the Experimental Environment
=====================================

Layers & Services Configuration
-------------------------------

This configuration file presents the **layers** and **services** that compose this example.
The **Master** runs on one machine (``quantity: 1``) in Grid'5000 (``environment: g5k``).
The **Worker** runs on one A8-M3 device (``quantity: 1``) in FIT IoT LAB (``environment: iotlab``).

To enable the **E2Clab provenance service**, we add the ``provenance:`` attribute with the following configuration:

- ``provider: g5k`` and ``cluster: gros`` to deploy the service on a G5K machine in the **gros** cluster.
- ``dataflow_spec: my-dataflow-specification.py`` to define the attributes and value types of the dataflow
  used to create the provenance database tables. This file must be in the ``artifacts_dir`` directory
  you passed on the E2Clab command line.
- ``ipv: 6`` to allow the FIT IoT LAB device to use its IPv6 network to send data.
- ``parallelism: 2`` to parallelize the provenance data translator (which translates from the ProvLight
  to the DfAnalyzer data format) and the broker topic.

Finally, we add ``roles: ['provenance']`` to the **Master** and **Worker** services to enable data capture
on them (e.g., install the ProvLight capture library and set up the environment variables that enable the
connection with the provenance service).

.. literalinclude:: ./provenance_capture/layers_services.yaml
   :language: yaml
   :linenos:

.. note::

    We create a firewall rule on Grid'5000 to allow the **Worker** (FIT IoT LAB device) to send the
    captured data to the **E2Clab provenance service** deployed on G5K on ``port 1883`` (MQTT protocol).


The toy application
-------------------

To emulate model training with various hyperparameters, we implement the application below.
The model inputs are the hyperparameters, and the training output is the model performance,
namely the accuracy and the training time.

.. literalinclude:: ./provenance_capture/user-application-baseline.py
   :language: python
   :linenos:


Network Configuration
---------------------

The file below presents the network configuration between the ``cloud`` and ``edge``
infrastructures (``delay: 28ms, loss: 0.1%, rate: 1gbit``).

.. literalinclude:: ./provenance_capture/network.yaml
   :language: yaml
   :linenos:


Workflow Configuration
----------------------

This configuration file presents the application workflow configuration.

- For both the **Master** (``cloud.*``) and the **Worker** (``edge.*``), ``prepare`` copies the application
  from the local machine to the remote machine and ``launch`` executes it.

.. literalinclude:: ./provenance_capture/workflow.yaml
   :language: yaml
   :linenos:


User-Defined Provenance Data Capture
------------------------------------

Next, we show how we used the ProvLight client library to instrument the application code to capture the
model hyperparameters and the model performance results. The ``Workflow``, ``Task``, and ``Data`` classes
are used to capture data.

.. literalinclude:: ./provenance_capture/user-application.py
   :language: python
   :linenos:


Running & Verifying Experiment Execution
========================================

Find below the commands to deploy this application and check its execution.

.. code-block:: bash

    $ e2clab layers-services ~/git/provenance-tutorial/ ~/git/provenance-tutorial/artifacts/

The Provenance Service GUI can be reached through an SSH tunnel:

- ``ssh -NL 22000:localhost:22000 gros-86.nancy.grid5000.fr``

.. _provenance_service_gui:

.. figure:: ./provenance_capture/prov_gui.png
    :width: 100%
    :align: center

    Figure 1: Provenance Service GUI (DfAnalyzer)

.. code-block:: bash

    $ e2clab workflow ~/git/provenance-tutorial/ prepare

.. code-block:: bash

    $ e2clab workflow ~/git/provenance-tutorial/ launch


Deployment Validation & Experiment Results
==========================================

We can access the database as follows:

.. code-block:: bash

    $ ssh root@gros-86.nancy.grid5000.fr
    $ docker exec -it dfanalyzer bash
    $ monetdb status
    $ mclient dataflow_analyzer

With ``\d`` you can list all tables.

.. _provenance_tables:

.. figure:: ./provenance_capture/prov_tables.png
    :width: 50%
    :align: center

    Figure 2: Tables in the provenance database

After model training on the G5K node and the FIT IoT LAB device, we can visualize the training results as
presented in :ref:`provenance_in` and :ref:`provenance_out`.

.. _provenance_in:

.. figure:: ./provenance_capture/prov_in.png
    :width: 100%
    :align: center

    Figure 3: Model input (hyperparameters)

.. _provenance_out:

.. figure:: ./provenance_capture/prov_out.png
    :width: 100%
    :align: center

    Figure 4: Model output (accuracy and training time)

After multiple model evaluations, thanks to provenance data capture during model training, users can easily
answer the following research question:

- What are the model hyperparameters that obtained an accuracy value above 90%?

.. _provenance_rq:

.. figure:: ./provenance_capture/prov_rq.png
    :width: 100%
    :align: center

    Figure 5: What are the model hyperparameters that obtained an accuracy value above 90%?


Saving the Experiment Results
-----------------------------

.. code-block:: bash

    $ e2clab finalize ~/git/provenance-tutorial/

The experiment results will be saved at:

.. code-block:: bash

    $ ls ~/provenance-tutorial/20231120-102842/
    layers_services-validate.yaml
    provenance-data/                 # contains the 'provenance_database.sql' file.
    workflow-validate.out
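
The research question above comes down to joining the captured model inputs with the captured model outputs and filtering on accuracy. The sketch below illustrates that query shape in Python. It uses an in-memory SQLite database purely as a stand-in for the MonetDB database of the provenance service. The table names (``training_input``, ``training_output``), the column names, and the sample values are all hypothetical. The real tables are the ones generated from your dataflow specification, and accuracy is assumed here to be stored as a fraction between 0 and 1.

```python
import sqlite3

# Hypothetical sample records, NOT taken from the tutorial's experiment.
# (task_id, learning_rate, batch_size, epochs)
INPUTS = [
    (1, 0.1, 32, 10),
    (2, 0.01, 64, 20),
    (3, 0.001, 128, 30),
    (4, 0.05, 32, 10),
]
# (task_id, accuracy, training_time_s); accuracy assumed stored as a fraction.
OUTPUTS = [
    (1, 0.87, 41.2),
    (2, 0.93, 78.5),
    (3, 0.91, 120.3),
    (4, 0.90, 39.8),
]


def build_demo_db():
    """Create an in-memory stand-in database with hypothetical table names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE training_input ("
        "task_id INTEGER PRIMARY KEY, learning_rate REAL, "
        "batch_size INTEGER, epochs INTEGER)"
    )
    conn.execute(
        "CREATE TABLE training_output ("
        "task_id INTEGER PRIMARY KEY, accuracy REAL, training_time REAL)"
    )
    conn.executemany("INSERT INTO training_input VALUES (?, ?, ?, ?)", INPUTS)
    conn.executemany("INSERT INTO training_output VALUES (?, ?, ?)", OUTPUTS)
    return conn


def hyperparameters_above(conn, threshold):
    """Return (learning_rate, batch_size, epochs, accuracy) for every task
    whose accuracy is strictly above the threshold, best accuracy first."""
    query = """
        SELECT i.learning_rate, i.batch_size, i.epochs, o.accuracy
        FROM training_input AS i
        JOIN training_output AS o ON i.task_id = o.task_id
        WHERE o.accuracy > ?
        ORDER BY o.accuracy DESC
    """
    return conn.execute(query, (threshold,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in hyperparameters_above(conn, 0.90):
        print(row)
```

Against the real provenance database, the same join-and-filter would be issued from the ``mclient`` session, using the table and column names that ``\d`` reports for your dataflow.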