Running COMPSs Applications on G5K (Docker deployment)

This section shows how to execute COMPSs applications on Grid’5000 (G5K). The example uses the official COMPSs Docker image, which comes with COMPSs installed and ready to use.

In this example you will learn how to (see Figure 1: COMPSs deployment):

- Define the experimental environment:
  - Layers and Services, Monitoring, and the logic of your COMPSs Service;
  - Network constraints;
  - Workflow (tasks: prepare, launch, and finalize).
- Deploy a Docker COMPSs cluster: 1 Master + 3 Workers.
- Run COMPSs applications.

Figure 1: COMPSs deployment

Experiment Artifacts 

$ git clone https://gitlab.inria.fr/E2Clab/examples/compss
$ cd compss/
$ ls
artifacts     # Python scripts to generate the COMPSs configuration files (resources.xml and project.xml)
docker        # contains the COMPSs.py Service and layers_services.yaml, network.yaml, and workflow.yaml files

Defining the Experimental Environment 

Layers & Services Configuration

This configuration file presents the Layers and Services that compose this example. The COMPSs Service (a cluster of four nodes, quantity: 4) is composed of one Master and three Workers (see the COMPSs.py Service). The Service name (name: COMPSs) must match the name of the COMPSs.py file.

Note that we also added a TIG (Telegraf, InfluxDB, and Grafana) monitoring stack, declared in the monitoring section (type: tig). By adding roles: [monitoring] to the COMPSs Service, we request the monitoring of all its machines.

environment:
  job_name: compss
  walltime: "00:59:00"
  g5k:
    job_type: ["deploy"]
    env_name: "debian11-x64-big"
    cluster: nova
monitoring:
  type: tig
  network: shared
layers:
- name: cloud
  services:
  - name: COMPSs
    quantity: 4
    roles: [monitoring]
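The quantity: 4 above becomes one Master and three Workers because COMPSs.py assigns the first host to the Master and the remaining hosts to the Workers. The sketch below restates that rule on a hypothetical Python dict that mirrors layers_services.yaml (E2Clab itself reads the YAML file). Rejecting fewer than two hosts is an assumption of this sketch, not a check performed by E2Clab.

```python
from __future__ import annotations

# Hypothetical mirror of layers_services.yaml; E2Clab parses the real YAML file itself.
LAYERS_SERVICES = {
    "layers": [
        {
            "name": "cloud",
            "services": [{"name": "COMPSs", "quantity": 4, "roles": ["monitoring"]}],
        }
    ]
}


def compss_split(config: dict, service_name: str = "COMPSs") -> tuple[int, int]:
    """Return (masters, workers) following COMPSs.py: hosts[0] is the Master,
    hosts[1:] are the Workers."""
    for layer in config["layers"]:
        for service in layer["services"]:
            if service["name"] == service_name:
                quantity = service["quantity"]
                # Assumption of this sketch: at least one Master and one Worker.
                if quantity < 2:
                    raise ValueError("need at least one Master and one Worker")
                return 1, quantity - 1
    raise KeyError(f"service {service_name!r} not found")


print(compss_split(LAYERS_SERVICES))  # (1, 3)
```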

Defining the logic of your COMPSs Service

Next, we define the logic of our COMPSs Service. It mainly consists of:

- installing Docker and pulling the official COMPSs image;
- assigning machines to the Master and the Workers;
- creating an overlay network for standalone containers;
- adding information about the Workers to the Master (see extra=extra_compss_master in COMPSs.py), which is later used to generate the resources.xml and project.xml files;
- registering the Master and the Workers as sub-Services of the COMPSs Service.

Please read the comments in the code for more details.

from e2clab.services import Service
from enoslib.api import populate_keys
from enoslib.objects import Roles
import enoslib as en


class COMPSs(Service):
    def deploy(self):
        # ssh keys for the root users must be generated and pushed to all nodes
        populate_keys(Roles({"compss": self.hosts}), ".", "id_rsa")

        # install docker
        registry_opts = dict(type="external", ip="docker-cache.grid5000.fr", port=80)
        self.deploy_docker(registry_opts=registry_opts)

        # Assign machines to COMPSs Master and Workers
        compss_master = "compss_master"
        compss_worker = "compss_worker"
        roles_compss_master = Roles({compss_master: [self.hosts[0]]})
        roles_compss_worker = Roles({compss_worker: self.hosts[1:]})

        # Create an overlay network for standalone containers
        # https://docs.docker.com/network/network-tutorial-overlay/#use-an-overlay-network-for-standalone-containers
        overlay_network = "compss-net"
        en.run("docker swarm leave --force", roles=self.hosts, on_error_continue=True)
        cmd_swarm_init = str(en.run("docker swarm init | awk '/docker swarm join --token/'", roles=roles_compss_master)[0].stdout).lstrip()
        en.run(cmd_swarm_init, roles=roles_compss_worker)
        en.run(f"docker network create --driver=overlay --attachable {overlay_network}", roles=roles_compss_master)

        # Start COMPSs Master container
        with en.actions(roles=roles_compss_master) as a:
            a.docker_container(
                name=compss_master,
                image="compss/compss",
                volumes="/root/.ssh:/root/.ssh",
                restart="yes",
                restart_policy="always",
                network_mode=overlay_network,
                interactive="yes",
                tty="yes",
                privileged="yes",
                default_host_ip="",
            )

        # Start COMPSs Worker containers
        workers = []
        extra_compss_worker = []
        for host in roles_compss_worker[compss_worker]:
            worker_id = f'{compss_worker}_{host.alias}'
            workers.append(worker_id)
            extra_compss_worker.append({'container_name': worker_id})
            with en.actions(roles=host) as a:
                a.docker_container(
                    name=worker_id,
                    image="compss/compss",
                    volumes="/root/.ssh:/root/.ssh",
                    restart="yes",
                    restart_policy="always",
                    network_mode=overlay_network,
                    interactive="yes",
                    tty="yes",
                    privileged="yes",
                    published_ports="43001-43002:43001-43002",
                    default_host_ip="",
                )

        # Users may add extra information to Services/sub-Services to access them in "workflow.yaml".
        # e.g., to access the container name as {{ _self.container_name }} in "workflow.yaml", you can do as follows:
        extra_compss_master = [{'container_name': compss_master, 'workers': ','.join(workers)}]  # COMPSs Master

        # Register the Service
        # register COMPSs Master Service
        self.register_service(_roles=roles_compss_master, sub_service="master", extra=extra_compss_master)
        # register COMPSs Worker Service
        self.register_service(_roles=roles_compss_worker, sub_service="worker", extra=extra_compss_worker)

        return self.service_extra_info, self.service_roles
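The overlay network step depends on the join command printed by docker swarm init: COMPSs.py keeps it with awk '/docker swarm join --token/', strips the leading indentation, and runs it on every Worker. The function below is a plain-Python equivalent of that filter, shown for illustration only. The sample output is an abbreviated version of what recent Docker releases print; the node ID, token, and address in it are placeholders.

```python
from __future__ import annotations

# Abbreviated `docker swarm init` output; node ID, token, and address are placeholders.
SAMPLE_INIT_OUTPUT = """\
Swarm initialized: current node (abc123) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-placeholder 172.16.0.1:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
"""


def extract_join_command(init_output: str) -> str:
    """Mimic awk '/docker swarm join --token/' followed by lstrip().

    Returns the first matching line without its leading indentation.
    Note that the 'join-token manager' hint line does not match the pattern.
    """
    for line in init_output.splitlines():
        if "docker swarm join --token" in line:
            return line.lstrip()
    raise ValueError("no 'docker swarm join --token' line in the output")


print(extract_join_command(SAMPLE_INIT_OUTPUT))
# docker swarm join --token SWMTKN-1-placeholder 172.16.0.1:2377
```

Unlike awk, which would print every matching line, this sketch returns only the first one; with a single manager node the output contains only one such line.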

Network Configuration

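The file below defines the network configuration between the machines of the COMPSs cluster. In this example, we define a constraint between the Master (cloud.compss.1.master.1) and all Workers (cloud.compss.1.worker.*): a 2 ms delay, a 10 Gbit rate, and packet loss set to 0.1.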

networks:
- src: cloud.compss.1.master.1
  dst: cloud.compss.1.worker.*
  delay: "2ms"
  rate: "10gbit"
  loss: 0.1
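The src and dst fields select machines by their identifier, which, judging from the patterns above, follows layer.service.service_id.sub_service.machine_id, with * as a wildcard. The sketch below uses Python's fnmatch to show which hypothetical identifiers the dst pattern would select. It assumes glob-style matching (where * also matches dots) and is not E2Clab's actual matching code.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical identifiers for the 4-node cluster (1 Master + 3 Workers).
HOSTS = [
    "cloud.compss.1.master.1",
    "cloud.compss.1.worker.1",
    "cloud.compss.1.worker.2",
    "cloud.compss.1.worker.3",
]


def select(pattern: str, hosts: list[str]) -> list[str]:
    """Return the identifiers matching a glob-style pattern ('*' matches any run of characters)."""
    return [host for host in hosts if fnmatchcase(host, pattern)]


print(select("cloud.compss.1.worker.*", HOSTS))
# ['cloud.compss.1.worker.1', 'cloud.compss.1.worker.2', 'cloud.compss.1.worker.3']
```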

Workflow Configuration

This configuration file describes the application workflow:

- For the COMPSs Master only (cloud.compss.*.master.*):
  - in prepare, we clone the COMPSs tutorial applications into the Master container and copy the Python scripts that generate the COMPSs configuration files, then run them to generate these files (resources.xml and project.xml). Note that we use --workers {{ _self.workers }}, since we added this information in the COMPSs.py Service. Finally, we copy both files into the Master container;
  - in launch, we run the COMPSs application.
- For all Workers in the COMPSs cluster (cloud.compss.*.worker.*), in prepare we add the COMPSs applications by cloning the tutorial applications into each Worker container.
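The {{ _self.workers }} placeholder resolves to the extra information that COMPSs.py registers for the Master. The sketch below rebuilds that extra entry for hypothetical host aliases and substitutes _self.* placeholders with a small regular expression. It is a deliberate simplification: the workflow tasks are Ansible tasks, so the real rendering is done by Ansible's Jinja2 templating, which supports much more than plain variable lookup.

```python
from __future__ import annotations

import re

COMPSS_MASTER = "compss_master"
COMPSS_WORKER = "compss_worker"

# Matches "{{ _self.<key> }}" with optional spaces inside the braces.
SELF_VAR = re.compile(r"\{\{\s*_self\.(\w+)\s*\}\}")


def master_extra(worker_aliases: list[str]) -> dict[str, str]:
    """Build the Master's extra info as COMPSs.py does: worker container names joined by commas."""
    workers = [f"{COMPSS_WORKER}_{alias}" for alias in worker_aliases]
    return {"container_name": COMPSS_MASTER, "workers": ",".join(workers)}


def render(template: str, self_vars: dict[str, str]) -> str:
    """Replace each _self placeholder with its value (simplified; not Jinja2)."""
    return SELF_VAR.sub(lambda match: self_vars[match.group(1)], template)


extra = master_extra(["nova-2", "nova-3", "nova-4"])  # hypothetical host aliases
print(render("--workers {{ _self.workers }}", extra))
# --workers compss_worker_nova-2,compss_worker_nova-3,compss_worker_nova-4
```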

- hosts: cloud.compss.*.master.*
  prepare:
    - debug:
        msg: "[{{ lookup('pipe','date +%Y-%m-%d-%H-%M-%S') }}] Adding applications to the COMPSs Master. My workers are: {{ _self.workers }}"
    - copy:
        src: "{{ working_dir }}/generate_project_xml_file.py"
        dest: "/tmp/generate_project_xml_file.py"
    - copy:
        src: "{{ working_dir }}/generate_resources_xml_file.py"
        dest: "/tmp/generate_resources_xml_file.py"
    - shell: docker exec -it -d {{ _self.container_name }} bash -c 'cd /root/ && git clone https://github.com/bsc-wdc/tutorial_apps.git'
    - shell: cd /tmp/ && python generate_project_xml_file.py --workers {{ _self.workers }} --install_dir /opt/COMPSs/ --working_dir /tmp/COMPSsWorker --user root --app_dir /root/ --path_to_new_file /tmp/default_project.xml
    - shell: cd /tmp/ && python generate_resources_xml_file.py --workers {{ _self.workers }} --computing_units 24 --memory_size 125 --min_port_nio 43001 --max_port_nio 43002 --path_to_new_file /tmp/default_resources.xml
    - shell: docker cp /tmp/default_project.xml {{ _self.container_name }}:/opt/COMPSs/Runtime/configuration/xml/projects/default_project.xml
    - shell: docker cp /tmp/default_resources.xml {{ _self.container_name }}:/opt/COMPSs/Runtime/configuration/xml/resources/default_resources.xml
  launch:
    - debug:
        msg: "Running COMPSs application: simple.py"
    - shell: docker exec -it {{ _self.container_name }} bash -c '/opt/COMPSs/Runtime/scripts/user/runcompss --project="/opt/COMPSs/Runtime/configuration/xml/projects/default_project.xml" --resources="/opt/COMPSs/Runtime/configuration/xml/resources/default_resources.xml" -d /root/tutorial_apps/python/simple/src/simple.py 2 2>&1'
- hosts: cloud.compss.*.worker.*
  prepare:
    - debug:
        msg: "[{{ lookup('pipe','date +%Y-%m-%d-%H-%M-%S') }}] Adding applications to the COMPSs Workers"
    - shell: docker exec -it -d {{ _self.container_name }} bash -c 'cd /root/ && git clone https://github.com/bsc-wdc/tutorial_apps.git'
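The generator scripts live in the artifacts directory and are not reproduced here. As a rough, hypothetical sketch of what generate_resources_xml_file.py and generate_project_xml_file.py produce from the options passed in prepare, the functions below build one ComputeNode per Worker with xml.etree.ElementTree. The element names follow the layout of the default COMPSs configuration files as we understand it; check them against the COMPSs documentation and the actual scripts before relying on this.

```python
import xml.etree.ElementTree as ET

# Adaptor class name as it appears in COMPSs default configuration files (assumed).
NIO_ADAPTOR = "es.bsc.compss.nio.master.NIOAdaptor"


def build_resources_xml(workers: str, computing_units: int, memory_size: int,
                        min_port: int, max_port: int) -> str:
    """Hypothetical resources file: one ComputeNode per comma-separated worker name."""
    root = ET.Element("ResourcesList")
    for name in workers.split(","):
        node = ET.SubElement(root, "ComputeNode", Name=name)
        processor = ET.SubElement(node, "Processor", Name="MainProcessor")
        ET.SubElement(processor, "ComputingUnits").text = str(computing_units)
        memory = ET.SubElement(node, "Memory")
        ET.SubElement(memory, "Size").text = str(memory_size)
        adaptors = ET.SubElement(node, "Adaptors")
        adaptor = ET.SubElement(adaptors, "Adaptor", Name=NIO_ADAPTOR)
        submission = ET.SubElement(adaptor, "SubmissionSystem")
        ET.SubElement(submission, "Interactive")
        ports = ET.SubElement(adaptor, "Ports")
        ET.SubElement(ports, "MinPort").text = str(min_port)
        ET.SubElement(ports, "MaxPort").text = str(max_port)
    return ET.tostring(root, encoding="unicode")


def build_project_xml(workers: str, install_dir: str, working_dir: str,
                      user: str, app_dir: str) -> str:
    """Hypothetical project file: the master node plus per-worker installation details."""
    root = ET.Element("Project")
    ET.SubElement(root, "MasterNode")
    for name in workers.split(","):
        node = ET.SubElement(root, "ComputeNode", Name=name)
        ET.SubElement(node, "InstallDir").text = install_dir
        ET.SubElement(node, "WorkingDir").text = working_dir
        ET.SubElement(node, "User").text = user
        application = ET.SubElement(node, "Application")
        ET.SubElement(application, "AppDir").text = app_dir
    return ET.tostring(root, encoding="unicode")


# Same option values as the prepare tasks above; the worker names are hypothetical.
workers = "compss_worker_nova-2,compss_worker_nova-3,compss_worker_nova-4"
resources_xml = build_resources_xml(workers, 24, 125, 43001, 43002)
project_xml = build_project_xml(workers, "/opt/COMPSs/", "/tmp/COMPSsWorker", "root", "/root/")
```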

Note

Besides prepare and launch, you can also use finalize to back up data (e.g., experiment results). E2Clab first runs the prepare tasks on all machines, then the launch tasks on all machines, and finally the finalize tasks. Within each phase, hosts are handled from top to bottom, in the order in which they are defined in the workflow.yaml file.
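The ordering rule amounts to an outer loop over phases and an inner loop over host entries. The sketch below computes that order on a hypothetical, simplified in-memory view of workflow.yaml (task bodies elided); it only lists the order and does not run any task.

```python
from __future__ import annotations

PHASES = ("prepare", "launch", "finalize")

# Hypothetical, simplified view of workflow.yaml (task bodies elided).
WORKFLOW = [
    {"hosts": "cloud.compss.*.master.*", "prepare": ["<tasks>"], "launch": ["<tasks>"]},
    {"hosts": "cloud.compss.*.worker.*", "prepare": ["<tasks>"]},
]


def execution_order(workflow: list[dict]) -> list[tuple[str, str]]:
    """Return (phase, hosts) pairs: phase by phase and, within a phase,
    host entries from top to bottom as listed in workflow.yaml."""
    return [
        (phase, entry["hosts"])
        for phase in PHASES
        for entry in workflow
        if phase in entry
    ]


for phase, hosts in execution_order(WORKFLOW):
    print(phase, hosts)
# prepare cloud.compss.*.master.*
# prepare cloud.compss.*.worker.*
# launch cloud.compss.*.master.*
```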

Running & Verifying Experiment Execution 

Find below the command to deploy the COMPSs cluster on G5K and run COMPSs applications.

Before starting:

- make sure that your COMPSs.py file is located in e2clab/e2clab/services/;
- in the command below, compss/docker/ is the scenario directory (where the files layers_services.yaml, network.yaml, and workflow.yaml must be placed and where the results will be saved);
- in the command below, compss/artifacts/ is the artifacts directory (where the Python scripts to generate the COMPSs configuration files must be placed).

$ e2clab deploy compss/docker/ compss/artifacts/
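A misplaced file is easier to catch before the deployment starts. The helper below is a hypothetical convenience, not part of E2Clab: it checks that the scenario and artifacts directories contain the files this example expects (the three configuration files and the two generator scripts copied by workflow.yaml). It does not check the location of COMPSs.py.

```python
from __future__ import annotations

from pathlib import Path

# File names taken from this example; adjust them if your scenario differs.
SCENARIO_FILES = ("layers_services.yaml", "network.yaml", "workflow.yaml")
ARTIFACT_FILES = ("generate_project_xml_file.py", "generate_resources_xml_file.py")


def missing_inputs(scenario_dir: str, artifacts_dir: str) -> list[str]:
    """Return the expected input files that do not exist (hypothetical pre-flight check)."""
    expected = [Path(scenario_dir, name) for name in SCENARIO_FILES]
    expected += [Path(artifacts_dir, name) for name in ARTIFACT_FILES]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_inputs("compss/docker", "compss/artifacts")
    if missing:
        print("Missing before running 'e2clab deploy':", *missing, sep="\n  ")
    else:
        print("All expected input files are present.")
```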

During application runtime, you may want to access the Grafana web interface to visualize the monitoring data of all machines in the COMPSs cluster (the instructions to access Grafana are in the layers_services-validate.yaml file generated in the results directory under compss/docker/).

After the application execution, you can check the log files from the Master node (the machine running the compss_master container) as follows:

$ docker exec -it compss_master bash
$ cat /root/.COMPSs/simple.py_01/runtime.log
$ cat /root/.COMPSs/simple.py_01/jobs/job1_NEW.out

Deployment Validation & Experiment Results 

Find below the files generated after the execution of the experiment. They consist of validation files (layers_services-validate.yaml, workflow-validate.out, and network-validate/) and monitoring data (influxdb-data.tar.gz).

$ ls compss/docker/20220325-152207/

layers_services-validate.yaml   # Mapping between layers and services with physical machines
workflow-validate.out           # Commands used to deploy the application (prepare, launch, and finalize)
network-validate/               # Network configuration for each physical machine
influxdb-data.tar.gz            # Monitoring data
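Each run is saved in a timestamped sub-directory of the scenario directory (20220325-152207 above, which reads as YYYYMMDD-HHMMSS). When a scenario has been deployed several times, a small helper like the hypothetical one below can locate the most recent run, assuming every results directory follows that naming pattern.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed naming pattern of results directories, e.g. 20220325-152207.
RUN_DIR_FORMAT = "%Y%m%d-%H%M%S"


def run_timestamp(name: str) -> Optional[datetime]:
    """Parse a results directory name; return None if it does not match the pattern."""
    try:
        return datetime.strptime(name, RUN_DIR_FORMAT)
    except ValueError:
        return None


def latest_run(scenario_dir: str) -> Optional[Path]:
    """Return the most recent timestamped results directory, or None if there is none."""
    runs = [
        path for path in Path(scenario_dir).iterdir()
        if path.is_dir() and run_timestamp(path.name) is not None
    ]
    return max(runs, key=lambda path: run_timestamp(path.name), default=None)
```

For instance, if compss/docker/20220325-152207/ were the only run, latest_run("compss/docker") would return that directory; configuration files such as workflow.yaml are ignored because they are not directories.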

Note

Providing a systematic methodology to define the experimental environment, together with access to the methodology artifacts (layers_services.yaml, network.yaml, workflow.yaml, and the COMPSs.py Service), improves the Repeatability, Replicability, and Reproducibility of the experiment (see the ACM Digital Library Terminology).
