1. E2Clab (Edge-to-Cloud lab)

Distributed digital infrastructures for computation and analytics are evolving towards an interconnected ecosystem that allows complex application workflows to be executed from IoT Edge devices to the HPC Cloud (aka the Computing Continuum, the Digital Continuum, or the Transcontinuum). Understanding end-to-end performance in such a complex continuum is challenging. It comes down to reconciling many, typically contradictory, application requirements and constraints with low-level infrastructure design choices. One important challenge is to accurately reproduce the relevant behaviors of a given application workflow and representative settings of the physical infrastructure underlying this complex continuum (see Figure 1: E2Clab goals).

_images/research-gap.png

Figure 1: E2Clab goals

2. E2Clab methodology

E2Clab is an open source framework that implements a rigorous methodology, with guidelines for moving from real-life application workflows to representative settings of the physical infrastructure underlying the application. The goal is to allow users to accurately reproduce the relevant application behaviors and therefore understand end-to-end performance. Understanding end-to-end performance means rigorously mapping the scenario characteristics to the experimental environment, identifying and controlling the relevant configuration parameters of applications and system components, and defining the relevant performance metrics. Our methodology (see Figure 2: E2Clab methodology) consists of three main processes:

2.1. Provide access to the artifacts

This means that all artifacts used in the experiment, such as datasets, algorithms, software, and config files, must be available in a public and safe repository.

2.2. Define the experimental environment

We split this process into three sub-processes:

  • Define layers and services: the methodology is centered on the concept of Layers & Services. Layers group services and define a hierarchy between them; they can also represent the geographic distribution of services in the scenario. In the context of the Computing Continuum, layers refer to the Edge, Fog, and Cloud. Services represent any component that provides a specific functionality or action in the scenario workflow. In the context of the Computing Continuum, services refer to data producers, ingestion systems, and AI and Big Data Analytics (BDA) frameworks. For instance, users may specify a Cloud layer and assign a Big Data Analytics framework service to it.

  • Define network: defines the communication constraints between Layers and Services. For instance, users may specify the bandwidth between IoT devices and a gateway in the Fog layer.

  • Define workflow: defines the execution logic and interconnections between Layers and Services. For instance, users may specify that a group of IoT devices should connect to a specific gateway in the Fog.
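The three sub-processes above can be pictured as three small descriptions that must agree with one another. The following is a minimal sketch in plain Python, not E2Clab's actual configuration format or API: the class names (`Service`, `Layer`, `NetworkRule`, `WorkflowLink`), the `layer.service` endpoint notation, and the example values are all hypothetical and chosen only for illustration. It models a scenario with Edge, Fog, and Cloud layers and checks that the network and workflow definitions only refer to layers and services that were actually declared.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    name: str          # hypothetical service name, e.g. "gateway"
    quantity: int = 1  # number of instances of the service


@dataclass
class Layer:
    name: str  # e.g. "edge", "fog" or "cloud"
    services: List[Service] = field(default_factory=list)


@dataclass
class NetworkRule:
    src: str               # endpoint written as "layer" or "layer.service"
    dst: str
    bandwidth_mbps: float  # communication constraint between src and dst


@dataclass
class WorkflowLink:
    source: str  # endpoint that opens the connection
    target: str  # endpoint it connects to


def known_endpoints(layers):
    """Collect every declared endpoint: each layer and each layer.service."""
    names = set()
    for layer in layers:
        names.add(layer.name)
        for service in layer.services:
            names.add(f"{layer.name}.{service.name}")
    return names


def validate(layers, network, workflow):
    """Return a list of error messages; an empty list means consistent."""
    valid = known_endpoints(layers)
    errors = []
    for rule in network:
        for endpoint in (rule.src, rule.dst):
            if endpoint not in valid:
                errors.append(f"network: unknown endpoint '{endpoint}'")
        if rule.bandwidth_mbps <= 0:
            errors.append(f"network: non-positive bandwidth {rule.src} -> {rule.dst}")
    for link in workflow:
        for endpoint in (link.source, link.target):
            if endpoint not in valid:
                errors.append(f"workflow: unknown endpoint '{endpoint}'")
    return errors


# Illustrative scenario (all values are assumptions): IoT devices at the Edge,
# a gateway in the Fog, and a Big Data Analytics service in the Cloud.
layers = [
    Layer("edge", [Service("iot_device", quantity=10)]),
    Layer("fog", [Service("gateway")]),
    Layer("cloud", [Service("analytics")]),
]
network = [NetworkRule("edge.iot_device", "fog.gateway", bandwidth_mbps=10.0)]
workflow = [
    WorkflowLink("edge.iot_device", "fog.gateway"),
    WorkflowLink("fog.gateway", "cloud.analytics"),
]

print(validate(layers, network, workflow))
```

Keeping the three descriptions separate mirrors the split into sub-processes: the same layers and services can be reused while only the network constraints or the workflow links are varied between experiments.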

2.3. Provide access to the results

All files generated during the execution of the experiments, such as monitoring data and metric files (that is, all generated data required to analyze the experiments), must be available in a public and safe repository.

_images/E2Clab-methodology.png

Figure 2: E2Clab methodology

3. E2Clab architecture

E2Clab, which sits on top of EnOSlib, allows researchers to reproduce application behavior in a representative way in a controlled environment for extensive experiments and, therefore, to understand the end-to-end performance of applications by correlating results with the parameter settings. E2Clab provides a rigorous approach to answering questions such as:

  • How to identify infrastructure bottlenecks?

  • Which system parameters and infrastructure configurations impact performance and how?

High-level features provided by E2Clab are (see Figure 3: E2Clab architecture):

  • Reproducible Experiments: Supports repeatability, replicability and reproducibility.

  • Mapping: Between application parts (Edge, Fog and Cloud/HPC) and the physical testbed.

  • Variation & Scaling: Experiment variation and transparent scaling of scenarios.

  • Network Emulation: Edge-to-Cloud communication constraints.

  • Experiment Management: Deployment, execution and monitoring on testbeds (e.g. Grid’5000, Chameleon Cloud and Edge, and FIT IoT LAB).

  • Optimization: configuration search of application workflows.

  • Provenance: user-defined provenance data capture of Edge-to-Cloud workflows.

_images/E2Clab-architecture.png

Figure 3: E2Clab architecture