DECICE

Distributed Application Setup for DECICE Tutorial at ISC

DECICE focuses on complex applications that span the compute continuum, encompassing edge, cloud, and HPC resources. Packaging application code together with its dependencies into portable container images simplifies deployment across such a heterogeneous environment. At ISC 2025, a tutorial organized by DECICE focused on containerizing and orchestrating distributed applications across the compute continuum. In this blog, we share how we set up an edge infrastructure, integrated it with the cloud for orchestration, and enabled participants to deploy distributed applications.

Application overview

During the hands-on session, participants deployed a speech-to-text application composed of machine learning–based transcription and inference services. The application captured speech, transcribed it to text, used an LLM to perform inference on the text, and displayed the result. Transcription was executed at the edge, while inference was offloaded to the cloud. Although both tasks could run at the edge, the larger models required for accurate inference exceeded the capacity of the edge nodes. Alternatively, streaming raw audio to the cloud was possible, but it was bandwidth-intensive. To address this, we used an efficient approach in which transcription was performed at the edge, the resulting text was sent to the cloud via lightweight protocols such as MQTT, and inference was performed in the cloud. In addition to familiarizing participants with basic containerization and orchestration, the chosen application also exposed them to ML deployments and techniques for accessing hardware devices such as microphones from within containers.
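
To make the message flow concrete, the hand-off between edge and cloud can be sketched with the Mosquitto command-line clients. This is only an illustration of the protocol, not the exact mechanism used in the tutorial containers; the broker address and topic name are placeholders.

```bash
# Edge side: publish a transcribed sentence to the MQTT broker running in the cloud
# (broker address and topic are illustrative placeholders)
mosquitto_pub -h mqtt-broker.example.org -p 1883 \
  -t decice/transcripts \
  -m "please summarise the agenda for today"

# Cloud side: the inference service listens on the same topic
mosquitto_sub -h mqtt-broker.example.org -p 1883 -t decice/transcripts
```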

Infrastructure

For running speech transcription at the edge, we chose a Raspberry Pi 5 for its portability and low cost, while still offering sufficient compute power to run lightweight containers. We attached a microphone and a USB display to the Raspberry Pi for recording speech and displaying the output, respectively. The USB display was flashed with custom firmware that enabled it to accept serial commands. The cloud component was powered by a Kubernetes instance in Huawei’s HAICGU cluster. The Raspberry Pi edge nodes were integrated with this Kubernetes setup using KubeEdge, a framework for extending Kubernetes orchestration to remote edge nodes. We used the pi-gen tool to create a boot image for the Raspberry Pi: we forked the tool’s repository and added the scripts needed to download, install, and configure KubeEdge. Upon booting with this image, the Raspberry Pi automatically runs KubeEdge and registers itself as a node in the HAICGU Kubernetes cluster. This setup enabled participants to interact with the Kubernetes control plane in the cloud and deploy containers across the cloud and the edge nodes. The cloud login node provided Podman so that participants could build images and push them to a private container registry deployed by the tutorial team.
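
For reference, the registration step that the boot image automates essentially boils down to a KubeEdge keadm join call of the following form. This is a minimal sketch; the cloudcore address, token, and node name are placeholders, and the actual image also takes care of downloading and configuring KubeEdge beforehand.

```bash
# Sketch of the KubeEdge registration performed at first boot of the Raspberry Pi
# (cloudcore address, token, and node name are placeholders)
keadm join \
  --cloudcore-ipport=<haicgu-cloudcore-address>:10000 \
  --token=<token-from-keadm-gettoken-on-the-cloud-side> \
  --edgenode-name=decice-edge-pi-01
```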

Accessing the cluster

The participants needed access to the Kubernetes API in the cloud to submit workloads and deploy the application in the continuum. For security reasons, the Kubernetes API server was not exposed to the internet, so participants had to SSH into the HAICGU cluster login node to reach the API server. The challenge with this approach is configuring each participant’s SSH client to access the cluster. To simplify the process, the tutorial team deployed a browser-based terminal solution using wetty. A landing page hosted at HLRS gave participants access to a browser-based terminal session on an HLRS VM. This terminal session was preconfigured with SSH client settings, enabling participants to easily connect to the HAICGU cluster login node.
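
The preconfiguration amounts to little more than an SSH client config entry on the VM behind the browser terminal. The snippet below is a sketch; the host alias, address, user name, and key path are placeholder values.

```bash
# Illustrative SSH client settings on the browser-terminal VM (all values are placeholders)
cat >> ~/.ssh/config <<'EOF'
Host haicgu-login
    HostName login.haicgu.example.org
    User tutorial-user
    IdentityFile ~/.ssh/tutorial_key
EOF

# Participants then reach the cluster login node with a single command
ssh haicgu-login
```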

Software stack

To transcribe speech at the edge, participants deployed a container with the whisper.cpp library and the required sound libraries on a Raspberry Pi with an attached microphone. Whisper.cpp was chosen for its lightweight, efficient implementation of the Whisper model, enabling offline transcription on resource-constrained edge devices. For inference, we used Ollama, an open-source tool for running large language models (LLMs). We chose Ollama for its ease of deployment compared to other inference tools such as vLLM. A container running the Ollama server was deployed in the cloud to receive the text generated at the edge and return the inference results. To display the inference output at the edge, a separate container was deployed on the edge device, receiving results from the cloud and sending serial commands to the USB display. The application components coordinated with each other by publishing and subscribing to messages on a containerized MQTT broker in the cloud. We provided participants with Dockerfiles to build the container images on the cloud infrastructure and push them to a dedicated tutorial registry, along with Kubernetes manifests to deploy these components on the appropriate nodes. Refer to this readme and the contents here for more information.
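
As an illustration of the build-and-deploy workflow, the sketch below builds an image with Podman, pushes it to a registry, and pins the transcription pod to a Raspberry Pi edge node with a nodeSelector. The image name, registry address, and node label are assumptions for the example rather than the tutorial’s actual values; the real manifests additionally handle device access for the microphone and the USB display, which is omitted here.

```bash
# Build the transcription image on the login node and push it to the tutorial registry
# (image and registry names are placeholders)
podman build -t registry.example.org/decice-tutorial/whisper-edge:latest .
podman push registry.example.org/decice-tutorial/whisper-edge:latest

# Deploy it on an edge node; KubeEdge-managed nodes typically carry the
# node-role.kubernetes.io/edge label, which the nodeSelector relies on here
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whisper-transcriber
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whisper-transcriber
  template:
    metadata:
      labels:
        app: whisper-transcriber
    spec:
      nodeSelector:
        node-role.kubernetes.io/edge: ""
      containers:
      - name: transcriber
        image: registry.example.org/decice-tutorial/whisper-edge:latest
EOF
```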

This multi-component, containerized application is a simple representation of the use cases addressed by the DECICE project. The scheduler and framework developed within the DECICE project will enhance such deployments by optimizing component placement across the compute continuum, taking into account factors such as latency, compute power, and power consumption. This setup not only demonstrated the technical feasibility of distributed deployment across the compute continuum using containers but also highlighted the practical considerations that DECICE aims to address.

Author(s): Aadesh Baskar, University of Stuttgart

Key words: #Compute Continuum #Containerization #Telecommunication & Orchestration #Speech-to-Text Application #Scalability & Optimization
