The cloud computing industry has grown massively over the last decade, and with it, new application areas have emerged. Some of these require specialized hardware placed close to the user. User requirements such as ultra-low latency, security and location awareness are becoming increasingly common, for example in Smart Cities, industrial automation and data analytics. Modern cloud applications have also become more complex: they usually run on a distributed computer system, split into components that must run with high availability.
Unifying such diverse systems into centrally controlled compute clusters and providing sophisticated scheduling decisions across them are two major challenges in this field. Scheduling decisions for a cluster consisting of cloud and edge nodes must consider unique characteristics such as variability in node and network capacity. The common solution for orchestrating large clusters is Kubernetes; however, it was designed for reliable, homogeneous clusters. Many applications and extensions are available for Kubernetes, but none of them jointly optimizes performance and energy consumption, or addresses data and job locality.
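To make the scheduling challenge concrete, the following is a minimal sketch (not part of the DECICE framework; all names and weights are illustrative assumptions) of a node-scoring function that ranks candidate nodes by a weighted combination of performance headroom and an energy-cost proxy, the kind of joint objective a custom Kubernetes scheduler plugin would evaluate:

```python
# Hypothetical node-scoring sketch: ranks candidate nodes for a workload
# by weighing performance headroom against an energy-cost proxy.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float        # available CPU cores
    free_mem_gib: float    # available memory in GiB
    watts_per_core: float  # energy-cost proxy for this node


def score(node: Node, req_cpu: float, req_mem_gib: float,
          w_perf: float = 0.7, w_energy: float = 0.3) -> float:
    """Higher is better; -inf means the workload does not fit."""
    if node.free_cpu < req_cpu or node.free_mem_gib < req_mem_gib:
        return float("-inf")
    # Headroom: how many copies of the request the node could absorb.
    perf = min(node.free_cpu / req_cpu, node.free_mem_gib / req_mem_gib)
    # Cheaper energy per core yields a higher energy score.
    energy = 1.0 / node.watts_per_core
    return w_perf * perf + w_energy * energy


nodes = [
    Node("cloud-1", free_cpu=32, free_mem_gib=128, watts_per_core=8.0),
    Node("edge-1", free_cpu=4, free_mem_gib=8, watts_per_core=2.0),
]
best = max(nodes, key=lambda n: score(n, req_cpu=2, req_mem_gib=4))
print(best.name)  # here the large cloud node wins on headroom
```

A real scheduler would add further terms (network capacity, data locality, latency targets); this sketch only illustrates why a single-objective scheduler falls short for mixed cloud–edge clusters.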
12/2022 to 11/2025
Research & Innovation Action
AI-based, open and portable cloud management
DECICE aims to develop an AI-based, open and portable cloud management framework for automatic and adaptive optimization and deployment of applications in a federated infrastructure, including computing from the very large (e.g., HPC systems) to the very small (e.g., IoT sensors connected on the edge).
Working at such vastly different scales requires an intelligent management plane with advanced capabilities that allow it to proactively adjust workloads within the system based on their needs, such as latency, compute power and power consumption. Therefore, we envision an AI model that uses a digital twin of the available resources to make real-time scheduling decisions based on telemetry data from those resources.
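A digital twin in this sense is a continuously updated mirror of the infrastructure's state. The sketch below (all class and field names are hypothetical, not DECICE interfaces) shows the core loop: ingest telemetry samples into the mirror, then query it for nodes that violate a workload's latency target, which a scheduler could use as a migration trigger:

```python
# Illustrative digital-twin sketch: mirrors per-node telemetry and flags
# nodes whose observed latency breaks a workload's target.
from dataclasses import dataclass, field


@dataclass
class TwinNode:
    name: str
    latency_ms: float = 0.0  # last observed network latency
    power_w: float = 0.0     # last observed power draw


@dataclass
class DigitalTwin:
    nodes: dict = field(default_factory=dict)

    def ingest(self, sample: dict) -> None:
        """Update the mirrored state from one telemetry sample."""
        node = self.nodes.setdefault(sample["node"], TwinNode(sample["node"]))
        node.latency_ms = sample["latency_ms"]
        node.power_w = sample["power_w"]

    def violations(self, max_latency_ms: float) -> list:
        """Names of nodes whose latency exceeds the workload's target."""
        return [n.name for n in self.nodes.values()
                if n.latency_ms > max_latency_ms]


twin = DigitalTwin()
twin.ingest({"node": "edge-1", "latency_ms": 4.2, "power_w": 15.0})
twin.ingest({"node": "cloud-1", "latency_ms": 38.0, "power_w": 120.0})
print(twin.violations(max_latency_ms=10.0))  # ['cloud-1']
```

The AI model envisioned here would sit on top of such a mirror, predicting violations before they occur rather than merely reacting to observed ones.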
The DECICE framework will dynamically balance different workloads, optimize the throughput and latency of the system resources (compute, storage, and network) with respect to both performance and energy efficiency, and quickly adapt to changing conditions. The framework also provides the tools and interfaces that administrators and deployment experts need to interact with all infrastructure components and control them to achieve the desired result.
Open standard APIs
The integration of the DECICE framework with orchestration systems will be done through open standard APIs to make it portable, modular and extensible. The DECICE framework will be evaluated through established use cases.
- D2.1 Specification of the Optimization Scope (Deliverable not available yet)
- D2.2 Digital Twin (Deliverable not available yet)
- D2.3 AI-Scheduler Prototypes for Storage and Compute (Deliverable not available yet)
- D2.4 Integrated AI-Scheduler Prototype (Deliverable not available yet)
- D2.5 Final Scheduler and Digital Twin (Deliverable not available yet)
- D3.1 Synthetic Test Environment (Deliverable not available yet)
- D3.2 Final Architecture and Interfaces (Deliverable not available yet)
- D3.3 Final Implementation (Deliverable not available yet)
- D3.4 Security and Trustworthiness (Deliverable not available yet)
- D5.1 Use Case Requirements (Deliverable not available yet)
- D5.2 Development Environment Specification (Deliverable not available yet)
- D5.3 Project Development Environment Deployed for Phase 1 and 2 (Deliverable not available yet)
- D5.4 Project Development Environment Deployed for Phase 3 (Deliverable not available yet)
- D5.5 Performance Evaluation Report (Deliverable not available yet)