DECICE


AI Scheduler for Mapping Storage Resources

The AI scheduler for storage focuses on optimizing data placement, migration, and replication across multiple storage solutions in the compute continuum. The scheduler evaluates the characteristics and performance of different storage systems, including their capacities, redundancy levels, and access speeds.

In the dynamic landscape of heterogeneous compute environments, including High-Performance Computing (HPC), Cloud, and Edge computing, the significance of efficient storage scheduling cannot be overstated. The optimization of data transfer, enhancement of I/O performance, and utilization of bandwidth are pivotal elements in ensuring seamless operations across these diverse platforms. To address these challenges, an AI Storage Scheduler has been developed in the DECICE project.

The AI Storage Scheduler enhances Kubernetes by addressing its limitations in I/O and data locality awareness. Unlike the default Kubernetes scheduler, the AI Storage Scheduler recognizes the critical importance of data locality in storage scheduling. Kubernetes lacks a dedicated storage scheduling API, which leads to inefficiencies in managing data locality. To overcome this challenge, we’ve integrated Longhorn into our system to provide robust and efficient block storage within the Kubernetes environment. Longhorn’s Custom Resource Definitions (CRDs) have been utilized for the AI Storage Scheduler. These CRDs provide information about the persistent volumes’ locations, backup and snapshot data, and the pods that use each persistent volume, including the nodes on which they are located. Leveraging the information extracted from these CRDs, the AI Storage Scheduler makes storage decisions that ensure optimal data locality and performance for AI workloads in Kubernetes clusters.
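
As a minimal sketch of how such CRD information can be read, the snippet below lists Longhorn volume objects with the official Kubernetes Python client. The group, version, namespace, and field names (longhorn.io, v1beta2, longhorn-system, status.currentNodeID) follow Longhorn’s public CRDs but may differ between Longhorn releases; this is an illustration, not the project’s actual integration code.

    # Hedged sketch: reading Longhorn volume CRDs with the Kubernetes Python client
    # to learn where each persistent volume currently lives. Group/version/field
    # names (longhorn.io, v1beta2, status.currentNodeID) follow Longhorn's public
    # CRDs and may differ between Longhorn releases.
    from kubernetes import client, config

    config.load_kube_config()  # use load_incluster_config() when running in a pod
    api = client.CustomObjectsApi()

    volumes = api.list_namespaced_custom_object(
        group="longhorn.io",
        version="v1beta2",
        namespace="longhorn-system",
        plural="volumes",
    )

    for vol in volumes["items"]:
        name = vol["metadata"]["name"]
        node = vol.get("status", {}).get("currentNodeID") or "<detached>"
        size = vol.get("spec", {}).get("size")
        print(f"volume {name}: node={node}, size={size} bytes")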

The AI Storage Scheduler is implemented to dynamically adapt to changes in workload and resource availability. It ingests storage data in JSON form and calculates the available and used storage percentages for each node (a minimal sketch of this computation follows the list below); this data is critical for making informed decisions about storage management. Fuzzy logic is then applied to these percentages for efficient decision-making. The following steps provide a general idea of the fuzzy decision-making process:

  • The scheduler applies fuzzy logic to the processed data to determine actions such as expanding, maintaining, or backing up storage.
  • Fuzzy rules are defined based on the available and used storage percentages.
  • Decisions are made for each node, considering the need for expansion, backup, or maintenance.

This implementation ensures that storage resources are optimally utilized, balancing performance needs against storage costs and ensuring high availability and fault tolerance.
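
To make the percentage computation referenced above concrete, the following minimal sketch parses a JSON snapshot of per-node storage metrics. The JSON layout ("nodes", "capacity_bytes", "used_bytes") is a hypothetical example invented for the sketch, not the DECICE scheduler’s actual input format.

    # Hedged sketch: computing per-node used/available storage percentages from a
    # JSON snapshot. The JSON layout ("nodes", "capacity_bytes", "used_bytes") is
    # a hypothetical example, not the DECICE scheduler's actual schema.
    import json

    def storage_percentages(raw_json: str) -> dict:
        """Return {node_name: {"used_pct": ..., "available_pct": ...}}."""
        data = json.loads(raw_json)
        result = {}
        for node in data["nodes"]:
            used_pct = 100.0 * node["used_bytes"] / node["capacity_bytes"]
            result[node["name"]] = {
                "used_pct": round(used_pct, 1),
                "available_pct": round(100.0 - used_pct, 1),
            }
        return result

    sample = '{"nodes": [{"name": "edge-1", "capacity_bytes": 1000, "used_bytes": 820}]}'
    print(storage_percentages(sample))
    # {'edge-1': {'used_pct': 82.0, 'available_pct': 18.0}}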

Fuzzy Logic Implementation

This section details the AI Storage Scheduler’s fuzzy logic decision-making process, including the rules and membership functions used. The scheduler categorizes parameters such as “data size” and “access frequency” into fuzzy sets and applies rules to decide on actions such as expansion, backup, or maintenance based on the fuzzy logic output.
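
The post does not specify the exact membership functions; as a sketch, simple shoulder-shaped functions over the two example parameters could look as follows. The breakpoints (100/500 GB, 10/100 accesses per hour) are assumptions made for illustration, not values used by the DECICE scheduler.

    # Illustrative membership functions for the fuzzy sets mentioned above.
    # The breakpoints (100/500 GB, 10/100 accesses per hour) are assumptions
    # made for this sketch, not values used by the DECICE scheduler.
    def rising_shoulder(x: float, low: float, high: float) -> float:
        """0 below `low`, 1 above `high`, linear in between."""
        if x <= low:
            return 0.0
        if x >= high:
            return 1.0
        return (x - low) / (high - low)

    def data_size_is_large(size_gb: float) -> float:
        return rising_shoulder(size_gb, 100.0, 500.0)

    def access_frequency_is_high(accesses_per_hour: float) -> float:
        return rising_shoulder(accesses_per_hour, 10.0, 100.0)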

Fuzzy Rules and Actions: This approach allows the AI Scheduler to effectively manage diverse and uncertain data and system attributes, offering a sophisticated and adaptable resource management solution.

  • Fuzzy rules are formulated based on the available and used storage percentages (a sketch follows this list).
  • Actions such as “expand”, “maintain”, and “backup” are determined by these rules.
  • The process includes defuzzification to translate fuzzy results into concrete actions for each storage node.
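
As a sketch of how such rules might be expressed over the per-node percentages computed earlier (reusing the rising_shoulder helper from the previous sketch), high usage could push toward “expand”, moderate usage toward “backup”, and low usage toward “maintain”. The thresholds below are assumptions for illustration, not the project’s tuning.

    # Illustrative rules over a node's used-storage percentage. The thresholds
    # are assumptions for this sketch, not the project's tuning.
    def node_action_degrees(used_pct: float) -> dict:
        high_usage = rising_shoulder(used_pct, 70.0, 90.0)
        moderate_usage = rising_shoulder(used_pct, 40.0, 70.0)
        return {
            "expand": high_usage,
            "backup": min(moderate_usage, 1.0 - high_usage),
            "maintain": 1.0 - moderate_usage,
        }

    print(node_action_degrees(82.0))
    # {'expand': 0.6, 'backup': 0.4, 'maintain': 0.0}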

Applying Fuzzy Rules: The scheduler utilizes fuzzy rules of the form IF [condition] THEN [action]. An example rule could be, “IF data size is large AND access frequency is high THEN prioritize local storage.” These rules are derived from expert knowledge or data analysis and are tailored to reflect optimal data management decisions.
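
Reusing the illustrative membership functions above, that example rule could be encoded with min() as the fuzzy AND; this is a common choice, but not the only one, and not necessarily the one used in the project.

    # Hedged encoding of the example rule; fuzzy AND is taken as min(), a common
    # but not mandatory choice. Membership functions come from the earlier sketch.
    def rule_prioritize_local_storage(size_gb: float, accesses_per_hour: float) -> float:
        """Degree to which 'prioritize local storage' should fire."""
        return min(data_size_is_large(size_gb), access_frequency_is_high(accesses_per_hour))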

Inference Engine: This component processes the fuzzy rules based on input data, evaluating the degree of satisfaction for each rule to inform the scheduling decision.

Defuzzification: The process of defuzzification translates fuzzy results into concrete actions or values, such as determining a priority level for data placement.
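
A minimal sketch of one possible defuzzification step is shown below: the action with the strongest support wins. A weighted-average (centroid) scheme would be a common alternative; which method the project uses is not stated here.

    # Hedged sketch of a simple defuzzification step: the action with the highest
    # activation wins. A centroid/weighted-average scheme would be an alternative.
    def defuzzify(activations: dict) -> str:
        """Map {action: degree} to a single crisp action."""
        return max(activations, key=activations.get)

    print(defuzzify({"expand": 0.6, "backup": 0.4, "maintain": 0.0}))  # -> expand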

An Example of Decision-Making

Consider data with the following attributes: size = 500 GB, access frequency = ‘frequent’, and network bandwidth = ‘moderate’. The scheduler evaluates applicable rules such as, “IF data size is large AND access frequency is frequent THEN prioritize high-speed storage.” The inference engine processes these rules and, through defuzzification, concludes on a specific storage choice or priority level, optimally aligning with the given data attributes and system conditions.
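
Using the illustrative helpers from the earlier sketches, this example could be evaluated as follows. The numeric stand-ins for “frequent” and “moderate” are assumptions made for the sketch.

    # Worked version of the example, reusing the illustrative helpers above.
    # The numeric stand-ins for "frequent" and "moderate" are assumptions.
    size_gb = 500.0
    accesses_per_hour = 80.0   # stand-in for "frequent"
    bandwidth_mbps = 400.0     # stand-in for "moderate"; unused by this single rule

    firing = min(data_size_is_large(size_gb), access_frequency_is_high(accesses_per_hour))
    choice = defuzzify({"high_speed_storage": firing, "standard_storage": 1.0 - firing})
    print(round(firing, 2), choice)  # 0.78 high_speed_storage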

Conclusion

The fuzzy logic-based AI scheduler effectively manages diverse and uncertain data and system attributes, offering a flexible and sophisticated approach to resource management. By leveraging fuzzy logic, the scheduler navigates the complexities of diverse information and system conditions, making it adaptable to dynamic environments. This ensures optimal allocation and utilization of resources, contributing to improved overall system performance and responsiveness.

 

Author: Mirac Aydin

 

Links

GWDG: https://gwdg.de/

UGOE: https://www.uni-goettingen.de/

 

Keywords

Storage, AI Scheduler, Fuzzy Logic, DECICE, Kubernetes, Data Locality, I/O Optimization