AIRR is an AI-based Resource Recommender for High-Performance Computing (HPC) applications in Cloud environments. AIRR was developed as a generalizable approach to the problem of finding the optimal Cloud instance type for each HPC-application job, without the need for extensive modeling or pre-existing data.
The system treats the challenge of resource allocation as a contextual multi-armed bandit problem. As users submit jobs to be executed in the cloud, AIRR will use these jobs as opportunities to gain insight into the relationship between application, job parameter values, and hardware, which will improve the quality of future recommendations.
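To make the bandit framing concrete, the following is a minimal sketch, not AIRR's actual implementation. It assumes a very simple setting: the context is a single discrete label describing the job, the "arms" are two hypothetical instance types (`type_a`, `type_b`), the cost figures in `TRUE_COST` are invented for illustration, and epsilon-greedy selection over running mean costs stands in for whatever model and exploration strategy AIRR really uses.

```python
import random
from collections import defaultdict


class BanditRecommender:
    """Toy contextual bandit: running mean cost per (context, instance type)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon                 # probability of exploring
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)         # (context, arm) -> number of runs
        self.mean_cost = defaultdict(float)    # (context, arm) -> mean observed cost

    def recommend(self, context, explore=True):
        # Try every instance type at least once per context before trusting the means.
        untried = [a for a in self.arms if self.counts[(context, a)] == 0]
        if untried:
            return untried[0]
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore: random instance type
        # Exploit: pick the instance type with the lowest estimated cost.
        return min(self.arms, key=lambda a: self.mean_cost[(context, a)])

    def update(self, context, arm, cost):
        # Incremental update of the running mean with the newly observed cost.
        key = (context, arm)
        self.counts[key] += 1
        self.mean_cost[key] += (cost - self.mean_cost[key]) / self.counts[key]


# Hypothetical ground truth, unknown to the recommender (made-up numbers).
TRUE_COST = {
    ("small_input", "type_a"): 1.0, ("small_input", "type_b"): 2.0,
    ("large_input", "type_a"): 5.0, ("large_input", "type_b"): 3.0,
}


def simulate(rounds=500, seed=1):
    """Each submitted job is one round: recommend, run, observe cost, learn."""
    rng = random.Random(seed)
    rec = BanditRecommender(["type_a", "type_b"], epsilon=0.1, seed=seed)
    for _ in range(rounds):
        context = rng.choice(["small_input", "large_input"])
        arm = rec.recommend(context)
        cost = TRUE_COST[(context, arm)] + rng.uniform(-0.2, 0.2)  # noisy observation
        rec.update(context, arm, cost)
    return rec


recommender = simulate()
for ctx in ("small_input", "large_input"):
    print(ctx, "->", recommender.recommend(ctx, explore=False))
```

Every job execution serves both as useful work for the user and as a training sample for the recommender. A realistic system would replace the lookup table with a model that generalizes across applications and continuous job parameters, which this sketch deliberately leaves out.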
AIRR has been initially validated on a mix of four HPC applications and eight diverse Amazon EC2 instance types. The system was shown to converge towards an optimal solution after only a small number of job executions. It also explored alternative options effectively, which further improved its recommendations over time.
AIRR was designed with small labs and individual researchers in mind, that is, users who want to run their own HPC applications many times with different inputs. We aim to serve these users by:
However, AIRR can, in principle, also be used by Cloud-service providers to optimize Cloud-resource usage across a rich mix of user applications.
AIRR makes use of state-of-the-art AI technology to approximate the relationship between application, hardware, and job parameters. It combines this with time-tested algorithms to decide when it is time to exploit its knowledge and when to explore further, so as to improve future recommendations. This technology includes:
The main envisioned user base for AIRR is small labs and individual users who run their own HPC applications regularly with different inputs. However, we believe that AIRR can also be of interest to Cloud hyperscalers. Because AIRR recommends optimal instance types for specific HPC jobs, it can help Cloud providers improve their infrastructure utilization.