Microsoft has revealed how it operates “Singularity”, its planet-scale distributed scheduling service for artificial intelligence workloads, in a paper authored by 26 of the company’s employees.
In the paper, titled “Singularity: Planet-Scale, Preemptible and Elastic Scheduling of AI Workloads”, the company states that Singularity’s purpose is to help it control costs by driving high utilization for deep learning workloads.
The paper gives technical details about the service, which appears designed to help data scientists and AI practitioners build and experiment with their models on a Microsoft-provided distributed infrastructure built explicitly for AI.
Notable authors listed on the paper include Azure Chief Technical Officer Mark Russinovich and Partner Architect Rimma Nehme, who worked on Azure Cosmos DB before moving to work on AI.
“At the heart of Singularity is a novel, workload-aware scheduler that can transparently pre-empt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of accelerators (e.g., GPUs, FPGAs),” the paper noted.
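The idea of a scheduler that transparently preempts lower-priority work to keep accelerators busy can be illustrated with a minimal sketch. The class names, priority policy, and GPU accounting below are illustrative assumptions, not Singularity’s actual design:

```python
# Hypothetical sketch of a preemptible, workload-aware scheduler.
# All names and the eviction policy are assumptions for illustration.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Job:
    name: str
    priority: int        # higher value = more important
    gpus_needed: int
    preemptible: bool = True

@dataclass
class Cluster:
    total_gpus: int
    running: List[Job] = field(default_factory=list)

    def free_gpus(self) -> int:
        return self.total_gpus - sum(j.gpus_needed for j in self.running)

    def schedule(self, job: Job) -> List[Job]:
        """Place `job`, preempting lower-priority preemptible jobs if
        needed to free accelerators. Returns the jobs preempted."""
        preempted: List[Job] = []
        # Evict the lowest-priority preemptible jobs first.
        victims = sorted(
            (j for j in self.running
             if j.preemptible and j.priority < job.priority),
            key=lambda j: j.priority,
        )
        while self.free_gpus() < job.gpus_needed and victims:
            victim = victims.pop(0)
            self.running.remove(victim)
            preempted.append(victim)  # would be checkpointed and re-queued
        if self.free_gpus() >= job.gpus_needed:
            self.running.append(job)
        return preempted

cluster = Cluster(total_gpus=8)
cluster.schedule(Job("batch-train", priority=1, gpus_needed=8))
evicted = cluster.schedule(Job("prod-inference", priority=10, gpus_needed=4))
print([j.name for j in evicted])  # → ['batch-train']
```

In the real system, preemption is transparent to the job: its state is checkpointed so it can resume elsewhere without loss of correctness, which this toy model does not attempt.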
Microsoft officials have also outlined plans to make field-programmable gate arrays (FPGAs) available to customers as a service. It will be recalled that in 2018, Microsoft went public about its “Project Brainwave” work, designed to provide fast AI processing in Azure.
The company then previewed Azure Machine Learning Hardware Accelerated Models powered by Brainwave in the cloud, a move considered the first step in making FPGA processing for AI workloads available to customers.
To make such accelerators available to customers, the company needed a “device proxy” that “runs in its address space and has a one-to-one correspondence to a physical accelerator device. When a job worker initiates device APIs, they are intercepted and sent over the shared memory to the device proxy process that runs in a separate address space, and whose lifetime is decoupled from the lifetime of the worker process.”
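The device-proxy pattern described above can be sketched in miniature: a worker-side stub intercepts device API calls and forwards them to a proxy process that owns the device and runs in its own address space. The API names and the queue-based transport below are assumptions for illustration; the real system intercepts actual accelerator APIs and communicates over shared memory:

```python
# Illustrative sketch of the "device proxy" idea, not Microsoft's code.
import multiprocessing as mp

def device_proxy(requests: "mp.Queue", responses: "mp.Queue") -> None:
    """Runs in a separate address space; owns the (fake) physical device.
    Its lifetime is decoupled from any single worker process."""
    while True:
        call = requests.get()
        if call is None:          # shutdown sentinel
            break
        op, arg = call
        if op == "malloc":
            responses.put(f"handle-for-{arg}-bytes")  # pretend allocation
        elif op == "launch":
            responses.put(f"launched-{arg}")

class InterceptedDevice:
    """Worker-side stub: each device API call is intercepted and sent
    over the channel instead of touching the device directly."""
    def __init__(self, requests, responses):
        self.requests, self.responses = requests, responses

    def malloc(self, nbytes: int) -> str:
        self.requests.put(("malloc", nbytes))
        return self.responses.get()

if __name__ == "__main__":
    req, resp = mp.Queue(), mp.Queue()
    proxy = mp.Process(target=device_proxy, args=(req, resp))
    proxy.start()                 # proxy outlives any individual worker
    dev = InterceptedDevice(req, resp)
    print(dev.malloc(1024))       # → handle-for-1024-bytes
    req.put(None)
    proxy.join()
```

Because the proxy, not the worker, holds the device state, a worker can be preempted or migrated while the accelerator-facing side of the job remains manageable independently.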
The implication is that jobs can be scheduled more efficiently, keeping thousands of servers productive for more of the time, while also enabling swift scaling up or down without disruption.
“Singularity achieves a significant breakthrough in scheduling deep learning workloads, converting niche features such as elasticity into mainstream, always-on features that the scheduler can rely on for implementing stringent SLAs,” the paper concludes.