Accelerating Innovation with the Convergence of HPC and AI

August 26, 2021

Artificial intelligence (AI) and high-performance computing (HPC) are modern terms loaded with meaning. Understanding what each is and how they complement each other is essential to understanding why AI and HPC have begun to converge over the past decade.

It is truly remarkable that the combination of hardware and software, machines and algorithms, can interpret data as well as or better than humans. Even more impressive is that AI models achieve this after being shown only sample data, known as a training set. The increased availability of data, architectural advances, and growing computational capability have accelerated the expansion of AI into all types of applications, especially in science and technology. This recent explosion in AI applications has come at a high computational cost.

HPC is well-positioned to address the computational needs of AI by running vast banks of computers for days or weeks to train AI’s powerful machine-learning algorithms. AI and HPC mutually complement each other and have forged a tight fellowship.

The technical details behind why AI needs HPC 

AI is closely associated with machine learning and its subset, deep learning. Machine learning (ML) describes the process by which a machine (computer) programs itself from a set of data, while deep learning describes artificial neural networks (ANNs) that have one or more hidden computational layers between the input and output neuron layers. An ANN's "weights," or parameters, are tuned so that the network's output has minimal error; they encode the information the ANN uses for tasks such as voice recognition, tumor identification, and temperature-change detection, letting it make accurate inferences from new data.
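As a concrete illustration, here is a minimal sketch of a forward (inference) pass through such a network with one hidden layer. The weights are random and hypothetical, not a trained model; only NumPy is assumed.

```python
import numpy as np

# Sketch: a feed-forward ANN with one hidden layer between the input
# and output neuron layers. Weights are random for illustration only.
def forward(x, w_hidden, w_out):
    hidden = np.tanh(x @ w_hidden)   # hidden layer with tanh activation
    return hidden @ w_out            # linear output layer

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))          # one input sample with 4 features
w_hidden = rng.normal(size=(4, 8))   # input -> hidden weights
w_out = rng.normal(size=(8, 1))      # hidden -> output weights

y = forward(x, w_hidden, w_out)
print(y.shape)  # (1, 1)
```

The values stored in `w_hidden` and `w_out` are what training adjusts; the network's "knowledge" lives entirely in these arrays.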

ML involves a training process before any inference can be made. The training process, itself a big-data problem, requires a training data set with a low error level. ANNs adjust their parameters (weights) during this training process, and the trained model is then used to make accurate predictions, structure data, or identify the best compounds.
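A toy version of this training process can be sketched in a few lines. This is a deliberately minimal example, assuming a single linear neuron and mean-squared error rather than any particular framework: the loop repeatedly runs inference on the training set and nudges the weights toward lower error.

```python
import numpy as np

# Sketch: training adjusts weights to minimize error on a training set.
# A single linear neuron learns y = 2*x1 - 3*x2 by gradient descent.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))             # training set inputs
y = X @ np.array([2.0, -3.0])             # known target outputs
w = np.zeros(2)                           # initial weights

for _ in range(500):
    pred = X @ w                          # forward (inference) pass
    grad = X.T @ (pred - y) / len(X)      # gradient of mean squared error
    w -= 0.1 * grad                       # adjust weights toward lower error

print(np.round(w, 2))  # recovers weights close to [2, -3]
```

Real deep networks repeat this same inference-then-adjust cycle billions of times over far larger data sets, which is where the computational cost comes from.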

While inference is fast, making it easy to embed trained models in hardware such as IoT devices, training an ANN requires a great deal of computational power and time.

ANN training requires adjusting the model weights by performing many inference operations, in a highly parallel fashion, over a fixed training set. During each step of the optimization process, the ANN's weights are adjusted to reduce the output error. Achieving this requires banks of computers working in parallel and running efficient algorithms.
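The reason this parallelizes so cleanly can be shown with a small data-parallel sketch: each of several hypothetical workers computes a gradient on its own shard of the training set, and the average of those gradients equals the full-batch gradient. (The four "workers" here are just loop iterations; a real HPC system distributes them across machines.)

```python
import numpy as np

# Sketch of data parallelism: per-shard gradients average to the
# full-batch gradient, so the work splits across machines cleanly.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = rng.normal(size=400)
w = np.zeros(3)

def grad(Xs, ys, w):
    return Xs.T @ (Xs @ w - ys) / len(Xs)   # per-shard MSE gradient

shards = np.array_split(np.arange(400), 4)  # 4 hypothetical workers
per_worker = [grad(X[i], y[i], w) for i in shards]
averaged = np.mean(per_worker, axis=0)

full = grad(X, y, w)                        # single-machine reference
print(np.allclose(averaged, full))          # True
```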

According to a report published in 2018 by OpenAI, a San Francisco research lab, the amount of computational power used to train AI models in the modern era of AI (after 2012) has doubled every 3.4 months. If we break the AI revolution into two phases, the first era from 1959 to 2012 and the modern era post-2012, it is easy to see the inflection point, as shown below:

[Figure: Compute power required to train AI models (log scale)]
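Quick arithmetic shows how steep a 3.4-month doubling time is. For comparison, the sketch below also computes a Moore's-law-style two-year doubling, roughly the pace of the earlier era:

```python
# Annual growth factor implied by a given doubling time (in months):
# factor = 2 ** (12 / doubling_months)
modern = 2 ** (12 / 3.4)     # 3.4-month doubling, post-2012
first_era = 2 ** (12 / 24)   # two-year doubling, for comparison

print(round(modern, 1))      # roughly 11.5x more compute per year
print(round(first_era, 2))   # roughly 1.41x more compute per year
```

An order-of-magnitude increase in training compute every year is exactly the kind of demand curve that only HPC-scale infrastructure can meet.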

HPC can meet AI’s need for computational power

The refinement required to perfect an AI algorithm can be especially computationally intensive. This requires large computer farms working in parallel and using efficient algorithms to minimize computational time to train a network. HPC vendors now have the hardware to support the AI ecosystem.

HPC providers that cater to AI applications are already focusing on balance ratios to support run-anywhere computational platforms. A balance ratio relates one hardware capability to another; for AI model training, the critical ratio is floating-point operations per unit of memory bandwidth, because current hardware is memory-bandwidth bound. HPC hardware delivers optimal sustained performance when there is sufficient computational capability to keep pace with the available memory and cache bandwidth.
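This is essentially a roofline-style argument, sketched below with hypothetical hardware numbers (20 TFLOP/s peak, 1 TB/s memory bandwidth, both illustrative): a kernel's attainable performance is capped by either the compute peak or the bandwidth times its arithmetic intensity, whichever is lower.

```python
# Roofline-style balance check with hypothetical hardware numbers.
peak_flops = 20e12   # illustrative peak: 20 TFLOP/s
bandwidth = 1e12     # illustrative memory bandwidth: 1 TB/s

def attainable(intensity_flops_per_byte):
    # Performance is limited by compute or by memory traffic.
    return min(peak_flops, bandwidth * intensity_flops_per_byte)

balance = peak_flops / bandwidth   # flops-per-byte balance ratio
print(balance)                     # 20.0: kernels below this are bandwidth bound
print(attainable(4) / 1e12)        # 4.0: a 4 flops/byte kernel reaches 4 TFLOP/s
```

On this hypothetical machine, any workload performing fewer than 20 floating-point operations per byte moved leaves compute units idle, which is why bandwidth, not raw flops, often decides sustained training performance.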

The future of HPC and its convergence with AI training belongs to specialized hardware

On the CPU front, Intel, AMD, and other silicon manufacturers are feverishly working to supply scalable processors that deliver increased performance for a range of HPC loads. Increasingly, HPC providers include AI deep learning models in their portfolio along with traditional computer simulation and data storage capabilities.

FPGAs, natural candidates for high-speed, low-latency inference, have the advantage of being programmable with exactly the logic needed to perform inference at minimal numerical precision. This approach is referred to as a persistent neural network, and more HPC companies are implementing it.
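The "minimal precision" idea can be illustrated with a simple symmetric int8 quantization sketch, the kind of reduced-precision arithmetic an FPGA can hard-wire. The weights and scaling scheme here are hypothetical, for illustration only:

```python
import numpy as np

# Sketch: symmetric int8 quantization of weights for low-precision inference.
w = np.array([0.75, -1.5, 0.01, 1.2])         # hypothetical fp32 weights
scale = np.abs(w).max() / 127                 # map the largest weight to +/-127
w_int8 = np.round(w / scale).astype(np.int8)  # quantized 8-bit weights
w_back = w_int8 * scale                       # dequantize to check the error

print(w_int8)
print(np.abs(w - w_back).max() < scale)       # error within one quantum: True
```

Storing and multiplying 8-bit integers instead of 32-bit floats cuts memory traffic and logic area sharply, usually at a small and measurable cost in accuracy.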

Self-learning neuromorphic chips such as Intel's Loihi, which use asynchronous spiking instead of deep learning ANN activation functions, draw their inspiration from how neurons in the brain learn based on the feedback they receive from the environment. Because these chips mimic the brain's basic mechanism, they require less computational power for the same or better performance. Neuromorphic chips integrated into the HPC architecture have the potential to disrupt and leapfrog existing technologies.
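The spiking idea can be sketched with a leaky integrate-and-fire neuron, the textbook model behind such chips: the neuron accumulates input, leaks charge over time, and emits a discrete spike only when its membrane potential crosses a threshold. The constants below are illustrative, not taken from Loihi.

```python
# Sketch: a leaky integrate-and-fire (LIF) spiking neuron.
# Leak and threshold values are illustrative only.
def lif(inputs, leak=0.9, threshold=1.0):
    v, spikes = 0.0, []
    for i in inputs:
        v = leak * v + i        # leaky integration of the input current
        if v >= threshold:      # fire once the threshold is crossed
            spikes.append(1)
            v = 0.0             # reset the membrane potential
        else:
            spikes.append(0)
    return spikes

print(lif([0.4, 0.4, 0.4, 0.4, 0.4]))  # [0, 0, 1, 0, 0]
```

Because such neurons only do work when a spike occurs, activity (and therefore power draw) is sparse and event-driven rather than clocked for every multiply-accumulate, which is where the efficiency claim comes from.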

Gartner predicts that through 2023, AI will be one of the top workloads driving enterprise infrastructure decisions. Many enterprises are turning to HPC providers to ensure they have access to infrastructure that scales dynamically and keeps up with performance requirements. This paves the way for researchers and scientists to use the huge volumes of data at their disposal to create intelligent algorithms that improve how services and products, such as breast cancer screening, are delivered and consumed.
