Cloud Computing for HPC and Data Processing

January 22, 2014


The Burroughs 5500 was a large-scale data processing machine.

We speak with a lot of people who have been in the High Performance Computing (HPC) industry for a long time. One recurring theme in these conversations is that cloud computing is not new. The point is that the concept of time-sharing computing resources has been around for decades, and this is absolutely true. Before low-cost clusters became common, time was leased on multi-million dollar supercomputers to get commercial computational work done. Academic and government supercomputers serve large populations of remote scientists and researchers by offering them computing time on clusters of machines.

The National Institute of Standards and Technology (NIST) provides a formal definition, which reads:

“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”

The concept of modern cloud computing on commodity systems was introduced with the maturing of virtualization technology.  This new way of time sharing server hardware, abstracting the OS from the hardware layer through a hypervisor, became synonymous with the term cloud computing.  Additionally, the technology itself was well suited for web applications that were lightweight and required elastic scalability.  It was the beginning of a major transformation of global IT.

However, it should be noted that the NIST definition of cloud computing does not necessitate that hardware resources be virtualized. This is particularly important for High Performance Computing and analytics applications that have been run on bare metal for many years in a batch model.  While virtualized infrastructures have certainly been useful for many users in HPC and analytics, many other long-time power users have refused to drink the cloud “Kool-Aid”, understanding that there are challenges associated with the virtualized model.

This does not mean that cloud solutions don’t work for traditional HPC and data processing. It simply means that some cloud solutions, those that do not depend on virtualized hardware, may work better for these workloads than virtualized ones. NIST defines five “essential characteristics” for cloud computing:

On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.

Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

Given the above, evolving traditional non-virtualized HPC and data processing architectures into a cloud model simply requires adding on-demand self-service and the ability to meter service. Most of the other characteristics have been offered for years.
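As a rough illustration of those two pieces, here is a minimal Python sketch of self-service submission with metering for a batch job on bare-metal nodes. Everything in it is hypothetical and chosen for illustration: the `Job` and `Meter` classes, the `submit` function, and core-hours as the metering unit are assumptions, not a description of any particular platform or scheduler.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Job:
    """A hypothetical batch job that occupies whole bare-metal nodes."""
    job_id: str
    nodes: int
    cores_per_node: int
    started: datetime
    finished: datetime

    def core_hours(self) -> float:
        # Metering unit (an assumption): cores held multiplied by wall-clock hours.
        hours = (self.finished - self.started).total_seconds() / 3600
        return self.nodes * self.cores_per_node * hours


@dataclass
class Meter:
    """Accumulates usage per consumer so it can be monitored and reported."""
    usage: dict = field(default_factory=dict)

    def record(self, consumer: str, job: Job) -> None:
        self.usage[consumer] = self.usage.get(consumer, 0.0) + job.core_hours()

    def report(self, consumer: str) -> float:
        return self.usage.get(consumer, 0.0)


def submit(meter: Meter, consumer: str, nodes: int,
           cores_per_node: int, runtime: timedelta) -> Job:
    """Self-service entry point: the consumer requests resources directly.

    A real system would hand the job to a scheduler and wait for it to run;
    this sketch simulates a job with a fixed start time and known runtime,
    then records its usage.
    """
    start = datetime(2014, 1, 22, 9, 0)
    job = Job("job-1", nodes, cores_per_node, start, start + runtime)
    meter.record(consumer, job)
    return job


meter = Meter()
submit(meter, "alice", nodes=4, cores_per_node=16, runtime=timedelta(minutes=90))
print(meter.report("alice"))  # 4 nodes * 16 cores * 1.5 hours = 96.0 core-hours
```

Nothing here involves a hypervisor: the consumer provisions resources without operator involvement, and usage is measured per consumer, which is all the two characteristics ask for.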

While cloud infrastructure on commodity servers is fueling a revolution in global IT, largely thanks to virtualization, that same technology does not usually provide the best results for HPC clouds. On the other hand, the spirit and the letter of cloud computing are alive and well for analytics and data processing, with technologies such as our JARVICE platform paving the way for self-service and consumerization in HPC. It’s the natural evolution of methodologies first introduced decades ago, now becoming accessible and cost-effective for broader use cases. We are all eagerly anticipating the wave of breakthroughs for humanity made possible by HPC cloud computing.

