Jonathan Stuart Ward and Adam Barker of the University of St. Andrews have published a report that examines some of the problems facing High Performance Computing in traditional cloud computing environments. In the report, they draw distinctions between Grid, Cluster and Cloud Computing.
“Grid computing focuses on providing high performance and high scalability. Cloud computing alternatively focuses on delivering high scalability and low cost. Cloud services therefore aim to provide a far lower performance to price ratio and cannot surpass the performance of individual grid components. This is problematic for message passing workloads which rely heavily on high performance computers and fast interconnects.”
They do note that a few cloud providers have started to offer fast interconnects (10 Gb Ethernet), but these providers still lack the preferred InfiniBand interconnect. They also note that another limitation of traditional cloud providers is their limited ability to scale up as well as out.
“With only a small number of cloud providers offering high memory and high CPU VM instances this remains a crucial limitation.”
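To put the interconnect gap in rough numbers, the sketch below compares ideal transfer times over the 10-gigabit Ethernet mentioned above and the 56Gb/s FDR InfiniBand rate quoted later in this post. It is a back-of-the-envelope calculation, not a benchmark: it assumes nominal line rates and ignores latency, encoding and protocol overhead. The function name and the 100 GB payload are hypothetical, chosen only for illustration.

```python
def ideal_transfer_seconds(payload_gigabytes: float, link_gigabits_per_s: float) -> float:
    """Lower bound on transfer time at the nominal line rate.

    Assumption: no latency, encoding or protocol overhead is modelled.
    """
    payload_gigabits = payload_gigabytes * 8  # 1 byte = 8 bits
    return payload_gigabits / link_gigabits_per_s


if __name__ == "__main__":
    payload_gb = 100.0  # hypothetical data set size, for illustration only
    links = (("10 Gb Ethernet", 10.0), ("56 Gb/s FDR InfiniBand", 56.0))
    for name, rate in links:
        seconds = ideal_transfer_seconds(payload_gb, rate)
        print(f"{name}: {seconds:.1f} s")
```

Under these assumptions the same payload moves 5.6 times faster on the faster link. Real message passing workloads also depend heavily on latency, which this estimate leaves out.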
The report also highlights the advantages of dedicated in-house cluster computing.
“The principle behind cluster computing is simple: interconnect numerous compute nodes to provide a high performance system. Typically this is achieved by networking large numbers of x86 servers via a high speed Infiniband interconnect running a message passing system to facilitate job execution. Most clusters deploy some variation of GNU/Linux using the Message Passing Interface (MPI) or other interface… Clusters have a number of advantages over cloud and grid systems. Typically clusters are owned and operated by a single authority. This allows full access to the hardware and removes any need for federation. Full hardware access enables users to specifically modify both the cluster and the application to achieve optimum performance. Furthermore, the resource sharing which is crucial in cloud computing does not take place within a cluster. An application is executed with the full resources of the underlying hardware, not a specifically provisioned slice. Clusters can therefore achieve significantly greater performance than the equivalent grid or cloud solution. The drawbacks of cluster computing are predominantly financial. Clusters require substantial investment and substantial maintenance, these costs are often entirely prohibitive for smaller organisations…”
The report spends some additional time discussing the role of virtualization in traditional cloud computing environments and some of the challenges it presents.
“A VM is a software implementation of a computer system, running in isolation alongside other processes, which behaves as physical system. A single multi-processor server is capable of running several VMs, typically one per core (though cloud providers often oversell their CPUs). This allows for a single server to be effectively used to capacity, reducing any unused CPU cycles and minimising wasted energy. Virtualizing a computer system reduces its management overhead and allows it to be moved between physical hosts and to be quickly instantiated or terminated.”
“The x86 architecture was not conceived as a platform for virtualization. The mechanisms which allow x86 based virtualization either require a heavily modified guest OS or utilise an additional instruction set provided by modern CPUs which handles the intercepting and redirecting traps and interrupts at the hardware level. Due to these levels of complexity there is definite performance penalty imparted through the use of virtualization. While this penalty has considerably decreased over recent years it still results in a virtual machine delivering a percentage of the performance of an equivalent physical system. While some hypervisors are coming close to delivering near native CPU performance, IO performance is still lacking. IO performance in certain scenario’s suffers an 88% slowdown compared to the equivalent physical machine. VMs effectively trade performance for maximum utilisation of physical resources. This is non-ideal for high performance applications…”
We at Nimbix agree wholeheartedly with the observations and analysis in the Ward-Barker report. The analysis presented is precisely why we built a High Performance Cloud service offering to specifically address the challenges and gaps discussed.
So what if you could have the best of all worlds? What if you could run your HPC applications on a service that offers:
- Up to 56 Gb/s FDR InfiniBand interconnects
- Accelerators (GPUs, DSPs, Xeon Phi, FPGAs, etc…)
- High Memory and High CPU resources
- Physical rather than Virtual computing resources
- Pre-optimized application specific environments
- Always On, Always listening for job submittals
- Flexible, to-the-minute billing
- Secure “Behind the Firewall” processing
- An easy-to-use portal with a responsive UI for job submittal from several platforms (desktop, tablet, smartphone, etc…)
- Automated data-movement
- API and Command Line Interface for easy automation
Well, you can. It is available today. Check out NACC (Nimbix Accelerated Compute Cloud), truly the best of all worlds.