HPC Business Continuity/Disaster Recovery

Even though High Performance Computing (HPC) clusters are often used for processing that is critical to an organization, they are often overlooked when IT develops a Business Continuity/Disaster Recovery plan. When they are included in the plan, the focus is usually on backing up and recovering the large amounts of data associated with these systems.

While that is a very important part of any Business Continuity/Disaster Recovery plan, it covers only one piece of it (Disaster Recovery). Because most HPC clusters have unique hardware requirements, and because maintaining a cold, warm, or hot redundant system at an offsite location is expensive, the Business Continuity side is often overlooked or poorly addressed. So while the data may be protected, the ability to process it may not be.

We all read the horror stories after Hurricanes Katrina and Ike and Superstorm Sandy: flooded data centers, "bucket brigades" desperately trying to carry diesel up 17 flights of stairs to refuel generators, soaring data center temperatures, and widespread network and power outages. What many companies were not prepared for was that, in many cases, it took months to get some of these facilities back in working order.

Some IT groups have looked to popular "cloud" Infrastructure-as-a-Service hosting providers to help meet this processing need, but these efforts are often hampered because such providers offer underperforming commodity hardware and rely heavily on virtualization. You can read about the challenges of commodity cloud computing in my earlier blog post, "High Performance Cloud: Best of All Worlds (Grid, Cluster and Cloud Computing)."

So what is the alternative?

Nimbix has recently announced a new Platform-as-a-Service for HPC called JARVICE. JARVICE allows organizations to take the applications they run on their HPC clusters today and install, configure, and test them in a NAE (Nimbix Application Environment). NAEs do not use a hypervisor and run at bare-metal speed. Once an application is built within a NAE, it can be promoted as a private application to NACC (Nimbix Accelerated Compute Cloud), Nimbix's on-demand HPC batch processing service, where it runs on NACC's scalable HPC hardware under an affordable pay-per-use model.

NACC offers:

  • Up to 56 Gb/s FDR InfiniBand interconnects
  • Accelerators (GPUs, DSPs, Xeon Phi, FPGAs, etc.)
  • High Memory and High CPU resources
  • Physical rather than Virtual computing resources

Building JARVICE and NACC into your Business Continuity plan has other advantages as well: the same resources can provide additional capacity during peak periods when your existing cluster may be overwhelmed, there is no cost for standing infrastructure, and NACC is always listening for job requests. What's more, Nimbix has automated data movement to take that problem out of the equation.

If you are interested in learning more, please sign up for early access to JARVICE or contact our sales team at +1.866.307.0819.