When evaluating options for cloud-based clusters for use in HPC applications, cost is often a major consideration. For the occasional HPC processing task, preparing a cluster from the instance up (not always a trivial task) can be a cost-effective way to solve those compute problems. But what if the HPC processing task is more than occasional? What if it is part of your ongoing business process? At what point does it make sense to consider deployment alternatives?
To take a more quantitative view, let’s start by looking at the inputs and cost components of a deployment:
- Average walltime for HPC Job on fixed cluster size
- Jobs required per month
- Software licensing costs (if applicable)
- Machine cost (purchased)
The costs may vary from organization to organization depending on datacenter location, cost of electricity, type of cooling deployed, number of support staff, etc., but in any deployment scenario, understanding these inputs and factors is important.
From the cloud perspective, this can be fairly straightforward, since all costs are abstracted to an hourly or monthly rate. Let’s take a theoretical example of an application that runs on a 12-node cluster requiring 16 CPU cores per node and 3-4 GB of RAM per core. Let’s assume that the application has an average run time of 5 hours.
The simplest cost to calculate is a single run using on-demand cloud resources. Let’s assume that the hourly rate for a compute instance with the above attributes (excluding data transfer and any cluster creation setup costs) is $2.20/hour. This means the total hourly cost for the cluster is $2.20/hr x 12 nodes = $26.40/hr. A single 5-hour job run would therefore cost $26.40/hr x 5 hours = $132.00. Keeping the analysis simple, if a user only needed to run 1 job per month, using an on-demand cluster is likely the way to go. But what if the user needed to run more than one job per month, or actually install a workload manager/job scheduler and enable multi-user job submissions? What do costs look like if some jobs fail?
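The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the `failure_rate` parameter uses a simplified model (not from the example above) in which a failed run consumes its full walltime and is rerun until it succeeds.

```python
def cluster_hourly_rate(rate_per_node: float, nodes: int) -> float:
    """Hourly cost of the whole cluster at on-demand pricing."""
    return rate_per_node * nodes


def job_cost(rate_per_node: float, nodes: int, walltime_hours: float) -> float:
    """Cost of one run of `walltime_hours` on the full cluster."""
    return cluster_hourly_rate(rate_per_node, nodes) * walltime_hours


def expected_monthly_cost(rate_per_node: float, nodes: int, walltime_hours: float,
                          jobs_per_month: int, failure_rate: float = 0.0) -> float:
    """Expected monthly on-demand cost, excluding data transfer and setup.

    Assumption (illustrative only): each attempt fails independently with
    probability `failure_rate`, burns the full walltime, and is retried
    until it succeeds, so the expected attempts per job is 1 / (1 - p).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    expected_attempts = 1.0 / (1.0 - failure_rate)
    return job_cost(rate_per_node, nodes, walltime_hours) * jobs_per_month * expected_attempts


if __name__ == "__main__":
    # Values from the example: $2.20/hr per node, 12 nodes, 5-hour jobs.
    print(round(job_cost(2.20, 12, 5), 2))
    print(round(expected_monthly_cost(2.20, 12, 5, jobs_per_month=10, failure_rate=0.2), 2))
```

Under this simplified retry model, even a modest failure rate inflates the monthly bill in proportion to the extra attempts, which is one reason walltime and reliability belong in the list of inputs.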
Considering the other extreme, let’s suppose the cluster was needed for a full month. The total cost to operate the on-demand cluster becomes $26.40/hr x 720 hours = $19,008 per month, a pretty expensive endeavor.
Turning to dedicated HPC clouds for a moment, let’s assume that renting the same type of cluster costs $7,200.00 per month. In the above scenario, the break-even point between the two deployment approaches is about 273 hours of cluster usage per month ($7,200 / $26.40 per hour ≈ 272.7), or roughly 55 five-hour jobs. If the HPC processing tasks require more than this, dedicated is the way to go.
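The break-even comparison can be sketched the same way. The function names here are hypothetical, and the sketch assumes the only costs are the flat dedicated monthly fee and the on-demand hourly cluster rate, as in the example above.

```python
import math


def break_even_hours(dedicated_monthly: float, on_demand_cluster_hourly: float) -> float:
    """Monthly cluster-hours at which on-demand spend equals the dedicated fee."""
    return dedicated_monthly / on_demand_cluster_hourly


def break_even_jobs(dedicated_monthly: float, on_demand_cluster_hourly: float,
                    walltime_hours: float) -> int:
    """Smallest whole number of jobs per month whose on-demand cost
    meets or exceeds the dedicated monthly fee."""
    hours = break_even_hours(dedicated_monthly, on_demand_cluster_hourly)
    return math.ceil(hours / walltime_hours)


def cheaper_model(hours_needed: float, dedicated_monthly: float,
                  on_demand_cluster_hourly: float) -> str:
    """Return which deployment model costs less for the given monthly usage."""
    on_demand_cost = hours_needed * on_demand_cluster_hourly
    return "on-demand" if on_demand_cost < dedicated_monthly else "dedicated"


if __name__ == "__main__":
    # Values from the example: $7,200/month dedicated vs. $26.40/hr on-demand.
    print(round(break_even_hours(7200.0, 26.40), 1))
    print(break_even_jobs(7200.0, 26.40, walltime_hours=5))
    print(cheaper_model(300, 7200.0, 26.40))
```

Plugging in an organization's own rates and walltimes gives a quick first estimate of where the crossover sits before the less quantifiable factors are considered.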
While the above example is simplistic, it does highlight a quantitative approach to selecting cost-optimized HPC cloud deployment models. Other factors can also weigh in, such as software license management, user location, cluster management support, data storage, node attributes, interconnect, security, and walltime variance between virtualized and bare-metal clusters. Ultimately, these factors must be reviewed by the consumer and the best, most efficient path selected.