I wanted to share a few tips on how you can leverage the NACC API to query for your job status. Both job_name and job_number are optional: you can specify neither, either, or both in your query. If you supply neither job_name nor job_number, the API simply returns the status of your last five jobs.
To try this out, post your requests to the following URL: https://api.nimbix.net:4081/nimbix/nacc_jstatus
Just in case you forgot your API key, you can always grab it via the NACC portal.
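As a rough illustration of the query described above, here is a small Python sketch that builds a status request for that URL. Only job_name and job_number come from this post; the JSON body and the "api_key" field name are assumptions made for the example, so check the NACC API documentation for the real request format.

```python
import json
import urllib.request

# URL taken from the post above.
STATUS_URL = "https://api.nimbix.net:4081/nimbix/nacc_jstatus"


def build_status_request(api_key, job_name=None, job_number=None):
    """Build (but do not send) a job status request.

    Assumption: the API accepts a JSON body with an "api_key" field.
    Only job_name and job_number are named in the post itself.
    """
    payload = {"api_key": api_key}
    # Both filters are optional; leave out whichever was not supplied.
    if job_name is not None:
        payload["job_name"] = job_name
    if job_number is not None:
        payload["job_number"] = job_number
    return urllib.request.Request(
        STATUS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_status(request, timeout=30):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling build_status_request("MY_KEY") with no filters corresponds to the case where the API returns your last five jobs.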
There are a number of interesting things you can do once you have this information: integrate with an existing dashboard or DevOps tool set, feed your existing notification and eventing systems, or supply internal billing and accounting systems.
How are you leveraging this information? We’d love to hear from you.
I’m back in the blogging seat today and, while I’m here, I wanted to share some of our recent UI changes, along with a preview of some new features coming to NACC.
Let’s dive right in and take a look at some of the new UI features we recently launched.
Dashboard
The “dashboard” is your single-window view into the following areas: job status, notifications, key job stats, and the ability to quickly launch your most frequent application workflows. You can see on my dashboard that BWA is my most used app, mostly from test cases. BWA really shows the power of NACC in terms of flexibility in creating multi-step workflows, along with the compute power of our accelerated co-processors.
Task Builder
Our task builder allows for easy-to-use workflow creation. In my example below, I’m creating a multi-step paired-end read for BWA. Here I can quickly and easily provide my reference database, either one provided by Nimbix (already pre-indexed) or one from a number of third-party sources. Once I’ve selected my reference database, which could be plant, animal, or human (martian or gremlin anyone?), I’m then able to provide my read sequence inputs and any optional parameters associated with BWA. Lastly, as part of our task builder, I can provide details on where my output needs to go; again, this could be a Nimbix location or any one of our third-party locations.
Data Mover
Data movement is a critical part of workflow and data processing. Our v2.1 release allows you to seamlessly move your inputs and outputs to the location of your choosing for both pre- and post-processing. Currently we support automated data movement for S3, Dropbox, Globus, and any SFTP location. Our automated data movers give our users significant flexibility when it comes to building and submitting jobs, because they don’t have to wait for data movement to occur or juggle any dependencies within a workflow due to missing inputs or outputs.
Don’t forget that all of the above features can also be leveraged using our API!
More and More Features
You’re asking… and we’re listening and building! We’re heads down on building amazing new features for you. Some of these new features will include true integrated remote visualization. Just think… from a post-processing standpoint, you won’t have to move your data back out of the cloud. You’ll simply be able to interact in real time with your outputs, models, etc. within the Nimbix cloud. We’re also heads down on offering our users the ability to select their desired interconnect at run time. This means you’ll be able to select either gigE, 10gigE, or InfiniBand for your given application. You won’t have to worry about the plumbing of the fabric; just tell our task builder and we’ll make it happen, in real time. Lastly, we’re working on building and delivering a true DCIM product for our private cloud / colocation users. This will allow for a ton of flexibility around space, power, and cooling management.
The old blog seat is getting a bit uncomfortable now. I’ve shown you a few things about v2.1 and shared a bit of our product roadmap. I’d like to leave you with a small video of NACC in action. If a picture is worth 1,000 words, this video should be worth at least tens of thousands of words.
Most organizations that have managed large batch processing resources or HPC clusters understand the economics of computing pretty well. The difficulty has always been in capacity planning, software application management and deployment, and trying to keep up with changing compute, storage, and networking technologies.
When you add public cloud computing to the mix of possible resource options, the reaction from traditional infrastructure managers is mixed. On the one hand, the idea of an elastic, pay-as-you-go compute model is very attractive, but on the other, the pain of migration, perceived data risks, and uncertain economics may eliminate the public cloud option altogether.
The other challenge with the mainstream cloud deployment model for processing-intensive applications is that all of the software tools typical in an HPC environment have to be built and tested all over again in the cloud environment. Add to this the issues and costs associated with moving large data sets in and out of that cloud, and it is easy to see why cloud elation can quickly fade.
These realities should not deter organizations from evaluating options. As time progresses, it is becoming increasingly clear that even the largest of organizations will have trouble swallowing the costs of building and scaling their own dedicated computing environments and keeping pace with changing hardware and software technologies. Ironically, there is a reciprocal challenge for younger companies, too, that have come of age using pure cloud resources for infrastructure. These organizations have learned that while public clouds have their place, as their processing environments have grown in scale, so has their cloud bill. They’ve reached a point where the economics start to tip in favor of deploying dedicated resources for portions of their environment.
To address these unique “hybrid” requirements in medium to large scale processing environments, Nimbix offers a blend of do-it-yourself and cloud. Nimbix enables users to scale both their own computing assets in Nimbix datacenters right alongside managed “private” clouds and the Nimbix Accelerated Compute Cloud. This flexibility helps solve for some of the technology and economics challenges mentioned above. By hybridizing in Nimbix HPC datacenters, infrastructure managers can operate traditional in-house clusters while taking comfort in knowing there is more floor space and compute scale available from an infrastructure provider that understands HPC and processing intensive application environments.
Additionally, the ability to leverage Nimbix resources that are deployed as a scalable batch processing cluster simplifies many of the challenges associated with turning up a processing cluster in a commodity cloud utility service. Users can integrate workflows between colocated, managed, and public cloud resources with a simple API call, while paying only for actual processing time, billed in minutes, on the cloud side.
Web service enabled infrastructures will continue to evolve over time enabling smoother migration between public and private clouds. Thinking about hybrid environments might just be the best solution in terms of flexibility, productivity and economics.
Keeping with the theme from my last posting I wanted to share a little information about another one of our amazing customers. HUGEdata has been a Nimbix customer since September of 2012. They provide a cloud-based or onsite database specially designed to handle the demanding workloads from the ever-increasing need to query, analyze, and derive value from large amounts of data in seconds.
It is specifically aimed at companies with the need to analyze huge (hundreds of millions or billions of rows) amounts of data along with complex queries in seconds instead of minutes and hours. The system utilizes existing tools and languages (SQL) already in use at most companies, speeding implementation and reducing disruption to existing data and systems, and all at a lower total cost of ownership.
Learn more about HUGEdata and why they chose Nimbix by clicking here to read the case study.
As we prepare for a new year in the world of HPC cloud, I find it is always good to spend some time reflecting on progress from the prior year and the implications for the next.
I think many who have spent time using or experimenting with HPC applications or workflows on public cloud resources would agree that steady progress was made in 2012. While many challenges remain around data movement, security, software licensing and ease of use, we’ve all learned more about what it takes to get processing work done in more efficient ways. For this post, I summarize a few of my own observations from 2012 and then make some predictions for 2013.
To keep things simple, I’ll just list them out with some commentary:
Observations for HPC Cloud in 2012:
Early large scale HPC cloud deployments with open source software applications – Open source software is still dominating cloud-use cases, although many commercial software organizations will deploy more formal cloud strategies in 2013 (see below).
Data challenges – There are really two big issues associated with HPC data and public clouds. One is the inherent challenge of transferring and storing (even if temporary) large data sets and the other is data security. It’s no surprise that the early trail blazers in HPC cloud use cases are in segments where data sets are public or have less restrictive security requirements.
Cloud costs still lack commercial-grade clarity – What I mean here is that most users still don’t have a clear picture of how much their cloud-utility bill will be on a monthly basis. There are a few cloud expense management platforms emerging, but the picture is still fuzzy for enterprise HPC computing.
Cloud standards maturing – While standards are still shaping the cloud infrastructure industry as a whole, much of the standards debate is centered around the machine stack and provisioning versus workloads and applications. I expect we will see more drift into applications in the future.
Predictions for HPC Cloud in 2013:
Users and Cloud Providers will add more network bandwidth and data transport acceleration (such as Aspera) to reduce the time to move large data between compute resources
There will be increased use and deployment of data encryption technology which will continue to reduce barriers to cloud adoption
Cloud provider offerings will center more around workloads, applications and processing pipelines versus pure infrastructure
Mid-size and large organizations will migrate toward hybrid private/public infrastructure to optimize economics and monthly spend
Leading HPC ISVs will provide more options and licensing flexibility for cloud enablement
Accelerated platforms and larger memory machines will continue to gain traction in public clouds
We will begin to see more sophisticated tools for cloud processing and workflow automation
So while there is probably nothing earth-shattering in the above observations, I think it’s important to understand the themes that emerge. Those themes help shape our collective focus for solving problems in the next year and the years to follow. They help us discern the best standards and cloud deployment models, and finally, those observations and themes can help us make smarter business decisions.
I have joined Nimbix to head up our sales and customer relationship management efforts. One of the best things about my position here at Nimbix is seeing some of the amazing things our customers are doing.
With that in mind, I wanted to share a recent case study in which miGenius selected Nimbix to provide the infrastructure for their new release, Bloom Unit. Bloom Unit is a plug-in for SketchUp that lets users create photo-realistic scenes using the power of cloud computing. With push-button interactive results based on true simulations of how light actually behaves, users can share live views of their design with anyone who has a connected device, perform collaborative changes, and make lightning-fast decisions.
Today, we’re pleased to announce the mid-November release of NACC v2.1. We’ve been hard at work over the last many months to bring you new and exciting features. While there are a number of new features as part of this release, I wanted to bring a few to your attention that I’m most excited about!
There’s no disputing that HPC is hard… we addressed this head-on in our first release by abstracting the complexities of building out your computing environment and allowing you to focus on your dataset and computational tasks.
With our newest release we bring an even better user experience with a new responsive UI. Once you see our new responsive UI, you’ll quickly notice some familiarity. We’re bringing you the world’s first HPC application store. With ease, you’ll now be able to choose your application, choose your tasks and fire your job off in the fewest clicks possible. Our responsive UI will also enable you to interact with our web service from ANY device, and respond accordingly, whether you’re on a desktop, tablet, or phone. Just think: you can now start and monitor your HPC jobs from your Android tablet or iPhone. Increased mobility and fewer clicks equal greater productivity!
Who said notifications aren’t cool? We understand having the right information at the right time and place is critical. We’ve worked hard to provide you with timely notifications, where you choose what you want to see and where you want to see it. NACC v2.1 brings you the best and most flexible job notification system for HPC. Our notification engine has been enhanced to support multiple messaging end-points based on a number of different event types. Our newly supported messaging end-points include SMS and multiple email accounts. Imagine the ability to only SMS your phone when an HPC task completes from your cloud provider, or even automatically notify other team members where the job’s output resides once complete. Or maybe you want your verbose updates going to your email client and start and stop events to your phone. NACC v2.1 brings a truly amazing level of control.
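To picture the idea of routing different event types to different end-points, here is a toy Python sketch. It is not the NACC notification engine or its configuration format; the class, the event-type names, and the endpoint strings are all hypothetical and exist only to illustrate the mapping.

```python
from collections import defaultdict


class NotificationRouter:
    """Toy mapping from job event types to messaging end-points (hypothetical)."""

    def __init__(self):
        self._routes = defaultdict(list)

    def subscribe(self, event_type, endpoint):
        """Register an end-point to be notified for one event type."""
        self._routes[event_type].append(endpoint)

    def endpoints_for(self, event_type):
        """Return the end-points registered for an event type, in order."""
        return list(self._routes.get(event_type, []))


# Made-up event names and addresses: start/stop events go to a phone,
# verbose updates go to email, and a teammate hears about completions.
router = NotificationRouter()
router.subscribe("job_started", "sms:+15550100")
router.subscribe("job_completed", "sms:+15550100")
router.subscribe("job_completed", "email:team@example.com")
router.subscribe("verbose_update", "email:me@example.com")
```

With this setup, router.endpoints_for("job_completed") yields both the phone and the team email address, while an unregistered event type yields an empty list.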
Speaking of job start / completion and input / output results, wouldn’t it be cool to have those inputs / outputs seamlessly moved to the end-point of your choosing? NACC v2.1 now supports the ability to source your data inputs from a number of alternative cloud locations, e.g., S3, Dropbox, any SFTP server, Globus, and your own Nimbix Drop location. This is useful if your data lives on S3: we can consume from S3 and, once a job is done, write the results back to your choice of location, such as S3 again, or maybe Dropbox or your own SFTP server. Automated data movement will make your life easier and take the “wait” out of cloud HPC. Launch your job and we’ll handle the rest.
Automation is a great thing, so don’t forget that all of the above features and functionality can be leveraged using our simple yet powerful API.
We hope you’re as excited as we are about NACC v2.1; mid-November can’t come fast enough for us. Join us in November: we’re changing where HPC is headed.
Hundreds of organizations around the world are working to align and map raw sequence data and many have turned to the cloud to augment computing capacity for analysis pipelines. While there are a number of commercial alignment and mapping software applications to help with the challenge, one of the popular open source options is BWA.
When people think of running BWA in the cloud, most think about Amazon, Rackspace, or other commodity cloud infrastructure providers on which to provision virtual machines billed by the hour. This is certainly an option for on-demand compute capacity, but it can be slow and time consuming to provision for the first time. But what if you simply wanted a cloud-based BWA pipeline ready to run your sequence data as fast as possible?
At Nimbix, the cloud is all about the workload and not the machines. Below is only one example, but running high speed BWA for paired-end sequence data is as simple as making the below API call to the Nimbix Accelerated Compute Cloud:
For human reference alignments, simply replace the data in italics with your data and post to the Nimbix cloud. Your pipeline is automatically run and your SAM/BAM files generated. Since Nimbix operates optimized machines for its bioinformatics processing tasks, users can generally expect results 5 to 15 times faster than any other cloud solution. Different reference genomes can be specified in the API call for other available references.
For more information on making the above API call using curl, wfetch, perl or python, have a look at Josh Devinney’s blog post, Programmatic Job Posting to NACC. If you need an account to try out the above, you can sign up on the Nimbix portal.
When evaluating options for cloud based clusters for use in HPC applications, costs are often a major consideration. For the occasional HPC processing task, preparing a cluster from the instance up (not always a trivial task) can be a cost effective way to solve those compute problems. But what if the HPC processing task is more than occasional? What if it is part of your ongoing business process? At what point does it make sense to consider deployment alternatives?
To take a more quantitative view let’s start by looking at inputs and cost components of a deployment:
Average walltime for HPC Job on fixed cluster size
Jobs required per month
Software licensing costs (if applicable)
Machine cost (purchased)
Depreciation
Power/Cooling/Space
Staffing
The costs may vary from organization to organization depending on datacenter location, cost of electricity, type of cooling deployed, number of staff to support, etc., but in any deployment scenario, understanding these inputs and factors is important.
From the cloud perspective, this can be fairly straightforward, since all costs are abstracted to an hourly or monthly rate. Let’s take a theoretical example of an application that runs on a 12 node cluster requiring 16 CPU cores per node and 3-4GB RAM per core. Let’s assume that the application has an average run time of 5 hours.
The simplest cost to calculate is a single run using on-demand cloud resources. Let’s assume that the hourly rate for a compute instance with the above attributes (excluding data transfer and any cluster creation setup costs) is $2.20/hour. This means the total hourly cost for the cluster is $2.20/hr x 12 nodes = $26.40/hr. A single 5-hour job run would cost $26.40/hr x 5 hours = $132.00. Keeping the analysis simple, if a user only needed to run 1 job per month, using an on-demand cluster is likely the way to go. But what if s/he needed to run more than one job per month, or actually install a workload manager/job scheduler and enable multi-user job submissions? What do costs look like if some jobs fail?
Considering the other extreme, let’s suppose the cluster was needed for a month. The total cost to operate the on-demand cluster becomes $26.40/hr x 720 hours = $19,008 per month, a pretty expensive endeavor.
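The on-demand arithmetic above can be reproduced with a few lines of Python. This is just a worked version of the example's own numbers (rate, node count, run time, and a 720-hour month); the function and constant names are made up for the illustration.

```python
HOURLY_RATE_PER_NODE = 2.20  # USD per node-hour, from the example above
NODES = 12                   # cluster size in the example
JOB_HOURS = 5                # average run time in the example
HOURS_PER_MONTH = 720        # 30 days x 24 hours


def on_demand_cost(hours, nodes=NODES, rate=HOURLY_RATE_PER_NODE):
    """Cost in USD of keeping the whole cluster running for `hours`."""
    return hours * nodes * rate


single_job = on_demand_cost(JOB_HOURS)
full_month = on_demand_cost(HOURS_PER_MONTH)
print(f"Single 5-hour job:   ${single_job:,.2f}")
print(f"Full 720-hour month: ${full_month:,.2f}")
```

Swapping in your own rate, node count, and run time gives a quick first estimate before data transfer and setup costs are added.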
Turning to dedicated HPC clouds for a moment, let’s assume that renting the same type of cluster on a monthly basis costs $7,200.00 per month. In the above scenario, the break-even point between the two deployment approaches is at about 273 hours of cluster usage per month ($7,200 / $26.40 per hour). If the HPC processing tasks require more than this, dedicated is the way to go.
While the above example is simplistic, it does highlight a quantitative approach to selecting cost-optimized HPC cloud deployment models. Other factors can weigh in, such as software license management, user location, cluster management support, data storage, node attributes, interconnect, security, and walltime variance between virtualized and bare-metal clusters. Ultimately, these factors must be reviewed by the consumer and the best, most efficient path selected.
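The break-even point is simply the dedicated monthly price divided by the on-demand hourly price of the whole cluster. A short Python sketch, using only the figures from this example (the function name is made up for the illustration):

```python
def break_even_hours(dedicated_monthly_cost, on_demand_hourly_cost):
    """Hours of monthly usage at which both options cost the same."""
    return dedicated_monthly_cost / on_demand_hourly_cost


on_demand_hourly = 2.20 * 12   # USD per hour for the 12-node cluster
dedicated_monthly = 7200.00    # USD per month for the dedicated cluster

hours = break_even_hours(dedicated_monthly, on_demand_hourly)
jobs = hours / 5               # equivalent number of 5-hour jobs

print(f"Break-even: {hours:.1f} cluster hours per month")
print(f"Equivalent: {jobs:.1f} five-hour jobs per month")
```

This works out to roughly 272.7 hours, or between 54 and 55 five-hour jobs per month, in line with the break-even figure quoted above.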
One of the challenges in cloud computing for HPC tasks is creating, deploying, and managing applications as well as data movement for large scale computations. Using our API, you can integrate our cloud HPC processing services into your web applications or workflows.
Once your JSON API call has been created with the automated job builder, you can simply paste the resulting JSON into one of the following code snippets, replacing the {JSON} string.
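As one possible shape for such a snippet, here is a minimal Python sketch that wraps the pasted JSON in an HTTP POST request. The submission URL is not given in this post, so the address below is only a placeholder, and sending the JSON as the request body with a JSON content type is an assumption; the job JSON shown is a dummy value standing in for the {JSON} string.

```python
import json
import urllib.request


def build_job_request(url, job_json):
    """Wrap a JSON job description in a POST request (not sent here)."""
    # Parse first so a malformed paste fails early with a clear error.
    json.loads(job_json)
    return urllib.request.Request(
        url,
        data=job_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: substitute the JSON produced by the job builder
# and the real API endpoint; neither is specified in this post.
JOB_JSON = '{"example_field": "example_value"}'
request = build_job_request("https://example.invalid/submit", JOB_JSON)

# To actually submit the job, pass the request to urllib.request.urlopen.
```

Validating the JSON before building the request means a copy-and-paste mistake surfaces locally as a ValueError rather than as an opaque server-side failure.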