Last month, we blogged about using JARVICE to provision supercomputing clusters to run on-demand Hadoop stacks. This week, we will discuss another use case derived from the ability to snapshot HPC clusters: NGS Sequence Analysis with PacBio.
As in many other disciplines, genomics researchers and bioinformatics professionals face a growing processing challenge. Next-generation sequencers are producing more data, and the analyses require ever more compute capacity.
A Use Case with Pacific Biosciences
In the case of Pacific Biosciences (www.pacificbiosciences.com), machines are delivering higher throughput, higher accuracy, and even longer read lengths. To address the processing challenge of secondary analysis pipelines, PacBio offers a software suite called SMRT Analysis, designed to run on HPC clusters. The software enables real-time analysis, and with the SMRT Portal tool, users can align reads to a reference or assemble them into a de novo sequence.
PacBio SMRT Portal
The open-source software suite can be downloaded from GitHub to run on in-house HPC clusters or, with a simple API call, launched using JARVICE to run its high-performance pipelines in the Nimbix cloud.
JARVICE Advantages for PacBio
Running SMRT Analysis in JARVICE provides some unique advantages relative to other cloud solutions. First, there is no virtualization overhead, so the performance and computing throughput will mirror what you would expect to see on a bare-metal in-house cluster. Second, JARVICE dramatically simplifies provisioning and access. With other cloud solutions, users must spin up several instances (compute nodes) and configure them to work as a cluster prior to running SMRT Analysis jobs. Using JARVICE, the entire SMRT Analysis cluster is provisioned immediately. The number of compute nodes is specified in the API call and the entire application stack, including job management, comes ready to receive work.
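To make the idea of specifying the cluster size in a single call more concrete, here is a minimal Python sketch of what such a request body might look like. It does not use the real JARVICE API: the field names ("application", "nodes", "input"), the application identifier "smrt-analysis", and the data path are all hypothetical placeholders chosen for illustration, and the sketch only builds and prints the request rather than sending it anywhere.

```python
import json


def build_cluster_request(app, nodes, input_path):
    """Assemble a hypothetical cluster-launch request as a plain dict.

    The field names are illustrative assumptions, not the JARVICE schema.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one compute node")
    return {
        "application": app,   # hypothetical: which saved stack to launch
        "nodes": nodes,       # number of compute nodes in the cluster
        "input": input_path,  # hypothetical: where the uploaded reads live
    }


# Placeholder values for illustration only.
request = build_cluster_request("smrt-analysis", 8, "/data/run01")
body = json.dumps(request, indent=2)
print(body)
```

The point of the sketch is that cluster size is just one parameter of one request, rather than a series of per-instance provisioning and configuration steps.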
A third advantage of JARVICE is reduced processing costs. Sequence data can be uploaded for free before any compute resources are provisioned, so no compute time is spent waiting on data transfer. Once the data has been transferred to the Nimbix cloud, the SMRT Analysis cluster can be launched. Because the cluster is provisioned like a “ready-to-run” appliance, work can begin immediately.
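The cost argument comes down to simple arithmetic, sketched below in Python. The sketch assumes that billing is proportional to node-hours while a cluster is up, which is an assumption made for illustration and not a statement of Nimbix pricing. The function name and all of the numbers are hypothetical.

```python
def billed_node_hours(nodes, compute_hours, idle_hours=0.0):
    """Node-hours accrued while a cluster is up: compute time plus idle time."""
    if nodes < 1 or compute_hours < 0 or idle_hours < 0:
        raise ValueError("nodes must be >= 1 and hours must be non-negative")
    return nodes * (compute_hours + idle_hours)


# Illustrative numbers only, not measurements.
# Case A: the cluster is provisioned first and then sits idle for 2 hours
# while data is uploaded and the nodes are configured.
provision_first = billed_node_hours(nodes=8, compute_hours=4.0, idle_hours=2.0)

# Case B: data is uploaded before provisioning, so no idle time accrues.
upload_first = billed_node_hours(nodes=8, compute_hours=4.0)

print(provision_first - upload_first)  # prints 16.0
```

Under these assumptions the saving is the node count multiplied by the idle time avoided, so it grows with cluster size.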
The Nimbix Accelerated Compute Cloud can of course be used to run other bioinformatics applications and pipelines. For the last three years, Nimbix has been helping organizations accelerate sequence alignment and search with specific applications like BWA and BLAST. Now, with JARVICE, scientists and analysts can implement any custom pipeline and save it for later use. Nimbix Application Environment (NAE) clusters can be provisioned and saved with built-in workload management to run entire software suites such as SMRT Analysis, speeding up processing, lowering costs, and simplifying deployment.