
Hundreds of organizations around the world are working to align and map raw sequence data, and many have turned to the cloud to augment computing capacity for their analysis pipelines. While a number of commercial alignment and mapping applications can help with the challenge, one of the popular open source options is BWA (Burrows-Wheeler Aligner).

When people think of running BWA in the cloud, most think of Amazon, Rackspace, or other commodity cloud infrastructure providers that offer virtual machines billed by the hour. This is certainly an option for on-demand compute capacity, but provisioning it for the first time can be slow. But what if you simply wanted a cloud-based BWA pipeline ready to run your sequence data as fast as possible?

At Nimbix, the cloud is all about the workload, not the machines. As one example, running high-speed BWA on paired-end sequence data is as simple as making the following API call to the Nimbix Accelerated Compute Cloud:

    {
      "api-version" : "2.0",
      "customer" : {
        "username" : "nimbixusername",
        "api-key" : "************************************"
      },
      "application" : {
        "name" : "bwa",
        "command" : "paired-end",
        "parameters" : {
          "dbfile" : "input1-file1",
          "inputseqfile1" : "input2-file1",
          "inputseqfile2" : "input2-file2",
          "sub-commands" : { "aln" : {}, "sampe" : {} }
        }
      },
      "files" : [
        {
          "files" : { "input1-file1" : "human" },
          "method" : "nimbixfiles"
        },
        {
          "files" : {
            "input2-file1" : "MyIlluminaData_1.fastq.gz",
            "input2-file2" : "MyIlluminaData_2.fastq.gz"
          },
          "method" : "sftp",
          "address" : "mysftpserver.location",
          "username" : "mysftpusername",
          "password" : "*****************************"
        }
      ]
    }

For human reference alignments, simply replace the placeholder values (the usernames, API key, password, SFTP server address, and FASTQ file names) with your own and post the request to the Nimbix cloud. Your pipeline runs automatically and generates your SAM/BAM files. Because Nimbix operates machines optimized for its bioinformatics processing tasks, users can generally expect results 5 to 15 times faster than any other cloud solution. Other available reference genomes can be specified in the API call.
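For readers who prefer to build the request in code, here is a minimal Python sketch that assembles the same request body from your own values. The field names are copied from the call shown above. The helper names `build_bwa_job` and `post_job` are hypothetical, and since this post does not give the endpoint URL, the caller has to supply it. Sending the body with a JSON content type is also an assumption.

```python
import json
import urllib.request


def build_bwa_job(username, api_key, reference, fastq1, fastq2,
                  sftp_host, sftp_user, sftp_password):
    """Return the paired-end BWA job description as a Python dict."""
    return {
        "api-version": "2.0",
        "customer": {"username": username, "api-key": api_key},
        "application": {
            "name": "bwa",
            "command": "paired-end",
            "parameters": {
                # These values are labels that point at entries in "files" below.
                "dbfile": "input1-file1",
                "inputseqfile1": "input2-file1",
                "inputseqfile2": "input2-file2",
                "sub-commands": {"aln": {}, "sampe": {}},
            },
        },
        "files": [
            # Reference genome, using the "nimbixfiles" method from the example.
            {"files": {"input1-file1": reference}, "method": "nimbixfiles"},
            # Paired-end reads fetched from your own SFTP server.
            {
                "files": {"input2-file1": fastq1, "input2-file2": fastq2},
                "method": "sftp",
                "address": sftp_host,
                "username": sftp_user,
                "password": sftp_password,
            },
        ],
    }


def post_job(url, job):
    """POST the job as JSON. `url` is not given in this post; pass your own.
    The Content-Type header is an assumption about what the service expects."""
    body = json.dumps(job).encode("utf-8")
    request = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Build the request body with placeholder credentials and print it for review.
job = build_bwa_job("nimbixusername", "YOUR_API_KEY", "human",
                    "MyIlluminaData_1.fastq.gz", "MyIlluminaData_2.fastq.gz",
                    "mysftpserver.location", "mysftpusername",
                    "YOUR_SFTP_PASSWORD")
print(json.dumps(job, indent=2))
```

The script only prints the request body. Once you have the job submission URL for your account, passing it to `post_job` along with `job` would send the request.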

For more information on making the above API call using curl, wfetch, Perl, or Python, have a look at Josh Devinney’s blog post, "Programmatic Job Posting to NACC". If you need an account to try out the above, you can sign up on the Nimbix portal.