Over the years NVIDIA has done a great job of making its CUDA platform accessible to developers. It provides free toolkits and enables GPU supercomputing on affordable (consumer) GPUs such as the GTX series. NVIDIA’s bet is that the more available its tools, the more applications will leverage its platforms (and the more GPUs it will sell). This is not unlike what Microsoft did so successfully in the 1980s and 1990s, resulting in the richest application ecosystem of all time. The same is happening in the supercomputing space. In late 2015 accelerators delivered a full third of all FLOPS in the TOP500 list, with NVIDIA GPUs powering 23 of the 24 “new” accelerated supercomputers. Hundreds of CUDA applications demand GPU-accelerated computing platforms, and as a result developers have downloaded the CUDA toolkit millions of times to date.
Cloud computing plays an important role in delivering the next generation of accelerated platforms and architectures – in fact, this is what the Nimbix Cloud, powered by JARVICE, is all about. Accelerated computing is embedded in our DNA, and we’ve enabled GPU-based (and FPGA-based) supercomputing workflows and tools from the very beginning. Developers have access to a variety of GPU-based environments, including K80 and TITAN X machines. The JARVICE API automates the bare-metal provisioning and workflow deployment on these systems, delivering the fastest performance available anywhere in the cloud. Example use cases include machine learning (e.g. “deep learning”), photo-realistic rendering, life sciences, and other compute-intensive problems.
Until recently developers had the option of developing their applications on high end GPU machines, and then testing them at scale as parallel or distributed workflows. The granularity of these systems was designed for high density computing, which is the prevalent production configuration for CUDA applications in the cloud. For example, training a neural network requires lots of machines with lots of GPUs, so it doesn’t make sense to string together low end configurations when high density systems provide a much better price/performance ratio at scale. Besides, developers, thanks to NVIDIA’s widespread CUDA support, always had the option of creating algorithms on low end computers and then deploying them at scale in the cloud. It’s true that the system on your lap (or your desk) simply cannot keep up with a supercomputer, but developing and unit testing algorithms is not about performance. As long as the APIs match the production target, the software simply doesn’t care whether results come back in 5 seconds or 50 seconds. This is why NVIDIA supports the same versions of CUDA on its consumer-grade GTX series as it does on its Kepler HPC cards.
Democratizing CUDA Development in the Cloud
What would happen if CUDA development in the cloud suddenly became much less expensive than building your own low end GPU system? And… what if even the development environment could deliver up to 2x the CUDA performance of said home grown rigs?
Last week we announced a technology partnership with Bitfusion, a company in Austin, TX that develops a product called Boost. This technology has multiple use cases, but the initial one enables developers to securely share TITAN X GPUs at tremendously affordable rates. What’s more, this sharing is completely transparent, and Nimbix enables Boost as an optional feature of the JARVICE platform. This means code need not be “Boost aware” – any Linux-based CUDA code (that dynamically links to the CUDA runtime libraries) works either with Boost or without it, taking advantage of cost effective development GPUs or high end supercomputing ones, respectively. Developers need not license Bitfusion’s code separately, because the partnership allows us to deliver it to you on demand, with nothing to install or maintain. And, unlike with other clouds, you don’t need to manage (and pay for) both a Boost server and client separately, since the server capability is (optionally) “built-in”.
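Since the requirement above is that code link dynamically to the CUDA runtime, here is a minimal sketch of one way to check a Linux binary for that, by looking for libcudart in the output of ldd. The function names and the sample output are illustrative assumptions, not part of JARVICE or Boost. Note that nvcc links the runtime statically by default in recent toolkits, so you may need to build with its --cudart shared option.

```python
import subprocess


def links_cudart_dynamically(ldd_output: str) -> bool:
    """Return True if the given ldd output lists a shared libcudart."""
    for line in ldd_output.splitlines():
        parts = line.split()
        # Each ldd line starts with the library name, e.g. "libcudart.so.7.5".
        if parts and parts[0].startswith("libcudart.so"):
            return True
    return False


def check_binary(path: str) -> bool:
    """Run ldd on a binary (hypothetical helper) and inspect the result."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return links_cudart_dynamically(result.stdout)


# Made-up sample of what ldd might print for a dynamically linked CUDA binary.
SAMPLE_LDD_OUTPUT = (
    "\tlinux-vdso.so.1 (0x00007ffc00000000)\n"
    "\tlibcudart.so.7.5 => /usr/local/cuda/lib64/libcudart.so.7.5 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000400000)\n"
)

if __name__ == "__main__":
    print(links_cudart_dynamically(SAMPLE_LDD_OUTPUT))
```

If the check returns False for your own binary, the runtime is most likely linked statically, and the code would need to be rebuilt against the shared runtime.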
One low hourly price gives you access to JARVICE and the Boost technology as a feature of the platform. How low? We start at less than $0.50 (fifty cents) per hour. Let’s put that into perspective. If you operate a cloud-based environment for 8 hours per day, 5 days per week, 50 weeks per year (yes, that’s assuming weekends off and 2 weeks of vacation!), that’s less than $1000 per year (or about $83 per month). Building a dual TITAN X system on your own would easily cost you more than double that – the GPUs themselves cost about $1100 each (assuming you can get your hands on one due to limited supply), and you need appropriate power, cooling, storage, an i5 processor, and RAM. The ng0 machine type (available right now on JARVICE) gives you 2 bare-metal Intel Haswell cores, 16 GB of RAM, and 1 TB of free cloud storage in addition to CUDA on 2 TITAN X GPUs. When you’re ready to run the code at scale on, say, 4 K80s (dual ngd5 machines), you can do that with a point-and-click web interface or an API call, and you only get charged for the time you use on those higher end systems. You can run the exact same images and code without modifications. Once you validate the results, you can go back to using the more cost effective ng0 machine. We even maintain the latest version of CUDA for you (7.5 at the time of this writing), and continue to deliver the best selection of the latest NVIDIA GPUs. This means that you don’t need to upgrade your home grown system next year when NVIDIA has something better, because your cloud computing partner will simply deliver it automatically. We think the price alone speaks for itself, and when you count the convenience and extra capability of cloud computing with a platform like JARVICE, it becomes a no-brainer.
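The arithmetic behind that comparison can be written out as a short sketch. It uses $0.50 per hour as an upper bound (the actual rate is stated only as “less than” that) and counts only the two GPUs on the home grown side, so both totals are rough estimates rather than quotes.

```python
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 50
HOURLY_RATE_USD = 0.50      # upper bound: the text says "less than $0.50"
TITAN_X_PRICE_USD = 1100    # approximate per-GPU price quoted in the text
GPU_COUNT = 2


def annual_cloud_cost(rate_usd: float = HOURLY_RATE_USD) -> float:
    """Yearly cost of running the cloud machine during working hours only."""
    hours_per_year = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR
    return hours_per_year * rate_usd


def gpu_hardware_cost() -> int:
    """Cost of the GPUs alone, ignoring CPU, RAM, storage, power and cooling."""
    return TITAN_X_PRICE_USD * GPU_COUNT


if __name__ == "__main__":
    yearly = annual_cloud_cost()
    print(f"Cloud, per year:  ${yearly:.2f}")
    print(f"Cloud, per month: ${yearly / 12:.2f}")
    print(f"GPUs alone:       ${gpu_hardware_cost()}")
    print(f"Years of cloud use per GPU purchase: {gpu_hardware_cost() / yearly:.1f}")
```

At these numbers, 2000 hours per year caps the cloud bill at $1000, while the GPUs alone come to $2200, or a little over two years of cloud usage before any other hardware is counted.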
The Future of CUDA (and OpenCL) in the Cloud
Where do we go from here? The typical high end Nimbix GPU machine configuration features dual NVIDIA cards, such as the K80. This effectively gives you access to 4 GPUs (since each K80 board carries 2 GPUs, much like a pair of K40s). You can access more by making your algorithms distributed, and using the JARVICE platform to launch a cluster on demand. This is seamless and straightforward and has powered thousands of high-end workflows over the past few years. But what if the CPU portion of your algorithm is either very lightweight or can’t easily be parallelized? In the coming months, we will be launching the “scale-out” version of the JARVICE/Boost integration, which will allow you to attach, say, 8 (or more) K80s to a dual core/16GB RAM CPU machine. Best of all, your code will “just work”, because CUDA will still be CUDA. What if OpenCL whets your appetite instead? We’ve got that too.
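To make the “4 GPUs per machine” point concrete, here is a minimal sketch of how a workload might be divided evenly across the visible devices before each slice is handed to a GPU. The partition function is a hypothetical helper for illustration; the actual device dispatch (and anything JARVICE or Boost does) is left out.

```python
def partition(n_items: int, n_devices: int) -> list:
    """Split n_items into n_devices contiguous (start, end) index ranges.

    Sizes differ by at most one item; earlier devices take the remainder.
    """
    base, extra = divmod(n_items, n_devices)
    ranges = []
    start = 0
    for device in range(n_devices):
        size = base + (1 if device < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # A dual-K80 machine exposes 4 GPUs; an attached pool could expose more.
    for device, (start, end) in enumerate(partition(10, 4)):
        print(f"GPU {device}: items [{start}, {end})")
```

Because the split depends only on the device count, the same code would cover 2 development GPUs, 4 GPUs on a dual-K80 machine, or a larger attached pool.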
We’re proud to say you can now let go of all preconceived notions about GPU application development in the cloud, as we’ve just reset all assumptions. We invite you to tell us what GPU configurations would make the difference for you, and can’t wait to see what amazing computing challenges you’ll solve as a result.