Building and Deploying GPU Applications in the Cloud

A Little Background

Before diving in, let's get a little background out of the way. migenius is a 3D rendering technology company and our focus is on photorealistic, physically accurate 3D rendering in the Cloud. Our main product, RealityServer, is a development platform built with this focus in mind. Incorporating the NVIDIA Iray rendering engine, it is capable of producing stunning, photorealistic renderings using GPU hardware and the NVIDIA CUDA platform. More than just a renderer, RealityServer binds this capability to an extensive Web Services API which allows this functionality to be easily accessed in the Cloud from any application or service. migenius additionally builds end-user products and services such as Bloom Unit for SketchUp, and works with partners to help them create their own RealityServer based solutions. These tools drive the unique types of compute loads covered in this article. Of course, for all of this to succeed, we first need GPU resources in the Cloud which our customers can access.

Deploying Our Customers' Solutions on GPU Resources at Scale

One of the first things our customers often ask us is “Where can I run RealityServer? I don't have any GPUs.” Previously we would set up a loan machine or give customers access to some of our own resources, but as the number of customers increased this became less and less practical. Many customers also wanted to scale out easily after deploying their RealityServer based solution, to increase their capacity. We needed a way to run our own in-house RealityServer based services, as well as a solid base we could point our customers at to build out their solutions without having to learn the intricacies of GPU server deployment. RealityServer is a great solution for many different markets; however, it relies on the availability of GPU hardware to get its best results. This is where Nimbix has really helped us to help our customers when it comes to roll-out time.

To give a concrete example, one such customer is Boathouse Sports. Founded in 1985 by two-time Olympic rower John Strotbeck III, Boathouse Sports provides highly customised, locally manufactured team sporting apparel. Working closely with Pete Simon and Gretchen Boehmler from Boathouse's team in Philadelphia, as well as their chosen web development agency, migenius has helped to create the 3D rendering system for a new website which will launch shortly. Incorporating photorealistic 3D renderings of products generated on the fly on GPU servers, custom configured for each user, the site offers an accurate presentation of each customer's configured products. Pre-computing the product images for this application would be out of the question, as the number of potential product configurations is essentially infinite. Users can upload their own logos and enter their required text and naming, and it is all incorporated into the photorealistic imagery shown when configuring their product. This is one of the first systems in the world to deploy truly on-demand photorealistic 3D rendering of mass customised products.

As one can imagine, the team sporting apparel market is subject to wide fluctuations in demand and is highly seasonal. To that end, Boathouse needed a backend infrastructure for RealityServer that could easily scale out when needed and contract when demand was lower. To make things more complex, the application demanded the very latest GPU hardware (NVIDIA Tesla K40) to obtain the performance required, along with the memory capacity to avoid frequently loading and unloading their datasets (the Tesla K40 sports 12GB of GPU memory). Advances in Iray and the CUDA Compute 3.5 capabilities offered on the Tesla K40 enabled us to decrease the startup time for initialising new renderings of products by a factor of 25, making this level of hardware support a clear requirement.

The list of options for procuring these resources was already very short, and once the Tesla K40 requirement emerged it was clear that Nimbix was the only vendor able to offer a viable solution. Additionally, the ability to expand and contract capacity without having to procure large blocks of server time (Nimbix offered resource billing down to the minute) was critical for Boathouse. When planned marketing activities and promotional events drive traffic to their site, the 3D rendering service can expect to see a significant jump in usage. Capacity needs to react to that jump quickly, and then just as quickly react to the subsequent decrease in utilisation. For a live retail site this ability to flexibly allocate resources is critical.

Provisioning new resources with Nimbix JARVICE takes us well under a minute, which makes reacting to demand very straightforward. In the case of Boathouse those demands are strongly correlated with increased sales, since users typically only go to the trouble of extensively customising their products if they intend to purchase. Offering photorealistic representations of exactly the product the user will receive when manufactured, not just a generic version without their customisation, dramatically increases conversion and minimises returns.

This approach can apply to any mass customised product and there is a clear requirement for 3D imagery since it is completely impractical to photograph the millions or even billions of combinations that are possible.

Why GPU Compute is Specialised

Most Cloud providers are typically focused on infrastructure used for web servers, database servers, email servers and other similar applications. These services can be very effectively virtualised and multi-tenanted, and adding a single user to a given service running on these servers typically has a very small incremental effect. Managers look for trends in utilisation and plan around known usage patterns. Usually one server can handle an extremely large number of users, and scaling the resource pool up or down, while dynamic, typically occurs slowly over time. These types of compute loads are well suited to traditional virtualisation and provisioning methods. However, the compute loads generated by users of GPU accelerated applications (or for that matter any compute heavy application) are very different.

Users of these types of resources have an insatiable thirst for computing power and will consume any resource you throw at them. Unlike traditional web server, database and other workloads, tasks like photorealistic 3D rendering can never have enough compute power, because the user would always like the results back faster. Doubling your server capacity is unlikely to do much for website user number 25,546; however, for an architect sitting at a desk waiting for the rendering of a new building to complete, time is everything. This flips the usual assumptions on their head. The incremental cost of adding users becomes much higher, since each user can effectively utilise an entire server, and while GPU servers are the most cost effective means of handling these types of tasks, they are still costly pieces of hardware.

These differences in usage require a completely different methodology for provisioning systems and accessing their infrastructure. There are currently services that attempt to deploy GPU resources using the same models as traditional Cloud IaaS (Infrastructure as a Service) offerings; however, for many reasons these provide a sub-optimal experience for companies building applications that generate compute heavy workloads.

Deploying our Own Services

migenius develops and maintains a product and service called Bloom Unit, a Cloud based photorealistic rendering plugin for Trimble SketchUp. Users click a button and their model is uploaded to our servers and begins rendering immediately; the user can then interact with the model while it is rendering and make any changes they like. Each user of Bloom Unit is allocated two high-end NVIDIA GPU resources which are fully utilised by that user during their session. Building out this business put migenius squarely in the position of needing to solve the resource management problem. While migenius employs a lot of custom code for handling resource management, at some point a physical resource must be provisioned, which is where things get really interesting.

Having worked previously with vendors who employ an hourly usage model, we had found that this simply did not offer enough granularity to be cost effective. When a resource was provisioned on an hourly basis it would often sit idle after around 10 – 15 minutes of usage, or until another user came along to take it up. If a lot of users arrived at the same time (something that should normally be wonderful for a paid service) then many resources would be allocated at once. After those users were finished, possibly after only a small fraction of an hour, we had to find new users for the resources or they would go unused. This clearly was not going to scale, and the periods that should create the most revenue for the business could actually end up generating hefty costs instead.

At the time we were casting around for a solution to this problem, Nimbix was already offering its NACC service for batch based HPC compute tasks: the type of thing you set off and forget about until it's complete, then gather the results off the file system. These tasks can be queued up, executed and completed, and the results stored away for unattended use, but more importantly Nimbix offered billing granularity down to the minute for these tasks. On the surface this seemed like an ideal solution; however, many aspects of our solution would not easily work in this environment. For one thing, our users needed to be able to connect over the Cloud to the hardware doing the computation, and the jobs would need to run for an unbounded amount of time. It was tantalisingly close to a solution, but not quite everything we needed.

We had some pretty specific requirements, so we were not sure these were things Nimbix could easily support; however, in short order NACC began to add functionality that pushed it over the line. Using the JARVICE API and NAE (Nimbix Application Environment) based provisioning, we were able to build out a solution that could provision new GPU resources quickly and destroy them just as quickly, only incurring the cost for real usage and removing the need to find a home for idle resources after they were used. This rapidly implemented functionality has made Nimbix NACC/JARVICE an ideal fit for the workloads we are producing with the Bloom Unit service.

Points of Difference with Nimbix

The use of container technology at Nimbix is one of the things that originally attracted us to their platform. The ability to rapidly provision and start a bare metal resource which can access GPU hardware means that we can spin resources up and down at will without waiting minutes for them to start. In many cases, if we had to wait minutes, the would-be user on the other end would be long gone, and keeping enough static resources spun up and ready to use is just not practical. Using the JARVICE API we can start an instance along with our software in around 10 – 30 seconds, and we can even do this while other activities such as uploading data are occurring, so the user will not even notice. Additionally, the container based GPU resources typically outperform their virtualised counterparts, even those employing PCI passthrough, so we get the full benefit of the GPU power.
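The flow we use is essentially "compose a launch request, then poll until the instance is running". The sketch below illustrates that pattern only; the payload fields and the status values are hypothetical stand-ins, not the actual JARVICE API schema, and the status check is injected as a callable so the flow can be exercised without network access.

```python
import time

def build_launch_request(app_image, gpu_count, api_user, api_key):
    """Compose a launch payload for a GPU instance.
    Field names here are illustrative, not the real JARVICE schema."""
    return {
        "user": api_user,
        "apikey": api_key,
        "application": app_image,
        "resources": {"gpus": gpu_count},
    }

def wait_until_running(get_status, timeout_s=60, poll_s=2):
    """Poll a status callable until the instance reports 'running'.
    With fast container provisioning this typically completes in
    tens of seconds rather than minutes."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if get_status() == "running":
            return True
        time.sleep(poll_s)
    return False
```

In practice `get_status` would wrap an HTTP call to the provider's status endpoint; injecting it also makes the provisioning logic trivially testable, and lets the poll loop run concurrently with other work such as uploading the user's data.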

The billing granularity is also critical for our use of the services, and for many of our customers' applications, since the tasks being run, even interactive ones, have a very wide range of durations, from only a few minutes to many tens of hours. Our platform additionally has the ability to cluster multiple resources together into a single unit to accelerate rendering. With hourly billing it is impossible to make the business case for this work. Say, for example, you offer users clustered rendering and someone requests 100 GPU resources to get their rendering completed in 1 minute instead of 100 minutes (assuming linear scalability for the sake of simplicity). After that user has finished with their minute, we would have to find either 100 other users for the remaining 59 minutes, or another user who wants 100 resources, both of which may be very unlikely. With billing to the minute we can start the 100 GPU resources and throw them away after a minute.
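The 100-GPU example above is easy to put into numbers. A minimal sketch (the $3/GPU-hour price is invented for illustration; only the billing-granularity maths comes from the text):

```python
def job_cost(gpu_count, runtime_minutes, price_per_gpu_hour, granularity_minutes):
    """Cost of a clustered job when usage is rounded up to the
    provider's billing granularity."""
    # Round the runtime up to the nearest billable increment (ceiling division).
    increments = -(-runtime_minutes // granularity_minutes)
    billed_minutes = increments * granularity_minutes
    return gpu_count * billed_minutes / 60 * price_per_gpu_hour

# 100 GPUs finishing a render in 1 minute, at a hypothetical $3/GPU-hour:
hourly = job_cost(100, 1, 3.0, granularity_minutes=60)   # each GPU billed a full hour
per_minute = job_cost(100, 1, 3.0, granularity_minutes=1)
print(hourly, per_minute)  # 300.0 vs 5.0 — a 60x difference for the same work
```

The same one-minute burst costs 60x more under hourly billing unless you can immediately find other users to soak up the remaining 59 minutes on all 100 resources.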

The icing on the cake is that Nimbix have reacted quickly to new hardware availability and already offer Tesla K40 based servers in addition to the Tesla M2090 based resources. From working closely with Nimbix it is clear that they keep their systems much more up to date than other providers, offering these resources to their customers very soon after the hardware becomes available. Since our technology leverages many aspects of the latest hardware, this becomes critically important when customers need a particular product feature that is only available on certain hardware. As a final great addition, Nimbix resources have a shared file system directory that is pre-configured for you and needs no extra setup. This is invaluable when you need to share data between resources, particularly when nodes access very large datasets, and it allowed us to deploy very quickly.

Just so we don't forget those points here is a quick recap:

– Massively faster provisioning (typically 10 – 30 seconds)

– Bare metal performance

– Granular billing (down to the minute)

– Latest GPU hardware available

– Pre-configured common file system

Why GPU?

We are often asked why we don't just use CPU resources instead of GPU resources; there are more of them around and they are easier to come by. Of course, we have asked this question ourselves, and rather than make assumptions we did some testing from a very early stage. The results surprise many people, but the simple answer is this: GPU resources are more cost effective. We benchmarked many Cloud service offerings using both CPU and GPU resources, and in every case we found that to get equivalent performance for our users, CPU resources cost between 2-4x as much as GPU resources. This makes utilising GPU based resources for applications such as 3D rendering extremely attractive, even more so when combined with the benefits discussed in this article. GPUs are not just a fancy piece of technology; we use them because they save us money and in turn save our customers money.
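The comparison reduces to a simple price/performance calculation: what does it cost to produce one rendering on each resource type? A sketch with invented prices and render times (the 2-4x ratio is from our benchmarks; the specific numbers below are purely illustrative):

```python
def cost_per_render(price_per_hour, render_seconds):
    """Effective cost of producing one rendering on a given resource."""
    return price_per_hour * render_seconds / 3600

# Hypothetical benchmark: a GPU node renders a frame in 30s at $3/hr,
# while a CPU node needs 180s at $1.50/hr to produce the same frame.
gpu = cost_per_render(3.0, 30)    # 0.025 dollars per render
cpu = cost_per_render(1.5, 180)   # 0.075 dollars per render
print(f"CPU is {cpu / gpu:.1f}x the cost of GPU per render")
```

The cheaper hourly rate of the CPU node is irrelevant if it takes proportionally longer to deliver the same result; what matters is the cost per unit of finished work.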

Parting Comments

If you are looking to deploy a product that makes heavy use of GPU resources and are looking to the Cloud, then you really need to consider how your chosen resource provider will help you scale as your product grows, and not just from a technical point of view. It's really easy to focus on the details of how the API works and the underlying technology stack. Those things are important, but they will mean nothing if it costs you more to provision your resources than you can charge your customers to use them.

Think carefully about how your service's usage patterns will map to resource allocation and the costs associated with it. If your provider isn't offering fine enough granularity, you may end up stuck with provisioned resources sitting idle while you are still paying for them.

You should also plan time to benchmark the providers you are considering, ideally with real-world loads you have captured. You can't suitably evaluate the price/performance of your options if you are not accurately quantifying performance. Establish a baseline, and when new hardware and offerings become available, re-run your tests as soon as you can to find out if you should be switching. If you have algorithms that run on both CPU and GPU, benchmark them both and see if GPUs are giving you what you need. For us this made the decision to deploy GPU resources a no-brainer, since they were going to cost less. It's also a good idea to share your results with your provider, since there may be areas they can tune in their infrastructure to help your performance; Nimbix in particular have helped us enormously in this respect.

With Nimbix JARVICE you can fire up a resource right now and start checking all of this out for yourself, run your tests and start planning how to leverage GPU compute power.

Who Am I?

Paul Arden is CEO at migenius, a private company headquartered in Melbourne, Australia with offices in Japan and Europe and customers in over 25 countries. Previously the Product Manager for RealityServer at mental images (since acquired by NVIDIA), Paul has an extensive background in photorealistic rendering, physically accurate rendering and Cloud based solutions encompassing these. RealityServer is available now on Nimbix JARVICE, and the Bloom Unit plugin for SketchUp currently utilises Nimbix resources to service its users.