Running NVIDIA Containers on the Nimbix Cloud

This page explains how to pull containers from the NVIDIA GPU Cloud Container Registry into the Nimbix Cloud so that you can run them unmodified on bare-metal GPUs. Nimbix developed this workflow to remove the need to adapt or modify containers specifically for x86 platforms. You only need to identify the NVIDIA container you require, point our Docker Registry appliance at it, wait for the container to be pulled, and then begin computing.

This document has two parts. Part 1 gives step-by-step instructions for pulling a container from the NVIDIA GPU Cloud container registry into the Nimbix Cloud and beginning to compute: Part 1a covers registering with NVIDIA and generating your API key, and Part 1b covers connecting your chosen container to Nimbix and creating your application. Part 2 contains notes on how the Nimbix Cloud operates, along with its caveats and conventions. There are only three of them, but they are critically important to using the Nimbix Cloud successfully. Failure to follow them may result in lost data and delays.

This tutorial has been written, tested, and validated by Nimbix Engineering. If you experience any difficulties or have questions, please contact us.

Part 1a – Instructions for pulling a container from the NVIDIA GPU Cloud container registry

1 – Register on the NVIDIA GPU Cloud container registry. If you are not already registered, click Sign Up and complete the sign-up process.

NVIDIA GPU Cloud Registration

 

2 – Select the desired repository from the Repositories menu. In this example, we have selected TensorFlow.

NVIDIA GPU Registry

3 – Immediately to the right of the Repositories menu you will see Nvidia/tensorflow. From this box, copy the container address (see circled portion).

NVIDIA GPU repositories

4 – Generate an API key by clicking the “Get API Key” button.

Get API key

5 – Find the API key at the bottom of the page.

api keys

 

Part 1b – Logging on to JARVICE and using PushToCompute™

6 – Log into JARVICE and select PushToCompute™

JARVICE HPC Cloud platform

 

7 – Once in PushToCompute, go to the Docker Registry Login box. Set the Server to https://nvcr.io, enter $oauthtoken as the username, and enter your NVIDIA GPU Cloud API key as the password. Once the login succeeds, you will be able to pull containers from the NVIDIA GPU Cloud into your App to be run on the Nimbix Cloud.
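
For readers more familiar with a local Docker client, the following Python sketch shows roughly what the same three values look like as a Docker config.json-style credential entry. It is an illustration only: PushToCompute stores the login for you, and nothing here describes how JARVICE keeps credentials internally. The function name `docker_auth_entry` and the value `MY_API_KEY` are hypothetical placeholders. Note that `$oauthtoken` is the literal username string, not a shell variable to be expanded.

```python
import base64
import json


def docker_auth_entry(server: str, username: str, password: str) -> dict:
    """Build a Docker config.json-style "auths" entry for one registry."""
    # Docker clients store credentials as base64("username:password").
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"auths": {server: {"auth": token}}}


# "MY_API_KEY" is a placeholder; substitute the key generated in step 4.
config = docker_auth_entry("nvcr.io", "$oauthtoken", "MY_API_KEY")
print(json.dumps(config, indent=2))
```

Treat the resulting token like a password: base64 is an encoding, not encryption.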

Enter server name

 

8 – Next, click the “New” icon in the All Apps section.

select an app

9 – Fill out the form that appears. This is where you enter the container address copied in step 3. Leave the Git source URL blank and leave the System Architecture set to the default “Intel x86 64-bit (x86-64)” setting. The team visible box lets your team members view and interact with your application, if you have invited any or are part of a larger team within the Nimbix platform. Then click “OK” to build your App.

Create application

10 – After you click “OK”, a new card appears in the All Apps section.

all applications

11 – Now click the hamburger menu (top left corner) in the app card and select “Pull” to pull the container from the NVIDIA registry. Then select “History” to watch the pull process. This may take several minutes.

pull the container from the NVIDIA registry

12 – Click the new card to launch your application, in this case TensorFlow, which offers the selections “Batch”, “Server”, and “GUI”. At this point, you are ready to select your mode of operation and associated hardware.

Tensorflow application

13 – Selecting “Server” takes you to a screen where you will configure the machine(s). First, select your “Machine Type” from the dropdown menu, and then use the slider to select the number of cores you wish to run on. Many machine types are available, including a wide variety of machines containing GPUs (K40s through V100s). This allows the application to run headless and shuts down the instance upon completion. Please remember that any data or results that need to persist must be saved in your /data directory, as all other directories are ephemeral.
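
As a reminder of the /data convention, here is a minimal Python sketch of writing results under the persistent directory from inside a job. The helper name `save_results`, the JSON format, and the example file name are assumptions made for illustration; only the /data location comes from this tutorial.

```python
import json
from pathlib import Path


def save_results(results: dict, filename: str,
                 data_root: Path = Path("/data")) -> Path:
    """Write results as JSON under the persistent directory; return the path."""
    target = data_root / filename
    # Create any intermediate folders (for example /data/run1) if missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(results, indent=2))
    return target


# Hypothetical usage inside a running job:
# save_results({"epochs": 10}, "run1/metrics.json")
```

Keeping the root directory as a parameter makes it easy to try the helper on a local machine before running it on Nimbix.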

Cloud server

14 – Upon selecting “Submit” you will launch an interactive session with your configured server. You will be given an address for your server and a password.

configure cloud server

15 – Once your SSH connection reaches the server, copy and paste the supplied password at the password prompt and begin computing.

Cloud SSL

 

16 – When you are finished, leave your terminal session with the exit command, then click the on/off toggle to stop your server(s).

exit cloud app

 

Part 2 – Notes, conventions, and caveats

  1. As long as a session is running within the dashboard, you are incurring charges. Be sure to shut down sessions that are not being used. Logging out of JARVICE DOES NOT shut down sessions.
  2. If you need an earlier version of an NVIDIA container, use the appropriate tag as shown at the bottom of its container page.
  3. Use /data for persistent data. Material stored in other areas will be lost upon shutdown.
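
To illustrate note 2, the sketch below shows one way to attach a tag to a container address before entering it in the form from step 9. The helper `with_tag` is hypothetical, the address `nvcr.io/nvidia/tensorflow` is assumed from the server and repository names used in this tutorial, and `YY.MM` is a placeholder rather than a real tag; take the actual value from the container's page.

```python
def with_tag(address: str, tag: str) -> str:
    """Return the container address with its tag set to `tag`."""
    # Split on the last "/" so that a registry port such as
    # "host:5000/app" is not mistaken for a tag.
    prefix, _, name = address.rpartition("/")
    # Drop any tag already present on the final path component.
    name = name.split(":", 1)[0]
    base = f"{prefix}/{name}" if prefix else name
    return f"{base}:{tag}"


# "YY.MM" is a placeholder tag, not a real release.
print(with_tag("nvcr.io/nvidia/tensorflow", "YY.MM"))
```

Pinning an explicit tag also makes it clearer which container version a given result was produced with.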