Welcome to the JARVICE Application Developer Guide, the essential source of information for creating scalable on-demand workflows on the JARVICE platform.
JARVICE Platform Overview
JARVICE is a platform for delivering high performance computing workflows as a service to end users. A workflow consists of an application, a “command”, one or more data set(s), and the desired resources (type and number of compute nodes to run on). JARVICE automatically orchestrates application runtime, connects the data set(s), and collects the results. Applications are stored in images, which are then executed in containers known as NAEs, or Nimbix Application Environments. These images consist of Linux applications and their respective dependencies, as well as the underlying operating system packages for the specific distribution flavor (e.g. Ubuntu, CentOS, etc.). JARVICE provides templates that developers can use as the basis for their applications, as well as a rich set of automated mechanisms for running at scale once built. Image management allows developers to stage changes and test them before deploying to end users.
JARVICE accounts for workflows as jobs, which are attributed to end users or teams of end users. Itemized and summary billing reports are available to report usage by user, team, or application.
Finally, JARVICE provides data management in the form of JARVICE Vaults, which provide an abstraction layer over the logical and physical topology of the storage in order to present POSIX filesystems to applications. JARVICE automatically orchestrates these vaults when running workflows, ensuring that input and output data set(s) are available and stored in the appropriate locations. Users have access to the data in their vaults outside of application workflows as well.
Glossary of Terms
Definitions for JARVICE-related terminology.
Team
a collection of 2 or more users, associated by payer (team leader)
Payer
the “team leader” of a team – all jobs run by team members are accounted for and billed to this user; single users not part of a team are payers by default
Developer
a JARVICE user who has access rights to the Native API – that is, one who can run NAEs; in the portal, the Build and Visualize tabs, which are front-ends to the Native API, are only enabled for users with developer access
Vault
metadata describing a user storage construct; all user data is presented to application environments as POSIX-compatible file system(s), regardless of the underlying logical or physical topology
File Vault
A storage unit that holds files and can be accessed over the network as a file server (e.g. NFS, CIFS, etc.)
Block Volume Vault
A storage unit that behaves like a LUN in a virtual SAN; JARVICE formats a filesystem on this unit so that files can be stored on it, but it can only be attached to one job at a time
Block Array Vault
A storage unit that behaves like an array of identically sized LUNs that attach 1:1 to each compute node in a job. The most common use case is for applications that specifically leverage distributed or clustered filesystems, such as HDFS or GlusterFS. JARVICE automatically provisions LUNs as compute node scale increases, and it automatically formats a POSIX filesystem on each LUN; higher level FS’s like HDFS can be deployed on top of that
Object Vault
A storage unit that maps to a “bucket” of objects on an object store, such as an S3-compatible subsystem. JARVICE formats a POSIX-compliant “backing store” for these objects, copies them into this backing store for execution, then uploads the results back to the object store when the job completes. Use of backing store ensures application compatibility (since it’s POSIX) as well as performance (since object storage is primarily intended for archival purposes and objects can only be accessed as complete units)
Image
The storage containing an operating system and application to run on the JARVICE platform; this is a disk image containing a Linux+application implementation
Environment
(short for Nimbix Application Environment, or NAE) – the runtime instance of an image; this may span several compute nodes in the case of parallel or distributed environments.
Application
A combination of metadata (an AppDef) and an Image for a given end-user application
AppDef
(short for Application Definition) – the metadata describing an application and its parameters. The portal renders AppDefs as web forms to collect inputs and parameters to then hand to the actual application. The Application API validates submissions against the AppDef to make sure supported parameters are being used.
Master
In a parallel environment (NAE) set, refers to the first environment in the set; this environment actually executes commands and is responsible for setting up distributed control or parallel execution of the application on the Slaves. A single environment (only 1 compute node requested) is by definition also a Master.
Slave
In a parallel environment (NAE) set, refers to environments 2-n in the set; these environments are controlled by the Master in an application-defined way. JARVICE does not run commands on them directly – they typically listen for inbound commands via SSH. Parallel environments can communicate over whatever protocols are available on the underlying compute, generally including IP (over Ethernet) and RDMA (over Infiniband).
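As an illustration of the Master/Slave relationship above, the following Python sketch shows a master dispatching a command to each slave over SSH. The node-list path `/etc/JARVICE/nodes` and the one-hostname-per-line format are assumptions for illustration; consult your environment for the actual location and format of the job's node list.

```python
import subprocess
from pathlib import Path

# Assumed location of the job's node list (one hostname per line);
# substitute the actual path used by your environment.
NODEFILE = "/etc/JARVICE/nodes"

def read_nodes(nodefile=NODEFILE):
    """Return the hostnames of all compute nodes in the job."""
    return [line.strip() for line in Path(nodefile).read_text().splitlines()
            if line.strip()]

def run_on_slaves(command, nodes):
    """Start `command` on every slave via ssh; nodes[0] is the master itself.

    Returns the Popen handles so the master can wait for completion.
    """
    return [subprocess.Popen(["ssh", host, command]) for host in nodes[1:]]

# Typical use from the master:
#   nodes = read_nodes()
#   for p in run_on_slaves("hostname", nodes):
#       p.wait()
```

Note that the master only dispatches; the slaves simply run whatever the application defines, which matches the “listen for inbound commands via SSH” model described above.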
Application and Workflow Types
JARVICE supports various types of applications and workflows. In all cases, applications may utilize multiple compute nodes for either parallel or distributed execution.
All application types can be parameterized using AppDefs, whether the workflow begins automatically or waits for user interaction.
Batch
A batch application performs finite data processing with all inputs provided as part of invocation. End users cannot connect to batch applications in any way while they are running, but they can monitor output from the portal or API. Batch applications may also optionally support “actions” to execute application-defined commands within the runtime environment. This can be used to control or adjust workflows while they are in progress, in a highly controlled way.
Batch applications do not typically have a public IP address, so the runtime environment is not directly accessible remotely other than through the above-mentioned output monitoring and actions.
Interactive
An interactive application provides an application-defined user interface (e.g. a GUI or web page) for end users to interact with it. Interactive applications typically do not run workflows until the user connects to them.
JARVICE assigns public IP addresses to interactive applications. Users are provided with this IP address on request, and can also access a specific URL for convenience. This URL may point to an HTML5 representation of the application’s GUI, or to an application-defined URL, such as a web console, running inside the environment. Interactive applications define metadata inside images to tell JARVICE what connection details to present to the end user.
Hybrid
A hybrid application is essentially an interactive batch workflow. The application begins processing automatically at invocation, but allows the user to connect interactively to monitor or control the flow. Hybrid applications are structurally identical to interactive applications (see above), except that they begin processing immediately instead of waiting for users to connect.
Working in the Nimbix Application Environment
The Nimbix Application Environment (NAE, or “environment” for short), is the runtime instance of an application image running as a container or group of containers (known as a parallel set). The NAE runs all userspace software for a given flavor of Linux, dependent libraries and packages, and application binaries themselves. It is similar to a virtual machine except that it runs securely on bare metal (for optimal performance), and does not allow changes to the kernel. Device driver installation and management is handled automatically outside of the container space on the host kernel. Depending on the underlying machine capabilities, access to accelerators and coprocessors (such as GPUs, FPGAs, etc.) is automatically enabled.
The nimbix User
The user nimbix is automatically created in NAEs and should not be modified. JARVICE logs all users in as nimbix when executing code. It’s also a best practice to perform all development activities as the user nimbix. For convenience, the nimbix user has password-less sudo access by default. While not strictly a requirement for all applications, this should be considered a best practice and left enabled.
Directory Structure
Directories inside the NAE are structured much like those on ordinary Linux systems of the respective distribution. Additionally:
/
This is the root directory of the environment. It maps to the image itself, typically sized at 100GB maximum. The platform administrator can increase the size of images if needed.
/tmp
This maps to a temporary/ephemeral disk image that is automatically discarded when the environment exits, even when the environment is persistent (e.g. in staging mode). Applications and developers should never store anything other than temporary files in this directory. The size of this image varies by deployment and machine type, but is generally at least in the 100GB range. Applications should not make assumptions about the size of this image – instead, they should query it dynamically if needed, as it’s unspecified.
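Since the size of /tmp is unspecified, an application can query available scratch space at startup using the standard library, as in this minimal Python sketch. The 10GB threshold below is a hypothetical working-set estimate for illustration, not a platform guarantee.

```python
import shutil

def free_bytes(path="/tmp"):
    """Bytes currently free on the filesystem that holds `path`."""
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes
    return shutil.disk_usage(path).free

def has_scratch(path="/tmp", required_bytes=10 * 1024**3):
    """True if `path` has at least `required_bytes` free.

    10GB is a hypothetical working-set estimate, not a platform value.
    """
    return free_bytes(path) >= required_bytes
```

An application might call `has_scratch()` before launching a solver and fail fast with a clear error rather than running out of space mid-job.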
/data
This directory, owned by user nimbix, is where the user data is mounted from the selected vault at runtime. In the case of File vaults, it is mounted from the same file server on the master and any slave(s) in the job. In the case of Block Volume vaults, it is mapped to the master and exported to any slave(s) in the job. In the case of Block Array vaults, each environment (master and any slave(s) for the job) has its own volume attached in this directory, so the data will differ from machine to machine.
In the case of Object vaults, the selected objects are cached at runtime in “backing store”, which is exported automatically from the master to any slave(s) in the job. Once the master exits, output object(s), if any, are automatically synced back to the object store. If the job is terminated or canceled, output object(s) are not synced back to the object store.
Applications should make no assumption about the size of the /data directory, as this will vary depending on user and underlying storage topology. All calculations around available space should be based on dynamic configuration. Applications should also assume that anything written to /data is persistent – therefore, “scratch” or temporary files should be written to /tmp instead. The only exception is when working files must be visible to end users outside of the environment, or when parallel applications must share the same set of working data across nodes. Note, however, that only File vaults allow users to see working files in /data from outside the environment. The best practice is to store working files in /tmp and move data into /data as necessary (e.g. when “checkpointing” or exiting). Application “actions” can be used to control flow, rather than relying on users creating or managing files during workflows (see below).
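The scratch-then-checkpoint pattern above can be sketched in Python as follows. The directory layout matches the /tmp and /data conventions described in this section, but the `checkpoint` helper itself is illustrative, not a platform API.

```python
import os
import shutil
import tempfile

SCRATCH = "/tmp"   # ephemeral; discarded when the environment exits
PERSIST = "/data"  # vault-backed; survives the job and is visible to the user

def new_scratch_dir(base=SCRATCH):
    """Create a private working directory on the ephemeral disk."""
    return tempfile.mkdtemp(prefix="work-", dir=base)

def checkpoint(workdir, dest_base=PERSIST, name="checkpoint"):
    """Copy the current working files into the vault so they survive the job.

    Copies rather than moves, so the application can keep working in scratch.
    """
    dest = os.path.join(dest_base, name)
    shutil.copytree(workdir, dest, dirs_exist_ok=True)
    return dest
```

An application would call `checkpoint()` at natural save points (or from an “action”), keeping all intermediate I/O on the faster, non-persistent /tmp disk.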
The Application API overloads the /jarvice/submit method for initiating application-specific workflows.
To generate Application API JSON, please use the JARVICE portal’s Task Builder (by clicking on an application and command from the Compute tab). You can copy the JSON to the clipboard during the confirmation step after clicking Submit.
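For example, the JSON captured from the Task Builder can be submitted programmatically. In this Python sketch the endpoint URL is an assumption (substitute your deployment's API base), and the payload is sent exactly as captured from the portal, with no fields added or invented here.

```python
import json
import urllib.request

# Assumed endpoint; substitute your JARVICE deployment's API base URL.
SUBMIT_URL = "https://api.jarvice.com/jarvice/submit"

def build_request(json_path):
    """Build a POST request from a job definition captured in the Task Builder.

    `json_path` holds the JSON copied to the clipboard at the confirmation
    step; the payload is forwarded as-is.
    """
    with open(json_path) as f:
        payload = json.load(f)
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def submit(json_path):
    """Submit the job and return the decoded API response."""
    with urllib.request.urlopen(build_request(json_path)) as resp:
        return json.loads(resp.read())
```

Keeping the captured JSON in a file and replaying it this way makes it easy to script repeated runs of a workflow that was first configured interactively in the portal.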