Nimbix Blog: Super musing about all things supercomputing
Accelerated computing affords us many luxuries: faster computation and the ability to compute on larger data sets. One luxury that is often not mentioned, however, is the ability to explore multiple methods for solving a particular problem. Luxury? Hold on a minute: exploring various methods for solving a particular challenge is a necessity. If the reader will indulge me in a little bit of abstraction, we find that for every problem there is a set of possible solutions. This solution space is often referred to as the event space, and it has some interesting features and topography. I promise that’s as far as I’m going with this stuff; you can exhale now. What this means is that a given problem potentially has more than one solution, and for the sake of correctness, should we not evaluate either all possible solutions or at least a selection of methods for generating them?
This is where accelerated computing makes strong sense. If an accelerated machine can run a model in half the time a non-accelerated machine needs, we now have a conundrum. Are we satisfied with the result of one model, or is it more economical, in terms of time, to run two different models in the time it would usually take to run a single model? In essence, it is a balance between volume and speed. Would I rather have two answers to evaluate, or one answer to evaluate in half the time? Scaling this up yields many solutions, along with the need to review all of them to find the best one for our application.
One way to look at this is to assume that all models are fundamentally incorrect (after all, they are approximations, even for the simplest of systems). Running multiple models then allows us to examine solutions that have been optimized differently or computed using different methods. This makes it possible to take a more holistic view of our proposed solutions, and it gives us confidence in our interpretation. Multiple solutions also mitigate the possibility that a generated solution is faulty because of its input conditions. If, by chance, the input data sends the analysis down a particular path that produces incorrect results, relying on that single run can be very detrimental. The ability to run multiple cases concurrently and arrive at an array of solutions reduces this risk: the incorrect result will show up as an outlying data point, while the majority of solutions fall within the norm. The luxury of multiple solutions also creates the necessity to review all of these solution sets and use the one that works best for our application.
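The following minimal Python sketch shows the “outlying data point” idea. It flags suspicious runs in an ensemble using the median absolute deviation (MAD), a common robust-statistics rule of thumb. The function name, the 3.5 threshold, and the sample run values are illustrative assumptions. They are not output from any real model.

```python
import statistics

def flag_outliers(results, threshold=3.5):
    """Return True for each run whose result sits far from the ensemble median.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, which is robust
    to the very outliers we are trying to catch. The 3.5 threshold is a
    commonly used rule of thumb, assumed here rather than tuned.
    """
    median = statistics.median(results)
    mad = statistics.median(abs(r - median) for r in results)
    if mad == 0:
        # Most runs agree exactly; anything that differs is suspect.
        return [r != median for r in results]
    return [abs(0.6745 * (r - median) / mad) > threshold for r in results]

# Hypothetical results from seven runs with slightly perturbed inputs;
# the last run went "down a particular path" and disagrees with the rest.
runs = [10.1, 9.9, 10.0, 10.2, 9.8, 10.05, 14.7]
flags = flag_outliers(runs)
consensus = statistics.median(r for r, bad in zip(runs, flags) if not bad)
print(flags)
print(consensus)
```

The median is used instead of the mean because a single wild run can drag a mean toward itself. The same run barely moves a median.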
The notion of gaining confidence from multiple corroborating sources is often referred to as evidence-based support, and it is commonly seen in decision support applications. In the decision support context, a question is posed, evidence is collected, and that evidence either supports the conclusion or it does not. It is the support that is then evaluated for validity. Multiple solutions also pose a new problem: the user needs to determine which solution is best for the application. This can quickly become very difficult to untangle, particularly as the tolerances on the variables and boundary conditions established for the analysis grow.
In many cases, operating in the decision support context provides more freedom for interpretation than a single model does. In fields that are based primarily on expert interpretation, the decision support model has considerable traction. One such field is the practice of medicine, in particular anatomic pathology. Pathologists are highly trained specialist physicians who complete extensive specialty training (a minimum of five additional years after medical school) and are trained to diagnose disease based on laboratory analysis. One of the most common pathological techniques is the histological examination of surgical specimens. If you ever have surgery where something is removed, that bit of removed tissue is sent “to the lab” for analysis. As part of that examination, the lab cuts thin sections of the surgical sample, mounts them on microscope slides, and stains them with chemicals that highlight different structures in different colors. The pathologist examines and interprets the slides to determine the nature of the pathology in the surgical sample. The pathologist then writes up a report and sends it to the referring physician, frequently the surgeon. Now, the pathologist is an expert diagnostician and is often the final word on whether that tumor removed from your lung is cancerous or not. If it is cancerous, that sets in motion a series of life-changing events; if not, life continues as usual. Given that the pathologist is often the first to detect or confirm cancer, would you want them to make that decision based on a single data point? Not me! I’d want them to run the full battery of tests and assays, paint the most comprehensive picture possible, and support their diagnosis.
Engineering and other consumers of high-performance computing are becoming more like pathologists. A practical engineering example where multiple solutions are reviewed is the dynamic analysis of a car crash. Engineers build a model of the car in question and test the crash using simulation. The slightest change in input can produce significantly different results. Suppose these tests were conducted with only a single, precise contact point on the front end of the car, and the passengers came out unharmed. It would be irresponsible for the engineer to conclude from this single solution that the car is safe in an impact. Shifting the impact location just a few inches up or down can drastically change the downstream analysis. Supporting information is becoming more and more critical; therefore, exploring the solution space with multiple models to support a particular conclusion will become increasingly necessary. Accelerated computing makes this economical because it allows a larger area of the solution space to be explored in any given unit of time. So multiple solutions aren’t a luxury after all; they are a necessity.
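The sketch below illustrates exploring the solution space instead of trusting one impact point. It sweeps a hypothetical impact offset and runs each case concurrently. `toy_crash_model` is a made-up stand-in for a real crash simulation. Its numbers, including the assumed 3-inch edge of the crash structure, have no physical meaning. In practice each case would be a full solver job on accelerated hardware.

```python
from concurrent.futures import ThreadPoolExecutor

def toy_crash_model(offset_inches: float) -> float:
    """Hypothetical stand-in for a crash simulation.

    Returns a made-up 'peak occupant load' for an impact shifted
    offset_inches up (+) or down (-) from the nominal contact point.
    """
    load = 20.0 + 0.5 * abs(offset_inches)
    if abs(offset_inches) > 3.0:
        # Assumed: beyond 3 inches the impact misses the main crash structure.
        load += 40.0
    return load

def sweep(offsets, model, max_workers=8):
    """Run the model for every offset concurrently; map offset -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(offsets, pool.map(model, offsets)))

offsets = [i * 0.5 for i in range(-12, 13)]  # -6.0 to +6.0 inches
results = sweep(offsets, toy_crash_model)
print("nominal impact:", results[0.0])
print("worst case:", max(results.values()))
```

The single nominal run looks benign. The sweep exposes a sharp jump just past the assumed 3-inch boundary. Threads are used here only to keep the sketch self-contained. For CPU-heavy solvers you would typically dispatch processes or cluster jobs instead.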
When we think about Artificial Intelligence, we have a large array of potential models to choose from. We can imagine a rule-based engine, a neural network, or some other, more exotic method such as a Generative Adversarial Network, that classifies an input and then executes an action based upon its classification. Rarely, though, do we take a step back and look at the system as a whole and at how the system maintains its Viability (you remember Viability from my Eight V’s…blog post). That is the topic of this blog post: system Viability, which we’re going to look at through the lens of the Viable System Model.
The Viable System Model was proposed and refined by Stafford Beer and others (John von Neumann, Norbert Wiener, W. Ross Ashby, Alan Turing, and many others) between the 1930s and 1970s. What these theoreticians were attempting to model were the universal mechanisms that allow a system, any system, to be self-governing and self-perpetuating – Viable. The field of study Beer and his associates were working in was called Cybernetics, the science of communication and automated control. You could think of Cybernetics as a contemporary and companion of Systems Science, Control Theory, and all of the other disciplines that seek to describe and understand how things (gene regulatory networks, organisms, groups, corporations, political bodies, societies, etc.) adapt and change over time. Beer was primarily concerned with corporations and economic governance, but his models and theory are general and universal enough to be applied within any discipline, including art and sports training.
This all sounds intuitive enough. Almost fifty years ago a group of theoreticians came up with a model of governance and feedback. How does that impact us now? Heck, in the early 1970s computers were the size of large rooms with special elevated floors, and programs were on punch cards or paper tape. They didn’t have cell phones, or even fuzzy logic rice cookers. Polyester was cool, hairstyles were generally regrettable, and The Brady Bunch was in its first run.
Unlike Mike Brady’s perm and his penchant for flammable petroleum-based synthetic fabrics, this theoretical system of governance has found a place within machine learning. It is also the cornerstone of Viability in the “Eight V’s of Big Data and Artificial Intelligence.” Now, our task is to very briefly examine and explain the Viable System Model (VSM) within the context of Artificial Intelligence and Big Data.
The VSM is a five-system (or five-layer) model that governs or controls different aspects of an entity’s existence through time. Layers 1–3 respond to stimuli that influence activities in the “here and now.” Layer 4 deals with reconfigurations for future or predictive elements that will influence the entity’s long-term viability. Layer 5 seeks to balance or buffer layers 1–3 against layer 4 (see diagram 1).
Dropping out of this level of abstraction, we can examine the parts individually through the lens of an organism we can all relate to: lions (diagram 2).
Diagram 2 – The Viable System Model, https://www.slideshare.net/issip/an-introduction-to-systems-thinking-for-tackling-wicked-problems-57502299.
System 1 – The activity itself that defines the system. A living system, for example, metabolizes and respires (burns food, produces waste). Lions do this, right?
System 2 – The communication systems within the living body: a nervous system or signaling system of some sort. Lions have nervous systems.
System 3 – The monitor-and-control system for System 1. In a living system, it regulates simple activities like metabolic rate and respiration rate; in humans and other mammals, it can be thought of as the autonomic, or involuntary, nervous system. Depending upon the complexity of the organism, System 3 can also encompass circadian rhythms and other innate behaviors. Lions sleep, wake, hunt, mate, and exhibit other characteristic behaviors.
System 4 – The first set of outward-looking systems, which take in input from the external world or milieu. These systems can be thought of as external sensors: touch, sight, hearing, and so forth. Along with these sensors come rules that allow for self-preserving behaviors. For example, System 2 communicates to System 1 that the organism is running low on energy (is hungry). System 4 identifies the communication as hunger, identifies a food source, and begins to eat. Lions do this very frequently; we can think of this as typical individual lion behavior.
System 5 – The component of the system that governs or balances System 4 activities against Systems 1–3. For example, if we look at a pride of lions, we see System 5 activities taking place in feeding priorities. Young weaned cubs are higher up the feeding ladder (eating with their mothers, who did the hunting) than older cubs, who eat last. This helps ensure that the next generation of cubs gets enough nutrition to reach adulthood while maintaining the social order of the pride. In this case, System 5 is the pride dynamics that govern a group of lions and modulate their behavior. On a systemic level, we can equate System 5 to Rousseau’s Social Contract, https://en.wikipedia.org/wiki/The_Social_Contract.
Another way to think of System 5 activities is as those activities that allow the organism or entity to co-exist with other entities like it and interact within its milieu.
All of this translates directly to artificial intelligence. An earlier post put forth Accenture’s concise definition of artificial intelligence as the ability to “sense, comprehend, and act,” and the VSM maps onto it directly. Systems 1–3 sense, System 4 comprehends, and System 5 balances the needs sensed by Systems 1–3 against the actions proposed by System 4. If all five of these systems are tuned and trained appropriately, then we have a system that is viable over time and can change and adapt to its environment. This is ideally what we want in an artificial intelligence. It does us very little good to develop an Artificial Intelligence that only works at time point zero or for use case zero. That’s like being a lion and not understanding the concept of hunting or eating. If that is true, your viability as a lion will be very short.
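The following is a deliberately tiny Python sketch of one decision step for our lion. It shows how “sense, comprehend, act” lines up with the VSM layers. The function names, the 0.3 hunger threshold, and the cub-feeding rule are hypothetical illustrations. They are not part of Beer’s model or of any AI framework. The point is the layering: System 5 can overrule what System 4 proposes.

```python
def systems_1_to_3(energy: float) -> dict:
    """Sense: internal operations report their state (System 2 carries the signal)."""
    return {"energy": energy, "hungry": energy < 0.3}  # hypothetical threshold

def system_4(signals: dict, environment: set) -> str:
    """Comprehend: combine internal signals with the outside world to propose an action."""
    if signals["hungry"] and "kill" in environment:
        return "eat"
    if signals["hungry"] and "prey" in environment:
        return "hunt"
    return "rest"

def system_5(proposal: str, signals: dict, environment: set) -> str:
    """Balance: weigh System 4's proposal against the needs of the wider group."""
    # Hypothetical pride rule: defer to feeding cubs unless close to starving.
    if proposal == "eat" and "cubs_feeding" in environment and signals["energy"] > 0.1:
        return "wait"
    return proposal

def step(energy: float, environment: set) -> str:
    """One sense -> comprehend -> balance cycle."""
    signals = systems_1_to_3(energy)
    return system_5(system_4(signals, environment), signals, environment)

print(step(0.8, {"prey"}))                   # rest: not hungry
print(step(0.2, {"prey"}))                   # hunt
print(step(0.2, {"kill", "cubs_feeding"}))   # wait: System 5 overrules
print(step(0.05, {"kill", "cubs_feeding"}))  # eat: survival wins
```

In a real AI, each of these functions would be a trained component rather than a hand-written rule. Keeping them as separate layers is what lets System 5 be retuned without touching the others.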
As we build AIs, we need to keep this abstract model in mind and think about the Viability, the continued Viability, of the products we are creating. There are very few universal truths; one of them is that change is difficult, even for AIs. What the VSM does is give the AI a built-in mechanism for self-change in response to the inputs it is receiving. Models need continual training to remain relevant. When viewed through the lens of the VSM, AIs become more than just automated decision points; they become entities that adapt over time to the changing landscape of their niche. This brings us back to the utility of accelerated computing. Truly viable AIs need continual training and continual monitoring and modeling of the external milieu, as well as of the internal response model. This continual self-monitoring requires accelerated computing to maintain viability; otherwise the monitoring activities overtake the AI’s ability to respond. Think of this as a modern-day “swap of death” situation. So, save your lions: use accelerated computing to enable your AIs to be truly Viable Systems.
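Continual monitoring can be as simple as watching how often a deployed model is still right. The sketch below keeps a rolling window of outcomes and signals when retraining is due. `DriftMonitor`, the window size, and the 90% floor are hypothetical choices for illustration. A production system would also watch input distributions, not just accuracy.

```python
from collections import deque

class DriftMonitor:
    """Hypothetical helper: rolling accuracy check that flags when to retrain."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.9):
        self.outcomes = deque(maxlen=window)  # oldest outcomes fall off automatically
        self.min_accuracy = min_accuracy

    def record(self, correct: bool) -> None:
        self.outcomes.append(correct)

    def accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def needs_retraining(self) -> bool:
        # Only judge a full window, so a few early mistakes don't trigger retraining.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy

monitor = DriftMonitor(window=100, min_accuracy=0.9)
for _ in range(100):
    monitor.record(True)    # the world still looks like the training data
print(monitor.needs_retraining())
for _ in range(20):
    monitor.record(False)   # the niche has shifted; predictions start failing
print(monitor.accuracy(), monitor.needs_retraining())
```

The retraining itself is the expensive part. The faster it can run, the shorter the window in which the AI is acting on an outdated model.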
By Tom McNeill
Artificial intelligence is really nothing new. According to an Accenture publication, it is the ability for a machine to “sense, comprehend, and act.” The real use for AI is in wading through the increasing volumes of data being generated daily and automating responses to signals in that data. Let’s look at one of the first commercial uses of fuzzy logic, a form of AI: the fuzzy logic rice cooker by Zojirushi. You select your type of rice, put in water, set it, and forget it. The rice cooker has sensors that monitor temperature and humidity and adjusts the temperature and cook time accordingly, ensuring well-cooked rice. What it is really doing is automating and adjusting the cooking process for a food that has been cooked for thousands of years. In short, people know how to cook rice; the Zojirushi product encodes a model for cooking rice well and implements it as an automation.
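For readers who haven’t met fuzzy logic, here is a minimal Python sketch of the idea. It is not Zojirushi’s algorithm. The temperature breakpoints, the three fuzzy sets, and the heater power levels are assumptions chosen for illustration. Each rule fires to a degree between 0 and 1, and the output is a weighted average of the rule outputs (a zero-order Sugeno-style controller).

```python
def falling(x, a, b):
    """Membership 1 at or below a, 0 at or above b, linear in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def rising(x, a, b):
    """Membership 0 at or below a, 1 at or above b, linear in between."""
    return 1.0 - falling(x, a, b)

def triangle(x, a, b, c):
    """Membership peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical rule base: (membership of pot temperature in degrees C, heater power in %).
RULES = [
    (lambda t: falling(t, 60, 90), 100.0),       # "cool"      -> full power
    (lambda t: triangle(t, 60, 90, 100), 60.0),  # "near boil" -> medium power
    (lambda t: rising(t, 90, 100), 20.0),        # "boiling"   -> simmer
]

def heater_power(temp_c: float) -> float:
    """Blend rule outputs, weighting each by how strongly its rule fires."""
    fired = [(mu(temp_c), power) for mu, power in RULES]
    total = sum(weight for weight, _ in fired)
    return sum(weight * power for weight, power in fired) / total

for t in (40, 75, 95, 100):
    print(t, heater_power(t))
```

Unlike a thermostat that snaps between on and off, the power ramps smoothly as the pot moves between “cool,” “near boil,” and “boiling.” That smoothness is the appeal of fuzzy control. The controller still only knows the situations its rules describe, which is exactly the limitation discussed next.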
So, if we take our rice cooker analogy a step further, we find that the cooker is only capable of making a pot of rice with the types of rice, or grains, with which it is familiar. Jasmine rice, OK; short-grain rice, no problem. What about wild rice, which isn’t true rice at all but the grain of a different grass? Or a pork chop, which certainly isn’t rice? Depending upon how the logic is implemented, all the rice cooker can do is monitor temperature and humidity in the pot. To the rice cooker, everything is rice, because its models don’t know about the edge case of wild rice or the true outlier, a pork chop. This is the problem with fixed-model systems: they don’t deal well with new or unusual things. That’s where learning comes in.
Learning is both the real power and the downfall of artificial intelligence. In most cases, AIs are trained on sets that have been assembled to replicate a truth, a calculable entrance requirement for a labeled set or category. The entrance requirement can be a set of metadata that is weighted to produce a score, which determines entrance into a particular category. We see this process go humorously wrong with toddlers learning to speak. For example, little Freddy is 10 months old, and he calls the family dog, Rover, ‘doggie’. Rover has four legs, a tail, and fur. On a day out with the family, Freddy sees a horse for the first time, points to it, and says, “doggie.” Freddy just produced a false positive: he had never seen a horse before and defaulted to the label he knew for things with four legs, a tail, and fur. In short, much like toddlers, AIs are only as accurate as the training (experiences) they have been given. Can an AI train itself? Yes and no. Yes, in the sense explored by an old field called cybernetics, which will be the topic of another blog post. No, because there needs to be some sort of seeded a priori knowledge.
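The “weighted entrance requirement” idea, and Freddy’s mistake, fit in a few lines of Python. The feature names, weights, and 0.9 threshold are hypothetical. The sketch shows that a score built only from legs, tail, and fur cannot separate a horse from a dog. Richer training with more discriminating features fixes the false positive.

```python
def entrance_score(features: dict, weights: dict) -> float:
    """Weighted sum of the features the classifier has learned to look at."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

def classify(features: dict, weights: dict, label: str, threshold: float = 0.9) -> str:
    """Admit the input to the category only if its score clears the threshold."""
    return label if entrance_score(features, weights) >= threshold else "unknown"

rover = {"four_legs": 1, "tail": 1, "fur": 1, "barks": 1, "hooves": 0}
horse = {"four_legs": 1, "tail": 1, "fur": 1, "barks": 0, "hooves": 1}

# What Freddy has learned so far: only legs, tail, and fur matter.
freddy_v1 = {"four_legs": 1 / 3, "tail": 1 / 3, "fur": 1 / 3}

# After more experiences: barking counts for "doggie", hooves count against.
freddy_v2 = {"four_legs": 0.25, "tail": 0.25, "fur": 0.25, "barks": 0.25, "hooves": -0.5}

print(classify(horse, freddy_v1, "doggie"))  # the false positive
print(classify(horse, freddy_v2, "doggie"))
print(classify(rover, freddy_v2, "doggie"))
```

Nothing about the horse changed between the two runs. Only the training did, which is the point of the Freddy example.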
OK, rice cookers and Freddy the toddler: what do they have to do with supercomputing and artificial intelligence? These two examples show that artificial intelligences, just like people, are bound by what they have been taught or trained on. They are also bound by the topology of their internal classification scheme. Supercomputers, or more specifically, computers with accelerated hardware, can increase the speed of the system that governs the artificial intelligence. Thanks to their increased speed and capacity, they can train faster, both on larger, more comprehensive sets and on more focused, deeper training sets. This in turn allows finer-grained discrimination of inputs and a richer classification topology (more classes and more complex relationships between classes).
Going back to our definition of artificial intelligence from Accenture, “sense, comprehend, act,” we see that artificial intelligence is just automated classification…oh, yes, you in the back row, what’s that…inference and prediction? You’ve been reading ahead. Yes, both inference and prediction appear to forecast into the future. However, in both cases they use the models they have been trained on and the forecasting methodologies they have been trained to use. So in fact they are rearward-looking and very similar, if not identical, to our classification example. Inference simply trims and optimizes the classifiers in response to use, and prediction merely extends the constructed models in some logical way. We could even go so far as to say that any inference or prediction is a function of the training given to the AI. So again, we come back to our models and our classification topology.
If we accept that AIs are topology-bound, then getting closer to the truth (whatever that is) means categorizing every set of data against numerous different classification topologies. We can call these different topologies “facets.” This is where accelerated computing shines. Instead of classifying against a single entrance requirement, multiple AIs can be trained against multiple entrance requirements that represent different potential semantic realities. For example, if the requirement is to classify types of ‘blues’, one logical set might be color names (navy blue, sky blue, baby blue, …). A second might be musical genres (Delta blues, Chicago blues, Texas electric blues, …), and a third might have to do with Major League Baseball teams and players (Toronto Blue Jays, Vida Blue, …). Once the facet space has been identified and defined, a more complete AI solution can be generated. So, if something as simple as ‘blue’ requires at least three fully trained classifiers, more complex search spaces will require many more. This expansive requirement means one thing: more compute time for training. This is where accelerated computing is a natural fit; faster compute means more facets can be trained per unit time. More facets trained means more robust and complete AI coverage. With better semantic coverage, you are less likely to have your AI pointing to a “horse” and calling it “doggie.”
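Here is a toy version of faceted classification in Python. Each facet is a small hypothetical keyword vocabulary standing in for a separately trained classifier. The facet entries come from the ‘blues’ example. Running one input against every facet returns all of its semantic readings rather than a single forced answer.

```python
# Hypothetical facet vocabularies; each stands in for a separately trained classifier.
FACETS = {
    "color": {"navy blue", "sky blue", "baby blue"},
    "music genre": {"delta blues", "chicago blues", "texas electric blues"},
    "baseball": {"toronto blue jays", "vida blue"},
}

def facet_matches(term: str, facets: dict = FACETS) -> list:
    """Return every facet whose vocabulary contains the term, sorted by name."""
    needle = term.lower()
    return sorted(name for name, vocab in facets.items()
                  if any(needle in entry for entry in vocab))

print(facet_matches("blue"))     # all three facets
print(facet_matches("Chicago"))  # only the music facet
```

In a real system each facet would be a trained model. Because the facets are independent, they can be trained side by side, and that is where the extra throughput of accelerated hardware pays off.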
If you dive into the field of Supercomputing and Big Data, you will begin to run across blog posts talking about the “V’s” of the field: the six, the eight, the ten, the twelve, and so forth. No, we’re not talking about engines. We’re talking about lists of nouns that name aspects or properties of Big Data or Supercomputing that need to be balanced or optimized. The list of eight balances completeness with concision. The higher-numbered lists tend to veer off into data governance issues that we generally don’t need to concern ourselves with at this point.
The eight V’s: Volume, Velocity, Variety, Veracity, Vocabulary, Vagueness, Viability and Value
Most of these are pretty self-explanatory, but let’s go through them just for drill.
Volume: The amount of data that needs to be processed. This can manifest either as an amount over time or as an amount that must be processed all at once. Doing a matrix operation on a 1 billion by 1 billion matrix is one example of volume that can constrain computing. Scanning the contents of every newspaper published in a day for keywords is another.
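A quick back-of-the-envelope calculation shows why the 1 billion by 1 billion matrix is a Volume problem. This assumes dense storage with 8-byte (float64) entries; sparse formats would change the picture entirely.

```python
def dense_matrix_bytes(rows: int, cols: int, bytes_per_entry: int = 8) -> int:
    """Memory needed to hold a dense matrix, assuming fixed-size entries."""
    return rows * cols * bytes_per_entry

n = 1_000_000_000
total = dense_matrix_bytes(n, n)
print(f"{total:.3e} bytes = {total / 10**18:.0f} exabytes")
```

At roughly 8 exabytes, the matrix cannot even be held in memory on a single machine, let alone operated on. Volume is a physical constraint before it is a software one.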
Velocity: Similar to Volume, this has to do with the speed of the data coming in and the speed of the transformed data leaving the compute. An example of a high velocity requirement is telemetry that needs to be analyzed in real time for a self-driving car. The enemy of velocity is latency.
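Latency is easiest to reason about as a per-frame budget. The sketch below assumes a hypothetical telemetry stream and made-up stage timings. It only shows the arithmetic of checking whether a processing pipeline keeps up with the incoming data.

```python
def frame_budget_ms(rate_hz: float) -> float:
    """Time available to process one frame before the next one arrives."""
    return 1000.0 / rate_hz

def keeps_up(stage_ms: dict, rate_hz: float) -> bool:
    """True if the summed stage latencies fit inside the per-frame budget."""
    return sum(stage_ms.values()) <= frame_budget_ms(rate_hz)

# Hypothetical per-frame stage latencies (milliseconds) for a self-driving pipeline.
stages = {"ingest": 1.5, "inference": 6.0, "planning": 2.0}

print(frame_budget_ms(100), keeps_up(stages, 100))
print(round(frame_budget_ms(120), 2), keeps_up(stages, 120))
```

The same 9.5 ms pipeline keeps up at 100 Hz but falls behind at 120 Hz. Once a pipeline falls behind, the backlog grows with every frame.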
Variety: The spice of life, or the bane of computing? In the computing context we are discussing, this term refers to heterogeneous data sources that need to be identified and normalized before the compute can occur. In data science, this is often referred to as data cleaning. It is frequently the most labor-intensive operation, as it involves all of the pre-work required to set up the high-performance compute. This is where the vast majority of errors and issues with data are found, and it is the fundamental bottleneck in high-performance computing.
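Here is a small taste of what that pre-work looks like. The two record layouts below are hypothetical. One source reports Fahrenheit and US-style dates, and the other reports Celsius and ISO dates. The sketch maps both onto one schema before any compute happens. An ambiguous date such as 03/04/2017 shows why someone must confirm which format each source actually uses. This code would silently read it as March 4.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed formats of the two sources

def normalize(record: dict) -> dict:
    """Map a record from either hypothetical source onto a common schema."""
    if "temp_f" in record:
        temp_c = (record["temp_f"] - 32) * 5 / 9
    elif "temperature_c" in record:
        temp_c = float(record["temperature_c"])
    else:
        raise ValueError(f"no recognized temperature field in {record!r}")

    raw_date = record.get("date") or record.get("timestamp")
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw_date, fmt).date()
            break
        except (TypeError, ValueError):
            continue  # try the next known format
    else:
        raise ValueError(f"unparseable date in {record!r}")

    return {"temp_c": round(temp_c, 2), "date": parsed}

source_a = {"temp_f": 98.6, "date": "03/15/2017"}
source_b = {"temperature_c": 37.0, "timestamp": "2017-03-15"}
print(normalize(source_a) == normalize(source_b))
```

Every new source means another branch like these. That is why Variety, more than raw horsepower, so often sets the pace of a project.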
Vocabulary: This term has two meanings. The first is less a computing issue than a communication issue between provider and customer. It has to do with the language used to describe the desired outcome of an analysis. For example, the term “accuracy” or “performance” may mean something different in structural engineering than it does in animation rendering. The second meaning branches into semantic searching and operations within a semantic space. Here we are dealing with controlled vocabularies (ontologies) in which a term carries a specific definition but also a relatedness to other terms. For example, the term “child” implies that there is a “parent,” and so forth. This term architecture is very important when working with clients in the artificial intelligence space, where search and retrieval are used to uncover unknown relationships. As it turns out, the strength of the ontology is what determines the relative success or failure of projects that mine data with semantic-based technologies.
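The “child implies parent” idea is what ontologies call an inverse relation. The following is a minimal sketch in Python with hypothetical relation names and a hypothetical parent named Alice. It shows how asserting one fact lets a system infer the other. Real ontology languages declare such pairs formally, for example with OWL’s `owl:inverseOf`.

```python
# Hypothetical controlled vocabulary: each relation paired with its inverse.
INVERSES = {"childOf": "parentOf", "parentOf": "childOf"}

def with_inverses(triples: set) -> set:
    """Return the triples plus every fact implied by a declared inverse relation."""
    inferred = set(triples)
    for subject, relation, obj in triples:
        inverse = INVERSES.get(relation)
        if inverse is not None:
            inferred.add((obj, inverse, subject))
    return inferred

facts = {("Freddy", "childOf", "Alice")}
for triple in sorted(with_inverses(facts)):
    print(triple)
```

The richer and more carefully curated these relations are, the more a semantic search can uncover without anyone having stated the fact explicitly.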
Vagueness: This term describes an issue with interpreting the results being returned. Douglas Adams articulated this beautifully in “The Hitchhiker’s Guide to the Galaxy,” where the answer to the Ultimate Question of Life, the Universe, and Everything turns out to be the number 42. This is a bit tongue-in-cheek, but it is a very real problem with scientific and big data computes. These computes are able to marshal and transform huge oceans of data, but what does it mean? What do I do with the answer? We see the same issue in statistics with correlation studies. A famous example is the direct correlation between sales of chocolate ice cream and violent crime in Cleveland. So, what does this mean? Is there something in chocolate ice cream that makes people violent? As a well-meaning city official, you might consider banning the sale of chocolate ice cream, but you’d look foolish, and here’s why. Correlation does not imply causation. As it turns out, both ice cream sales and violent crime spike in the summer due to heat and a lack of central air conditioning. This is vagueness. Correlations produced by computes are often misinterpreted as causation, and more data doesn’t necessarily mean better or more accurate results. This is something we all need to keep in the back of our minds when dealing with clients.
Viability: This refers to a model’s ability to represent reality. Models, by their very nature, are idealized approximations of reality. Some are very good; others are dangerously flawed. Frequently, model builders simplify their models to make them computationally tractable. With hardware acceleration, we can remove these shackles from model builders and let them simulate closer to reality.
Value: This term is defined as whatever is important to the customer. Another way to define value is as the removal of obstacles in the customer’s path so they can reach their stated destination. We often think of value in terms of cost, but we can also think of Value in terms of enablement and what that is worth to the customer.
Here are some relationships between these terms that might be helpful…
As the first six V’s increase for any given problem, the problem outstrips the ability and capacity of commodity hardware and leads to a decrease in Viability and Value from that compute on commodity hardware.
Hardware deals primarily with Volume and Velocity as these are physical constraints of the data.
Software deals primarily with Variety, Veracity, Vocabulary, and Vagueness as these are logical or organizational constraints upon the data.
Artificial Intelligence/Machine Learning can be described as any technology containing logic that discriminates between two or more classifications (member or non-member, odd or even, etc.). These systems deal primarily with controlling or limiting Vocabulary and Vagueness, and they add Value and Viability through this control.
From these eight V’s and their relationships to hardware, software, and artificial intelligence/machine learning, we now have a lens through which we can examine our customers’ requirements and determine a measure of Value for the service we provide.
We’re Attending the OpenPOWER Developer Congress — Here’s Why You Should, Too. Insights from Nimbix, Mellanox, and Xilinx
Prominent OpenPOWER Foundation members have provided the reasons they’re taking time out of their busy days to support the OpenPOWER Developer Congress and send their experts and team members.
This is why YOU should attend too!
Nimbix Enables On-Demand Cloud for Developers
Why Nimbix is Participating in the OpenPOWER Developer Congress
As the leading public cloud provider for OpenPOWER and Power systems, Nimbix has embraced its role as a member in the OpenPOWER Foundation. Nimbix enables ISVs to get their applications ported and running on the Power architecture, and feels a responsibility to help the OpenPOWER community. This is what the company signed up for when it became a Silver-level member of the OpenPOWER Foundation.
Nimbix works to grow the Power ecosystem for application software and broaden the software portfolio on OpenPOWER. It facilitates this by:
- Providing ISVs and developers a Continuous Integration / Continuous Deployment (CI/CD) pipeline to deploy their source code on Power.
- Providing the ability to not just port, but to test at scale, on a supercomputer in the cloud that runs on OpenPOWER technology.
- Enabling ISVs that decide to go to market with their applications in the cloud to sell those applications directly in the Nimbix cloud.
What is Nimbix Bringing to the Developer Congress?
“Nimbix is proud to support the OpenPOWER Developer Congress by providing resources to support Congress activities,” said Leo Reiter, CTO of Nimbix. “Through our support, we will be enabling the on-demand cloud infrastructure for the Congress so that all of the sessions and tracks can do their development in the cloud on the OpenPOWER platform.”
Leo will be part of the team instructing the cloud development and porting-to-Power tracks at the Congress. “As an OpenPOWER Foundation member,” Leo said, “I will be working with participants to get their applications running on Power in the cloud and providing them with tips and tools they can use to continue developing OpenPOWER applications post-conference.”
Mellanox Educates on Caffe, Chainer, and TensorFlow
Why Mellanox is Participating in the OpenPOWER Developer Congress
Mellanox is not only a founding member of the OpenPOWER Foundation, but also a founding member of its Machine Learning Work Group. AI / cognitive computing will improve our quality of life, drive emerging markets, and surely play a leading role in global economics. But to achieve real scalable performance with AI, being able to leverage cutting-edge interconnect capabilities is paramount. Typical vanilla networking just doesn’t scale, so it’s important that developers are aware of the additional performance that can be achieved by understanding the critical role of the network.
Because Deep Learning applications are well-suited to exploit the POWER architecture, it is also extremely important to have an advanced network that unlocks the scalable performance of deep learning systems, and that is where the Mellanox interconnect comes in. The benefits of RDMA, ultra-low latency, and In-Network Computing deliver an optimal environment for data-ingest at the critical performance levels required by POWER-based systems.
Mellanox is committed to working with the industry’s thought leaders to drive technologies in the most open way. Its core audience has always been end users: understanding their challenges and working with them to deliver real solutions. Today, more than ever, the developers, data-centric architects, and data scientists are the new generation of end users that drive the data center. They are defining the requirements of the data center, establishing its performance metrics, and delivering the fastest time to solution by exploiting the capabilities of the OpenPOWER architecture. Mellanox believes that participating in the OpenPOWER Developer Congress gives the company an opportunity to educate developers on its state-of-the-art networking and also demonstrates its commitment to innovation with open development and open standards.
What is Mellanox Bringing to the Developer Congress?
Mellanox will provide on-site expertise to discuss the capabilities of Mellanox Interconnect Solutions. Dror Goldenberg, VP of Software Architecture at Mellanox, will be present to further dive into areas of machine learning acceleration and the frameworks that already take advantage of Mellanox capabilities, such as Caffe, Chainer, TensorFlow, and others.
Mellanox is the interconnect leader in AI / cognitive computing data centers, and already accelerates machine learning frameworks to achieve from 2x to 18x speedup for image recognition, NLP, voice recognition, and more. The company’s goal is to assist developers with their applications to achieve maximum scalability on POWER-based systems.
Xilinx Offers Experts in FPGAs and Machine Learning Algorithms
Why Xilinx is Participating in the OpenPOWER Developer Congress
Xilinx, as a Platinum-level member of the OpenPOWER Foundation, looks forward to supporting the Foundation’s outreach activities. Xilinx particularly likes the format of the upcoming OpenPOWER Developer Congress, because it’s focused on developers and provides many benefits developers will find helpful.
Xilinx appreciates the unique nature of the Congress, in that it provides developers the opportunity to get up close to the technology and in some cases, work on it directly. It also allows developers to make good connections with other companies who participate in the Congress — something that can be very beneficial as they return to their day-to-day work.
Companies that choose to participate by providing instruction at the Congress get an opportunity to talk with developers first hand, and receive feedback on their product offerings. Conversely, the developers have an opportunity to provide feedback on products and influence what platforms (everything OpenPOWER) are going to look like as they mature.
What is Xilinx Bringing to the Developer Congress?
Xilinx will be bringing system architects and solution architects who will work hands-on with developers to create solutions and solve problems. These experts understand both FPGAs and machine learning algorithms, which fits nicely with the OpenPOWER Developer Congress agenda.
Learn more about the OpenPOWER Developer Congress.