Cloud Data Comes in 3 Flavors

Written by: Leo Reiter on October 21, 2014

Whether Big Data or traditional, structured or unstructured, Cloud Data can be categorized into 3 primary forms…

Aggregate Cloud Data

This form of cloud data, whether public, hybrid, or private, is especially popular with multi-site organizations.  Generally, data collected in different locations needs to be processed at some point.  Using the cloud to aggregate data in a common location makes good sense for future processing or analytics.

Aggregate cloud data has actually been around longer than many would think – even before we thought of cloud computing as we know it today.  A very popular use case is found in organizations with branch offices, such as retail and financial firms.  Branch-level processes upload transactions or other information on a regular basis.  This used to happen periodically but can now be more real-time thanks to widespread availability of cloud data services.  Once aggregated, higher level reports and analytics become possible, such as comparisons between branches or regions.

What the cloud enables is a more seamless migration of data.  In the past, client/server models were popular.  Today, cloud data storage is API-driven and in many cases widely distributed across geographies.  This makes it possible for remote branches to store information with minimal latency, knowing that the cloud data services will seamlessly present a unified view of the storage from any other location.
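
To make this concrete, here is a minimal sketch of how a branch office might push its daily transactions to an S3-compatible object store with Python and boto3.  The bucket name, key layout, and file path are hypothetical, not a prescription for any particular service.

    # Minimal sketch: a branch pushes its daily transaction file to an
    # S3-compatible object store so every location sees one unified view.
    # Bucket name, key layout, and file path below are hypothetical.
    import datetime

    import boto3  # works with AWS S3 and most S3-compatible services

    def upload_branch_transactions(branch_id, csv_path, bucket="acme-transactions"):
        """Upload one branch's daily transaction file under a dated key."""
        s3 = boto3.client("s3")
        today = datetime.date.today().isoformat()
        key = f"branch={branch_id}/date={today}/transactions.csv"
        s3.upload_file(csv_path, bucket, key)
        return key

    # Each branch runs this at end of day, or more often for near real-time:
    # upload_branch_transactions("store-042", "/var/spool/pos/transactions.csv")

Because every branch writes to the same bucket through the same API, the aggregate view exists the moment the uploads complete, with no batch consolidation step.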

Anonymized Cloud Data

In many cases, information may be too sensitive to store in the cloud.  On the other hand, Big Compute in the public cloud is rapidly emerging as a way to process massive amounts of information without expensive capital and labor investments in infrastructure.  A great hybrid cloud use case is to filter sensitive information out of data sets shortly after capture, then leverage the public cloud to perform complex analytics on what remains.  For example, when analyzing terabytes of medical data to identify healthcare patterns and predict susceptibility to disease, the actual patient identities are not relevant.  In this case a filter can “scrub” out names, addresses, and social security numbers, for example, before pushing the anonymized set to secure cloud data storage.  From there, High Performance Data Analysis in the public cloud can make short work of the analytics.
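
As a rough illustration, a scrubbing filter of this kind might look like the sketch below, assuming records arrive as CSV rows; the field names are hypothetical, and a real pipeline would follow a formal de-identification standard such as HIPAA Safe Harbor.

    # Minimal sketch of a PII-scrubbing filter, run on-premises before the
    # anonymized data is pushed to cloud storage.  Field names are hypothetical.
    import csv

    PII_FIELDS = {"name", "address", "ssn"}

    def scrub_record(record):
        """Drop direct identifiers, keeping only the analytic fields."""
        return {k: v for k, v in record.items() if k not in PII_FIELDS}

    def scrub_file(in_path, out_path):
        """Read raw records, write anonymized ones ready for cloud upload."""
        with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
            reader = csv.DictReader(fin)
            fields = [f for f in reader.fieldnames if f not in PII_FIELDS]
            writer = csv.DictWriter(fout, fieldnames=fields)
            writer.writeheader()
            for row in reader:
                writer.writerow(scrub_record(row))

    # scrub_file("raw_patients.csv", "anonymized_patients.csv")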

Filtering as an early step in Big Data/Analytics workflows can also greatly reduce the amount of data to process.  This speeds up both data transfer and computation, reducing costs along the way.

Data Originating in the Cloud

A recent study from Nasuni estimated that there’s already over 1 exabyte (more than 1 billion gigabytes) of cloud data.  While this 2013 measurement included private sources as well, Gartner predicts that by 2016, more than half of all enterprise data will live in the public cloud.  This shouldn’t be a surprise, considering the popularity of the public cloud in the CRM space alone.  But it’s not just customer information and sales metrics…  Think of social media and how it benefits predictive analytics – online behavior patterns can be great indicators of purchasing habits, for example!  And let’s not forget the massive amounts of reference data that grow by the minute.

The real beauty of data originating in the public cloud is how accessible it is.  All cloud data services have APIs, and in many cases, either full or filtered information is freely available (again, very common with social media).  In the not too distant future, it really won’t matter where the data comes from, as long as it’s accessible in the public cloud.  Analytics platforms can already process unstructured data from multiple sources at the same time.
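
Getting at that data is usually just a matter of a few HTTP calls.  The sketch below pulls a page of JSON records from a placeholder public API and runs a trivial aggregation on them; the endpoint and field names are made up purely for illustration.

    # Minimal sketch of consuming data that originates in the cloud.
    # The URL and field names are placeholders, not a real service.
    import json
    import urllib.request

    def fetch_public_records(url="https://api.example.com/v1/posts?limit=100"):
        """Fetch a page of JSON records from a public cloud data API."""
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    def count_by_field(records, field="topic"):
        """Trivial 'analytics': count how many records fall under each value."""
        counts = {}
        for rec in records:
            key = rec.get(field, "unknown")
            counts[key] = counts.get(key, 0) + 1
        return counts

    # records = fetch_public_records()
    # print(count_by_field(records))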

Just imagine the insights we’ll be able to derive about society, humanity, and even the universe thanks to the ever increasing volume of cloud data!