“Image Data Exploration (IDEX) II Work Station” by Ryan Somma is licensed under CC BY-SA 2.0

Operational & Analytics workloads — Part #2 CosmosDB & Azure Data Explorer

Michele Arpaia


In the previous post, I tried to illustrate why you can make better business decisions if you get into the habit of thinking about business problems without separating transactional and analytical workloads (pedantic note: separation does not rule out distinction ;-)). In fact, in many instances they go hand in hand.

In this instalment, we will see a concrete example of how such an “integrated view” can be achieved technically, by looking at the way Azure CosmosDB and Azure Data Explorer (ADX) work together. First, a few words about ADX.

ADX is a big data analytical database designed for low-latency, near real-time analytics scenarios. It works as an append-only data exploration engine. Compared to traditional analytics solutions, it ingests raw data (structured and unstructured) and lets you query it directly to explore patterns, trends, and so on. It also scales to terabytes of data within minutes, enabling rapid iterations of data exploration to surface relevant insights.
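To make the “append-only exploration engine” idea concrete, here is a toy Python sketch. It is not ADX's API: the class and method names are hypothetical. It models a table that only accepts appends and answers a simple aggregate, roughly what a KQL `summarize count() by product` query would return.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AppendOnlyTable:
    """Hypothetical stand-in for an ADX table: rows can be appended, never updated or deleted."""
    rows: list[dict[str, Any]] = field(default_factory=list)

    def ingest(self, batch: list[dict[str, Any]]) -> None:
        # Ingestion only ever appends; existing rows are left untouched.
        self.rows.extend(dict(r) for r in batch)

    def count_by(self, column: str) -> dict[Any, int]:
        # Roughly what "summarize count() by <column>" does in KQL.
        return dict(Counter(r.get(column) for r in self.rows))


events = AppendOnlyTable()
events.ingest([{"product": "shoes", "action": "checkout"},
               {"product": "hat", "action": "view"}])
events.ingest([{"product": "shoes", "action": "checkout"}])
print(events.count_by("product"))  # {'shoes': 2, 'hat': 1}
```

Because nothing is ever modified in place, new batches can keep arriving while exploratory queries run over everything ingested so far.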

Let’s look at the integration now.

Fundamentally, the key to achieving near real-time analytics is to loosely couple data analysis from the online transactional systems. Take a scenario we alluded to in the previous post: customers buying products online are offered an ad hoc promotion as soon as they check out. Here the recommendation has to be computed as fast as possible without affecting transactional performance at all.
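A minimal sketch of this loose coupling, assuming an in-process `queue.Queue` as a stand-in for the real event pipeline and a hypothetical promotion rule (orders over 100 get a promo): the checkout path only writes the order and publishes an event, while a separate worker thread computes promotions at its own pace.

```python
import queue
import threading

orders: dict[str, dict] = {}            # stand-in for the operational (OLTP) store
change_events: queue.Queue = queue.Queue()  # stand-in for the event pipeline
promotions: list[str] = []              # output of the analytical side


def checkout(order_id: str, customer: str, total: float) -> None:
    # Transactional path: persist the order, publish an event, return at once.
    order = {"id": order_id, "customer": customer, "total": total}
    orders[order_id] = order
    change_events.put(order)  # an unbounded queue never blocks here


def analytics_worker() -> None:
    # Analytical path: runs on its own thread, decoupled from checkout latency.
    while (event := change_events.get()) is not None:
        if event["total"] > 100:  # hypothetical promotion rule
            promotions.append(f"promo for {event['customer']}")


worker = threading.Thread(target=analytics_worker)
worker.start()
checkout("o1", "alice", 150.0)
checkout("o2", "bob", 20.0)
change_events.put(None)  # sentinel telling the worker to stop
worker.join()
print(promotions)  # ['promo for alice']
```

In the real architecture the queue would be a durable service and the worker a separate process, but the shape is the same: the transactional call never waits for the analysis.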

That’s exactly what ADX gives us: the ability to query fast-flowing data without affecting the OLTP system’s performance, and to keep frequently accessed warm data at hand. Cold data can then easily be exported to a low-cost storage tier such as Azure Blob Storage.
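The warm/cold split can be sketched as a simple age cutoff. This is an illustrative Python function with a hypothetical 30-day warm window, using JSON Lines as a stand-in for files exported to blob storage; in ADX itself this is handled through retention and caching policies and export commands rather than code like this.

```python
import json
from datetime import datetime, timedelta, timezone


def tier_rows(rows: list[dict], now: datetime, warm_window: timedelta) -> tuple[list[dict], str]:
    """Split rows into the warm set kept for querying and a cold export.

    The cold part is serialized as JSON Lines, standing in for a file
    that would be written to low-cost blob storage.
    """
    cutoff = now - warm_window
    warm = [r for r in rows if r["ts"] >= cutoff]
    cold = [r for r in rows if r["ts"] < cutoff]
    cold_export = "\n".join(json.dumps({**r, "ts": r["ts"].isoformat()}) for r in cold)
    return warm, cold_export


now = datetime(2020, 9, 22, tzinfo=timezone.utc)
rows = [
    {"order": "o1", "ts": now - timedelta(days=40)},  # old: goes to cold storage
    {"order": "o2", "ts": now - timedelta(days=2)},   # recent: stays warm
]
warm, cold_export = tier_rows(rows, now, warm_window=timedelta(days=30))
print([r["order"] for r in warm])  # ['o2']
print(cold_export)
```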

CosmosDB, on the other hand, acts as an operational hot store where data is kept for a few days. As soon as data changes (e.g. during the checkout process mentioned above), CosmosDB flows the change to ADX. Let’s take a look at the two prongs of the architecture.
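How long CosmosDB keeps data in the hot store is typically controlled by time-to-live (TTL). The sketch below models the documented rules in plain Python and does not call the CosmosDB SDK: TTL only applies when the container has a default TTL set, an item-level `ttl` overrides it, `-1` means “never expire”, and expiry is measured from the item’s last-modified timestamp `_ts` (epoch seconds). The three-day window is an assumption.

```python
from typing import Optional


def effective_ttl(container_default_ttl: Optional[int], item: dict) -> Optional[int]:
    """Return the TTL in seconds that applies to an item, or None if it never expires."""
    if container_default_ttl is None:
        return None  # TTL not enabled on the container: item-level ttl is ignored
    ttl = item.get("ttl", container_default_ttl)  # item-level ttl overrides the default
    return None if ttl == -1 else ttl


def is_expired(container_default_ttl: Optional[int], item: dict, now_epoch: int) -> bool:
    ttl = effective_ttl(container_default_ttl, item)
    # _ts is the item's last-modified time in epoch seconds.
    return ttl is not None and now_epoch >= item["_ts"] + ttl


THREE_DAYS = 3 * 24 * 3600  # hypothetical hot-store window
order = {"id": "o1", "_ts": 1_600_000_000}
print(is_expired(THREE_DAYS, order, 1_600_000_000 + THREE_DAYS - 1))  # False
print(is_expired(THREE_DAYS, order, 1_600_000_000 + THREE_DAYS))      # True
print(is_expired(None, {**order, "ttl": 10}, 1_700_000_000))          # False
```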

The magic of this loosely coupled integration comes from a built-in CosmosDB feature called Change Feed, a mechanism to listen to all the changes occurring in a collection. For instance, when a customer checks out, a change related to their profile occurs; it is recorded and becomes available for consumption. An Azure Function, a serverless way to run code in response to events, can then process those changes and push the data out. To move the data to ADX, an Event Hub can be used to ingest all the events (the changes to the data) and make them available to the analytical platform, i.e. ADX, to analyse at lightning speed.
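The pipeline can be sketched end to end with in-memory stand-ins. `ToyChangeFeed` is a hypothetical model of the change feed, not the CosmosDB SDK: each read returns the latest version of every item changed since a continuation token, similar to the change feed’s latest-version mode. `forward_changes` plays the role of the Azure Function, and a plain list stands in for the Event Hub that ADX ingests from; the event field names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChangeFeed:
    """Hypothetical in-memory model of a collection's change feed."""
    log: list[tuple[int, dict]] = field(default_factory=list)  # (sequence number, item)

    def upsert(self, item: dict) -> None:
        self.log.append((len(self.log) + 1, dict(item)))

    def read(self, continuation: int = 0) -> tuple[list[dict], int]:
        # Latest version of each item changed after `continuation`,
        # plus a new continuation token for the next read.
        latest: dict[str, dict] = {}
        for seq, item in self.log:
            if seq > continuation:
                latest[item["id"]] = item
        new_token = self.log[-1][0] if self.log else continuation
        return list(latest.values()), new_token


def forward_changes(changes: list[dict], event_hub: list[dict]) -> None:
    # Plays the role of the Azure Function: reshape each change into an event.
    for doc in changes:
        event_hub.append({"customerId": doc["id"], "cart": doc.get("cart", [])})


feed = ToyChangeFeed()
event_hub: list[dict] = []  # stand-in for the Event Hub feeding ADX
feed.upsert({"id": "c1", "cart": ["shoes"]})
feed.upsert({"id": "c1", "cart": ["shoes", "hat"]})
feed.upsert({"id": "c2", "cart": []})

changes, token = feed.read()
forward_changes(changes, event_hub)
print(len(event_hub), token)  # 2 3

feed.upsert({"id": "c2", "cart": ["scarf"]})
changes, token = feed.read(token)
forward_changes(changes, event_hub)
print(event_hub[-1])  # {'customerId': 'c2', 'cart': ['scarf']}
```

Note that the two updates to `c1` surface as a single change: a consumer reading in latest-version mode sees the current state of an item, not every intermediate write.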

The full reference architecture, with many more details, is discussed in this very useful lab on GitHub.

In the next and last post of this mini-series, we will look at more use cases and integration between CosmosDB and Azure Synapse.

In the meantime, enjoy Ignite!

Originally published at https://www.linkedin.com.
