Watch Manish Devgan, a product executive, and Dinesh Chandrasekhar, an industry analyst, discuss how enterprises create value from data-in-motion by harnessing fresh, real-time data.
As organizations focus on building AI solutions, it is worth recognizing that the success of most of these applications will depend on the data that serves them. But does data-in-motion (DiM), specifically, play a key role in creating that value?
Host Dinesh Chandrasekhar talks with Manish Devgan, both seasoned data management consultants, about how enterprises create that value at scale.
Some of the topics Manish and Dinesh cover are:
- The modern DiM ecosystem map
- Defining DiM and data at rest
- Creating value from data-in-motion
- An IoT DiM use case
- “Perishable insights”: fresh, real-time data
Guest: Manish Devgan is a product executive with a proven track record of delivering industry-leading software. He has successfully led the development of numerous products at companies such as BEA Systems, Oracle, Terracotta, Software AG, and Hazelcast. Manish is an innovator and holds several patents. Most recently, he was Chief Product Officer at Hazelcast, where he helped build a category-leading real-time data platform for application builders, leveraging streaming data, a low-latency datastore, and real-time ML/AI. In his free time, Manish likes to go on long walks while listening to Punjabi music.
Host: Dinesh Chandrasekhar is a technology evangelist, thought leader, and seasoned IT industry analyst. With close to 30 years of experience, Dinesh has worked on B2B enterprise software and SaaS products, delivering and marketing sophisticated solutions for customers with complex architectures. He has also defined and executed highly successful GTM strategies to launch several high-growth products at companies such as LogicMonitor, Cloudera, Hortonworks, CA Technologies, Software AG, and IBM. He is a prolific speaker, blogger, and weekend coder. Dinesh holds an MBA from Santa Clara University and a Master’s degree in Computer Applications from the University of Madras. Currently, Dinesh runs his own company, Stratola, a customer-focused business strategy consulting and full-stack marketing services firm.
Resources:
View the data-in-motion ecosystem map here
Learn more about data-in-motion on RTInsights here
Transcript
Dinesh Chandrasekhar:
Hello, and welcome to another episode of the Data-in-Motion Leadership Series, a smart talk with one of today’s data leaders. We welcome Manish Devgan, one of my good friends and ex-colleagues from a previous life.
Manish Devgan is a product executive with a proven track record of delivering industry-leading software. He has successfully led the development of numerous products at companies like BEA Systems, Oracle, Terracotta, Software AG (full disclaimer: that’s where we worked together), and Hazelcast.
Manish is an innovator and holds several patents. Most recently, he was the chief product officer at Hazelcast, where he helped build a category-leading real-time data platform for application developers and builders, leveraging streaming data, a low-latency data store, and real-time ML and AI.
In his free time, Manish likes to go on long walks while listening to Punjabi music. All right, that’s a good background. Manish, thank you so much for joining us for this conversation. Welcome.
Manish Devgan (01:26):
Yeah, thank you, Dinesh. Glad to be here.
Dinesh Chandrasekhar (01:30):
We started this series primarily to talk with data leaders like you about data-in-motion. We’ll get into the semantics of why it’s called data-in-motion, but in the context of the broader applications that data is being used in, and particularly today’s topic of GenAI applications, your background in ML, AI, and real-time data platforms makes a lot of sense for this episode.
I’m excited to talk about this topic. I want to start the conversation with data-in-motion and data-at-rest. This has been discussed in our circles for quite a while, and I want to draw the distinction between the two. Going back a few years, at some of the companies I worked with, we drew a very clear distinction between data-at-rest and data-in-motion.
A few months ago, I put together an ecosystem map that captures how I see this from my point of view. I want to share it with you and ask for your perspective as well. Let me share my screen real quickly.
If you look at the screen, I’ve organized this entire ecosystem of vendors and their offerings in this space by how data flows through the system. We are talking about different types of data (unstructured data, structured data, data streams, and so forth), all coming through some kind of pipeline mechanism.
As these data pipelines become more modern, they offer the capabilities not only to connect to the different data sources and ingest the data, but also to process and analyze it in real time.
Once we start doing that, the distinction becomes very clear in terms of what you do with the data. Is the data constantly flowing into the enterprise, and do you want to ingest it, leverage it, and analyze it in real time? That is this branch, data-in-motion. Or is it the other branch, where you say, let me collect all this data and put it into a data warehouse or some other store, so I can process it later for more meaningful analytics?
That second branch looks at the big picture rather than at what is happening at this very moment. So there is this notion of data-at-rest and data-in-motion: where the data lands versus what you do with active data that is still flowing through the enterprise. Eventually, all of these feed into the different applications represented at the top, whether data applications, data science, ML-based applications, and so forth.
Let me pause here. As we look at the different personas that use these mechanisms to access the data, how do you look at this picture? From your own point of view and experience, what is data-in-motion to you, and how do you see this landscape?
Manish Devgan (04:51):
Yeah, that distinction is interesting. It also shows how we got here and the journey behind it. What’s interesting is that data is always in motion when it is born. A customer record is created when a customer is acquired. At rest and in motion are, in some ways, just intermediate states. In the past decade or so, the software tech stack has allowed people to tap into fresh insights while the data is still moving.
I think that’s what is new, and it comes on top of the data analytics tools that have been serving data-at-rest, such as databases, data lakes, data warehouses, and, more recently, lakehouses. What’s interesting about data-in-motion, which I call fresh data, is the transport layer. A lot of new products serve as a transport layer for data-in-motion, like Kafka and Redpanda, and then there is the advent of stream processing systems like Flink.
Those have become mainstream, and that’s why people have been able to tap into the value of this fresh data. I also see new technologies like streaming databases and real-time analytics technologies like Apache Pinot. Many tools and technologies now help drive better decisions, better actions, and better analytics, all by leveraging fresh data and data-in-motion. I think that’s what has changed.
Dinesh Chandrasekhar (06:29):
Fantastic. I love that description. You just touched on something important: data freshness. The freshness of the data is super relevant. When I talk about data-in-motion, that’s something I’ve discussed in the past as well: how the value of data decays over time.
I think that’s a very important factor when you differentiate between data-at-rest and data-in-motion. Thank you for that. Can you elaborate a bit more on the importance of data-in-motion in terms of the value it creates?
For example, when I think about IoT in the broader context, we are talking about thousands of devices emitting sensor data every few seconds, or even every second. You’re collecting all this data, it means something at that particular moment, and its freshness matters a lot.
Take predictive maintenance in manufacturing: you look at this data and say, this particular machine is overheating, let me look at it in real time. You’re able to proactively, predictively say that something is going to fail, and that comes down to the fact that the freshness of the data has created that value for you.
Do you see it the same way in your experience with real-time data platforms? Let me stop sharing my screen so we can get back to talking.
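The idea that the value of data decays over time can be illustrated with a simple model. The sketch below assumes, purely for illustration, that an insight loses value exponentially with age, halving every `half_life_seconds`; the half-life, the function names, and the example values are hypothetical and not taken from the conversation. Real decay curves depend entirely on the use case.

```python
def freshness_weight(age_seconds: float, half_life_seconds: float) -> float:
    """Fraction of an insight's original value left after `age_seconds`.

    Hypothetical model: exponential decay, where the value halves
    every `half_life_seconds`.
    """
    if age_seconds < 0 or half_life_seconds <= 0:
        raise ValueError("age must be >= 0 and half-life must be > 0")
    return 0.5 ** (age_seconds / half_life_seconds)


def decayed_value(initial_value: float, age_seconds: float, half_life_seconds: float) -> float:
    """Remaining value of an insight that was worth `initial_value` when fresh."""
    return initial_value * freshness_weight(age_seconds, half_life_seconds)


if __name__ == "__main__":
    # Hypothetical overheating alert worth 100 units when fresh, 30 s half-life.
    for age in (0, 30, 60, 300):
        print(f"age={age:>3}s remaining value={decayed_value(100.0, age, 30.0):.2f}")
```

Under the assumed 30-second half-life, a reading that is five minutes old retains about 0.1% of its original value; a shorter half-life models a more perishable insight.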
Manish Devgan (08:13):
Yeah, thanks. I think that’s a really important question: the importance of data-in-motion in terms of the value it creates. The way I look at it, and I’ve seen this with many customers, the more recent the data, the more value it has. There is a term for this: perishable insights. The more time passes, the more it feels like you’re making a decision by looking in the rearview mirror.
In broad brushstrokes, if you’re ignoring data-in-motion, you’re essentially making today’s decisions based on yesterday’s data. I remember working with a customer, one of the largest machine builders in the world. They build giant paint robots for the assembly lines of high-end cars. Based on the current data coming in from the pressure sensors on the robotic arm, the predictive model can shut down the process before it is too late, so the paint job does not go bad.
Painting a high-end car is very expensive, so fresh data has huge value here. That’s why data-in-motion is so critical for these kinds of use cases.
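Manish describes a predictive model that stops the painting process based on live pressure-sensor data. The model itself is not described, so the sketch below swaps in a much simpler technique: a rolling average over the last few readings, checked against an allowed band. The window size, band limits, and the `stop_process` callback are hypothetical; in a real deployment the readings would arrive from a streaming source rather than a Python list.

```python
from collections import deque
from typing import Callable, Iterable


class PressureGuard:
    """Rolling-average guard over a stream of pressure readings.

    Hypothetical sketch: the window size, the allowed band [low, high],
    and the stop_process callback are illustrative, not from a real system.
    """

    def __init__(self, window: int, low: float, high: float,
                 stop_process: Callable[[float], None]) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.readings = deque(maxlen=window)  # keeps only the last `window` readings
        self.low = low
        self.high = high
        self.stop_process = stop_process
        self.stopped = False

    def on_reading(self, value: float) -> None:
        if self.stopped:
            return  # process already halted; ignore further readings
        self.readings.append(value)
        if len(self.readings) < self.readings.maxlen:
            return  # not enough history yet to judge the average
        avg = sum(self.readings) / len(self.readings)
        if not (self.low <= avg <= self.high):
            self.stopped = True
            self.stop_process(avg)  # e.g. signal the line controller to stop painting


def run(stream: Iterable[float], guard: PressureGuard) -> None:
    """Feed readings to the guard in arrival order."""
    for value in stream:
        guard.on_reading(value)


if __name__ == "__main__":
    stops = []
    guard = PressureGuard(window=3, low=2.0, high=4.0, stop_process=stops.append)
    run([3.0, 3.1, 2.9, 3.0, 5.0, 5.5, 6.0], guard)
    print(stops)  # prints [4.5]: the first rolling average outside the band
```

The rolling window smooths out single noisy readings, at the cost of reacting a few samples later than a per-reading threshold would.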
Dinesh Chandrasekhar (09:26):
We’ve gone back and forth on a couple of things. One is data-in-motion and the freshness of the data, and I loved your description of perishable insights, which is absolutely important. Then there is also the notion of real time when it comes to data-in-motion. How do you see that correlation? This goes back to a lot of examples. Take healthcare: with patient telemetry, real-time data is important because you’ve got all these biometric devices attached to patients, and if something is suddenly off, you need to detect that anomaly, account for false positives, and proactively alert the doctors that this particular patient needs care right away.
Do you see a distinction there, or are they one and the same? How do you view the correlation between real time and data-in-motion?
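For the patient-telemetry example, one simple and widely used technique for flagging anomalies in a stream is a rolling z-score: compare each new reading with the mean and standard deviation of recent normal readings. This is a sketch under assumptions, not a clinical method: the window size, the threshold of 3 standard deviations, and the heart-rate values are hypothetical, and requiring several consecutive outliers before alerting is just one crude way to cut down false positives.

```python
import statistics
from collections import deque


class RollingZScoreDetector:
    """Flags readings far from the recent baseline (hypothetical parameters)."""

    def __init__(self, window: int = 30, threshold: float = 3.0,
                 consecutive: int = 2, min_history: int = 5) -> None:
        if min_history < 2 or window < min_history:
            raise ValueError("need 2 <= min_history <= window")
        self.history = deque(maxlen=window)   # recent normal readings
        self.threshold = threshold            # z-score above which a reading is an outlier
        self.consecutive = consecutive        # outliers in a row needed to alert
        self.min_history = min_history
        self.outlier_run = 0

    def update(self, value: float) -> bool:
        """Process one reading; return True if an alert should be raised."""
        if len(self.history) < self.min_history:
            self.history.append(value)  # still building the baseline
            return False
        mean = statistics.fmean(self.history)
        stdev = statistics.stdev(self.history)
        is_outlier = stdev > 0 and abs(value - mean) / stdev > self.threshold
        if is_outlier:
            self.outlier_run += 1
        else:
            self.outlier_run = 0
            # Only normal readings extend the baseline, so a burst of outliers
            # cannot inflate the standard deviation and hide itself.
            self.history.append(value)
        return self.outlier_run >= self.consecutive


if __name__ == "__main__":
    detector = RollingZScoreDetector(window=10, threshold=3.0, consecutive=2, min_history=5)
    heart_rate = [70, 72, 71, 73, 70, 72, 71, 73, 70, 72, 140, 145, 71]  # made-up bpm values
    for bpm in heart_rate:
        if detector.update(bpm):
            print(f"alert: sustained anomaly at {bpm} bpm")
```

With these made-up values, the single reading of 140 does not alert on its own; the alert fires only on the second consecutive outlier (145).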
Manish Devgan (10:23):
Yeah, that’s another good one. The term real time is an interesting one, right? The engineering definition, the way builders and developers think about it, is often the ability to respond with predictable latency, that is, within a deterministic time bound. That’s what real time means to a developer or a builder.
For the business, real time simply means fast: within the boundaries of the business SLA. It depends on what your SLA is. If your SLA for a fraud detection system is five milliseconds, that is what counts as real time. For other businesses, it may be different.
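The business definition of real time can be checked mechanically: measure request latencies and compare a high percentile against the SLA budget. This sketch assumes the SLA is expressed as a p99 latency bound and uses the nearest-rank percentile method; the five-millisecond fraud-detection figure from the conversation serves as the example budget, and the sample latencies are made up for illustration.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least `pct` percent of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: rank = ceil(pct/100 * n), 1-based.
    # Multiply before dividing to avoid float error for round percentages.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_sla(latencies_ms: Sequence[float], budget_ms: float, pct: float = 99.0) -> bool:
    """True if the pct-th percentile latency is within the SLA budget."""
    return percentile_nearest_rank(latencies_ms, pct) <= budget_ms


if __name__ == "__main__":
    # Made-up latencies for 100 requests, checked against a 5 ms budget.
    latencies = [1.0] * 98 + [4.0, 9.0]
    print(percentile_nearest_rank(latencies, 99), meets_sla(latencies, budget_ms=5.0))
```

Checking a tail percentile rather than the average reflects the emphasis on predictable latency: a system that is fast on average but blows the budget on its slowest requests would not meet this definition of real time.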
The way I see real-time data platforms, data-in-motion really corresponds to fresh data, as we discussed. Among the use cases I’ve encountered, for IoT it could be data from connected products, like cars, compressors, or the robotic arms I talked about.
In a simpler data ecosystem, it could be change data capture (CDC) from a database log, transactional data from point of sale, location data, or any data coming across an enterprise messaging system like Kafka. All of that is really useful in a real-time data platform.
Those are typical examples of data-in-motion.
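CDC from a database log is one of the data-in-motion sources mentioned above. As a rough illustration of what consuming such a stream involves, here is a sketch that applies change events, in order, to an in-memory view keyed by primary key. The `ChangeEvent` shape (`op`, `key`, `after`) is a hypothetical simplification, loosely modeled on the create/update/delete operation codes used by CDC tools such as Debezium, whose real event envelopes are richer; in practice the events would arrive from a topic rather than a Python list.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str                          # "c" = create, "u" = update, "d" = delete (hypothetical shape)
    key: str                         # primary key of the changed row
    after: Optional[Dict[str, Any]]  # new row state; None for deletes


def apply_changes(view: Dict[str, Dict[str, Any]], events: Iterable[ChangeEvent]) -> None:
    """Apply change events in order, keeping `view` in sync with the source table."""
    for event in events:
        if event.op in ("c", "u"):
            if event.after is None:
                raise ValueError(f"{event.op} event for {event.key} has no row state")
            view[event.key] = dict(event.after)  # upsert the latest row state
        elif event.op == "d":
            view.pop(event.key, None)  # tolerate deletes for unknown keys
        else:
            raise ValueError(f"unknown op {event.op!r}")


if __name__ == "__main__":
    customers: Dict[str, Dict[str, Any]] = {}
    apply_changes(customers, [
        ChangeEvent("c", "42", {"name": "Asha", "tier": "silver"}),
        ChangeEvent("u", "42", {"name": "Asha", "tier": "gold"}),
        ChangeEvent("c", "7", {"name": "Ravi", "tier": "bronze"}),
        ChangeEvent("d", "7", None),
    ])
    print(customers)
```

The sketch relies on events for the same key arriving in order, which is why CDC pipelines typically partition change streams by primary key.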
Dinesh Chandrasekhar (11:55):
There is also some confusion, or maybe a lack of clarity, around the term real-time data platform. While we’re on this topic, can we touch on that before we move on to other areas? What is your view of a real-time data platform? I have sold and marketed end-to-end real-time streaming platforms, and the notion of a real-time data platform has been redefined by different analysts in different contexts.
The broad, general sense is that it can ingest data, process data streams in real time, and create real-time insights from that processing. Is there a different way to look at it, or are there other checkpoints we need to look for in a real-time data platform? What is your definition or view of the real-time data platform space?
Manish Devgan (12:57):
Yeah. There are platforms, and then there are solutions. Solutions are mostly systems that target a particular vertical, for example, whereas a platform is more general-purpose, something you can build on across multiple domains.
Some of the critical capabilities are the ones you mentioned: real-time data management, stream processing, applying machine learning algorithms in real time, and things like that. But sometimes you don’t really need stream processing per se in a real-time data platform, because sometimes it’s just about latency. It’s a request/response system.
If I’m in a call center, I want to pull up that customer record while I’m still speaking my first sentence with the customer. That quickly. For that, you don’t need stream processing. You just need a fast way to pull up the customer’s information with predictable latency at scale. That is real time for a lot of use cases.
So I would say it depends, but again, this is an area where fresh data has huge value in real-time use cases.
Dinesh Chandrasekhar (14:23):
Got it. That’s fantastic, thank you. It’s important to get these different perspectives, and that’s what I’m trying to do with these episodes: gather insights from data leaders like you, because everyone brings unique value from their own vantage point. I totally appreciate what you just said.