RTInsights sits down with Rohit Choudhary, Founder & CEO at Acceldata.io, to discuss data challenges enterprises face today and how data observability can help.
As enterprises strive to become more data-driven, they need real-time insights into their data, compute, and data pipelines. Traditional monitoring tools do not provide the needed information to truly assess and manage the quality and health of enterprise data. What businesses need is an approach that helps them understand and control all aspects of their data and systems. Specifically, what’s needed is data observability.
RTInsights recently sat down with Rohit Choudhary, Founder & CEO at Acceldata.io, to talk about the data challenges enterprises face today, how data observability can help, and the benefits that can be derived using data observability. Here is a summary of our conversation.
RTInsights: Briefly, what is data observability?
Choudhary: Data observability helps modern data enterprises deal with the complexity that so many different data systems bring. It gives them the ability to control the quality, reliability, and performance of their data and data systems.
See also: Multidimensional data observability resources
Different enterprises use it for different purposes. Some are struggling to resolve compute performance as they introduce newer technologies or deal with talent shortages. They're looking to automate a lot of the mundane work that comes with the complexity of modern data systems, and they're using data observability for that.
Slightly more advanced enterprise customers are using data observability for more advanced use cases, such as data quality, reliability, and operational data governance.
But the whole premise of data observability stems from the fact that every enterprise is trying to become data-driven. Every enterprise is dealing with more complexity and more data, and nobody has the talent to manage data usage effectively at the scale that’s necessary. So, to achieve data-driven transformation goals, data observability is a very, very useful navigation map. It helps companies transition to being truly data-driven.
RTInsights: Why do businesses and data teams need data observability?
Choudhary: Customers have gotten used to next-best actions, recommendations, and real-time insights about what they should do and how to act on their choices. People within enterprises expect a similar experience when they do their jobs. Everybody relies on certain elements of data analytics and operational insights to perform their day job.
If data is to be available to all these business consumers and the people who work within the enterprise, then it needs to be of very high quality. It has to be reliable. And it has to arrive on time.
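To make those three requirements concrete, here is a minimal sketch in Python (using pandas) of what checking them against a single inbound dataset might look like. The column names and thresholds are hypothetical stand-ins, not anything prescribed by Acceldata; real checks would be tuned per dataset:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Hypothetical thresholds for one inbound table; real SLAs vary by dataset.
MAX_NULL_RATE = 0.01                                      # quality: at most 1% nulls per column
EXPECTED_COLUMNS = {"order_id", "customer_id", "amount"}  # reliability: schema contract
FRESHNESS_SLA = timedelta(hours=1)                        # timeliness: data must be recent

def check_dataset(df: pd.DataFrame, loaded_at: datetime) -> list[str]:
    """Return human-readable violations; empty means the data is usable.

    `loaded_at` is assumed to be a timezone-aware UTC timestamp.
    """
    violations = []

    # Quality: flag columns whose null rate exceeds the threshold.
    for column, null_rate in df.isna().mean().items():
        if null_rate > MAX_NULL_RATE:
            violations.append(f"{column}: null rate {null_rate:.1%} exceeds {MAX_NULL_RATE:.0%}")

    # Reliability: the schema must match what downstream consumers expect.
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        violations.append(f"missing expected columns: {sorted(missing)}")

    # Timeliness: the load must have happened within the freshness SLA.
    if datetime.now(timezone.utc) - loaded_at > FRESHNESS_SLA:
        violations.append(f"data loaded at {loaded_at:%H:%M} UTC is past the freshness SLA")

    return violations
```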
So, everybody now depends on data on an operational basis. It is not something run once a week to create an analytical report that is sent to the execs. Instead, data is being used by everybody in the modern enterprise today.
So, if you want to be a reliable business, a complete business, a business that acts on the latest data insights coming in, you need data observability. Why? Data observability is the layer that ensures that the big objectives the business sets for itself are achieved over time. That is the biggest benefit.
It also acts as a sort of glue between three different user groups within the enterprise: the lines of business, those operating the data systems, and the practitioners, whether they are data analysts, data scientists, or people looking at operational and analytical insights from the data the enterprise has collected.
With data observability, you have a common vocabulary for understanding the business in terms of the different workflows that are operating on the collected data and how the data is getting processed, transformed, and eventually used.
This glue allows these different groups, which have so far approached data very differently, to see eye to eye and resolve issues as they find them.
RTInsights: Can you explain the evolution of data observability and why it’s so critical in the modern data stack?
Choudhary: If you look at the evolution of IT systems, the big turning point was probably the year 2000, when everybody realized that the whole world was going online. Application deployment increased at a radical pace between 2000 and 2010. Then, between 2010 and 2015, newer, data-intensive applications started to arrive.
In the 2010s, applications were monitored using application performance monitoring (APM) tools. Those were focused on ensuring that user workflows, whether applying online for a loan or booking tickets, accomplished their end goal. Those are slightly simpler systems to monitor, and APM tools did a fantastic job of it.
Now, when you think about data systems, there's been a fundamental change. What are the disease patterns in the U.S.? How is the pandemic progressing? How many people are buying size nine Nike shoes? These are complex questions. The answers are hidden deep under several layers of data, and data that needs to be correlated. So, there is no simple workflow. There's no end goal because the data keeps coming. You are waiting for insights to appear in front of you so that you can act from a business point of view.
Now, enterprises are applying operational and analytical techniques to vast quantities of data collected from all over the world and from disparate sources, including user clicks, ERP system data, user-generated comments, advertising clicks, and more. What they find is that the tools at their disposal are basically rudimentary. They came out of the open-source world, and they don't have the 30 years of backing of a vendor such as Oracle, whose technology has been proven to work at scale.
The open-source, community-based tools are excellent because they bring all the breakthrough innovations, but they're still very early in their maturity cycle. And when you're trying to manage all of these technologies together, you find that it's very difficult.
I've seen from my own personal experience that many companies have fallen back on APM tools to figure out how to monitor data systems. But what you end up finding is that APM tools are simply not enough to discover, diagnose, and resolve the issues that occur in these complex data systems.
And that is where this market is evolving. It's still monitoring, but monitoring at a very granular and very different level.
So, I would say that this has been the evolution. A lot of companies have toyed with open-source platforms and frameworks. However, it's getting harder and harder to use these tools. It's almost like a multi-cloud deployment war: everybody has so many different kinds of systems, and APM tools won't give them the monitoring and end-to-end visibility that enterprises need.
RTInsights: But don’t other tools like APM help data teams in these areas?
Choudhary: They do help, but they're very abstract, in the sense that an APM tool can tell you how you are doing at an overall system level or at a service level. But unfortunately, data, insights, and compute complexity are hidden under several layers. You need to know more than the obvious, which is whether a service or server is up or not. That's APM and system metrics territory.
Data observability is very different. Just like APM, where people got accustomed to metrics, logs, and traces all being together, data observability and data systems require quality, reliability, and performance to be monitored together. So, it’s a very different paradigm. It’s a different world.
For example, APM systems will not satisfy the curiosity of, let's say, a data engineer who's trying to find out why his data has not arrived. When he's looking at a problem, it is of the nature, "Okay, all my data is delayed," or "My data did not arrive at 9:00 AM, although I expected it to." The answers cannot be found through an APM tool. The answers are deeply embedded inside the lineage of processing and the performance of the compute engines that you have, especially in a hybrid, multi-cloud world. And, under those circumstances, APM tools just don't work.
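That 9:00 AM scenario can be illustrated with a small sketch. The lineage and landing times below are hypothetical, hard-coded stand-ins for what an observability platform would discover from the systems themselves; the stage names are invented for the example:

```python
from datetime import datetime, time

# Hypothetical lineage: each dataset mapped to the upstream stage that feeds it.
LINEAGE = {
    "daily_sales_report": "warehouse_load",
    "warehouse_load": "spark_transform",
    "spark_transform": "source_extract",
}

# When each stage last finished (None = has not finished today).
LANDED_AT = {
    "source_extract": datetime(2024, 5, 1, 6, 5),
    "spark_transform": None,  # <-- the stuck stage
    "warehouse_load": None,
    "daily_sales_report": None,
}

def find_stuck_stage(dataset):
    """Walk upstream through the lineage; the root cause is the most upstream unfinished stage."""
    stuck, stage = dataset, dataset
    while stage in LINEAGE:
        stage = LINEAGE[stage]
        if LANDED_AT.get(stage) is None:
            stuck = stage
    return stuck

sla = time(9, 0)
if LANDED_AT["daily_sales_report"] is None:
    print(f"daily_sales_report missed its {sla:%H:%M} SLA; "
          f"delay traced to stage: {find_stuck_stage('daily_sales_report')}")
# -> daily_sales_report missed its 09:00 SLA; delay traced to stage: spark_transform
```

An APM tool would report every service here as "up"; answering the engineer's question requires walking the lineage, which is exactly the kind of context it lacks.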
RTInsights: How does Acceldata help with data observability?
Choudhary: We took a holistic approach. We saw that there were adjacent concerns between the different groups I spoke of earlier: operations, the practitioners, and the business. All of these groups are driving towards the same thing, but their concerns are slightly different.
If the operations teams are not happy and are not able to meet the SLAs and SLOs that most of the data teams depend on, then the outcomes you hope to achieve with data are not going to be met.
As far as the practitioners are concerned, they're looking for data that is of high quality and can be relied upon to make good business decisions because bad data leads to inaccurate decisions.
And finally, businesses need to be able to state with confidence that all the workflows they initiated are driving towards the goal of data-driven transformation. For that, they need reliability in the workflows and orchestrations they've put in place.
So, we have a way to identify and monitor the pipelines that move data from the point of origin all the way to the point of consumption. We can monitor what is flowing inside those data pipelines, which crisscross the entire enterprise.
And finally, we can tell you the health and performance characteristics of the underlying compute systems. So, we are the most comprehensive data observability system, which is something we call multidimensional data observability. Just as APM tools did metrics, logs, and traces, we are doing quality, reliability, and performance for data systems. Many of our customers have benefited immensely from this paradigm.
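As a rough illustration of what monitoring the three dimensions together might mean, here is a sketch of a single observation record spanning quality, reliability, and performance for one pipeline run. The field names are illustrative assumptions, not Acceldata's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineRunObservation:
    """One observation combining the three dimensions described above.

    All field names are hypothetical, chosen only to illustrate the idea.
    """
    pipeline: str
    run_id: str
    # Quality: checks evaluated against the data itself.
    rows_processed: int
    null_rate: float
    failed_checks: list = field(default_factory=list)
    # Reliability: did the run meet its contract with downstream consumers?
    completed: bool = False
    sla_met: bool = False
    # Performance: cost and health of the underlying compute.
    runtime_seconds: float = 0.0
    cpu_hours: float = 0.0

    def healthy(self) -> bool:
        # A run counts as healthy only when all three dimensions are within
        # bounds at once, which is the point of monitoring them together
        # rather than in separate, siloed tools.
        return self.completed and self.sla_met and not self.failed_checks
```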
RTInsights: Can you discuss use cases and give examples of the benefits derived?
Choudhary: We work with the Fortune 500 and, potentially, the Global 2000. Customers include Oracle, Pratt & Whitney, PubMatic, a Walmart subsidiary called PhonePe, and more. The biggest benefit these companies have achieved with us is the reliability of both their data and their data systems. Some examples include:
PhonePe has approximately 250 million daily active users. On peak days, it processes over two billion in cash. It's a peer-to-peer cash transfer service on which over 200,000 merchants accept payments from mobile apps. Their biggest concern was that they wanted their OLTP and OLAP systems to be performant and reliable.
And they were fighting a pretty tough contest against Paytm and Google Pay. They ended up winning because they established themselves as the most reliable app for instantaneous, direct cash transfer.
We allowed them to focus on developing their business use cases instead of on reliability issues, both on the operational systems side and from a data reliability standpoint. So, we covered all the bases for them, and they managed to expand and take full charge of their business, as opposed to focusing just on technology and solving day-to-day issues.
Now, the hidden fact in all of this is that the most important and most special community within the enterprise is the data engineering group. If the data engineering group is spending an inordinate amount of time fixing day-to-day operational issues, it is taking away from the future of the company. All CEOs and CFOs should be focused on how to make their data engineering teams more productive and effective.
Another use case where we see a lot of interest is getting the right quality of data into cloud data lakes and data warehouses right from the start. We work with extremely large financial and insurance data companies, which provide insurance or financial data to most enterprises across the world. One of the biggest things they're trying to do is validate the different data sets they receive on an ongoing, daily basis. They want to make sure that they are getting the correct form or version of the data so that they can put accurate, compliant, and reliable data into their data lakes and data warehouses.
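A bare-bones sketch of such a validation gate, using only the Python standard library; the feed name, columns, and expected volume band are hypothetical, invented for illustration:

```python
import csv
import io

# Hypothetical contract for one daily feed from a data provider.
EXPECTED_HEADER = ["policy_id", "insured_name", "premium", "effective_date"]
MIN_ROWS, MAX_ROWS = 900_000, 1_100_000  # expected daily volume band

def validate_feed(raw_bytes: bytes) -> tuple:
    """Gate a feed before it is loaded: the wrong version or shape never reaches the lake."""
    reader = csv.reader(io.StringIO(raw_bytes.decode("utf-8")))

    # Check the schema first: a changed header usually means the provider
    # shipped a different version of the feed.
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return False, f"unexpected header {header}; feed may be the wrong version"

    # Check volume: a count far outside the normal band suggests a partial
    # or duplicated extract.
    row_count = sum(1 for _ in reader)
    if not MIN_ROWS <= row_count <= MAX_ROWS:
        return False, f"row count {row_count} outside expected band"

    return True, f"feed accepted with {row_count} rows"
```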
Now, the thing that has really changed, if you look at the data quality paradigm itself, is that data quality used to be a centralized, once-a-year objective run by the CTO's office. That has completely changed because the operational quality and reliability of data is now a real-time operational concern, as opposed to a once-a-year activity.
So, we are seeing data observability playing a crucial role in ensuring that data is reliable all the time, whenever you need it, regardless of how many systems and how many transformations it undergoes.
Of course, there are also certain elements of data governance that can be addressed. To my earlier point, just as quality has become a real-time concern, so has data governance. Knowing that a data breach has happened is an immediate concern for the enterprise, as opposed to waiting three or six months and then acting on it. There's so much data at stake right now that even minor leakages can have devastating effects on the overall enterprise.
So, most Chief Data Officers (CDOs) and CEOs are very focused on these issues. They're looking to get as many insights as possible in all of these areas. And we're working with several of these companies to ensure that they achieve the best benefits data can provide them, with minimal risk.
RTInsights: When should organizations think about implementing data observability?
Choudhary: As early as you can, because retrofitting observability is always painful. I've seen multiple enterprises go through this pain. They let their data systems proliferate, and then it becomes a challenge to retrofit observability and make it work.
So, I would say, if you're six months into your data journey, you should start thinking about how to implement observability and how to bring reliability, quality, and compute performance together, right from the start.