Working with streaming edge data requires a modern data architecture built on next-generation databases, data pipelines, and streaming data frameworks.
For decades, database environments were designed and built to support batch-processing modes of computing. Data is extracted, transformed, loaded, and stored where it can be accessed by reporting and analytic applications. This worked well when organizations had analysts and executives who were satisfied with looking backward in time to see what types of events or anomalies occurred within their enterprises. The freshest data available was often 24 hours old.
However, in today’s real-time era, batch-oriented data environments are woefully inadequate. Twenty-four hours is an eternity, as business opportunities or needs may slip by in a matter of minutes. Information is required instantaneously for next-generation applications such as artificial intelligence, machine learning, geolocation, sensors, IoT, and real-time spatial analysis. A modern data architecture built on next-generation databases, data pipelines, and streaming data frameworks is required.
Leading technologies supporting streaming data include data pipelines, messaging, and microservices. Stream processing frameworks that enable real-time data environments include open source options such as Apache Kafka, Apache Flink, Apache Spark Streaming, and Apache Storm, managed cloud services such as Amazon Kinesis, and enterprise offerings such as Dell Streaming Data Platform, HPE Ezmeral, and Databricks. In addition, databases supporting real-time applications include NoSQL and graph databases.
A majority of companies leading the way with data-driven cultures (58%) say their organizations have mature, real-time data access and analytics capabilities, according to the authors of a recent study of 311 data executives published by Harvard Business Review Analytic Services. In contrast, only 6% of companies (not categorized as data-driven leaders by the study) say they have mature capabilities.
Additionally, about two-thirds of leaders (67%) say they have mature automation of data-driven insight using machine learning built into their workflows, compared to only 5% of the rest.
“The fastest-growing area in terms of investment is to get to as real time as possible,” the HBR study’s authors state. Nearly a third of data analytics leaders (32%) say they have a broad ability to access data across systems and locations, versus just 10% of other respondents. Looking ahead, 73% of executives say the ability to access and analyze data in real time is extremely important to organizational performance and success. In addition, 72% say the same of the ability to automate data-driven insight with machine learning built into workflows. A modern data architecture is required to support organizations’ efforts to improve profitability, increase innovation, or deliver better customer experiences, the HBR study states.
Important enablers for data-driven enterprises cited in the survey include cloud services that allow companies to deliver new capabilities with existing skill sets/employees (83%), real-time analytics capabilities (81%), automated machine learning for predictive analytics (75%), and open architecture (70%).
It’s not just about wringing value from streaming data; it’s also a matter of moving it at a real-time pace to decision-makers and their systems. A separate survey finds that almost half of organizations (48%) analyze data in real time instead of storing it first. In addition, 16% of the executives responding to the survey perform contextual analysis in the stream, further reducing the time between data entering the system and analysis. In other words, these are workloads that batch-oriented environments are poorly suited to support.
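To make the idea of analyzing data in flight more concrete, here is a minimal Python sketch of computing a rolling average over readings as they arrive, rather than storing them first and querying later. It is an illustration only and is not drawn from the surveys cited above; the function name `rolling_alerts`, the window size, the threshold, and the sample readings are all hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(
    readings: Iterable[float], window: int = 3, threshold: float = 30.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling_mean) whenever the mean of the last `window`
    readings exceeds `threshold`. Only `window` values are held in memory."""
    recent = deque(maxlen=window)  # oldest reading is dropped automatically
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield i, mean


if __name__ == "__main__":
    # Hypothetical sensor feed; in practice this would be an unbounded stream.
    sensor_feed = iter([20.0, 22.0, 31.0, 35.0, 40.0, 21.0])
    for index, mean in rolling_alerts(sensor_feed):
        print(f"reading {index}: rolling mean {mean:.1f} exceeds threshold")
```

Because the function is a generator, each alert is produced as soon as the triggering reading arrives, and nothing beyond the current window needs to be stored.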
For example, there’s the matter of scale. Traditional batch-oriented data environments often choke on the volume of edge or IoT data that may be streaming in from a variety of sources. There is also the matter of variety: streaming data tends to arrive in many formats, and many traditional environments are not designed to handle data types beyond relational or structured data. Streaming data may be unstructured or even in graphic formats.
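As a rough illustration of the format problem, the sketch below shows one way a streaming consumer might normalize messages that arrive in different formats into a single record shape. Everything here is an assumption made for the example: the `normalize` function, the `device` and `value` field names, and the CSV column order are hypothetical, not part of any product mentioned in this article.

```python
import csv
import io
import json
from typing import Dict


def normalize(payload: str, fmt: str) -> Dict[str, object]:
    """Convert one incoming message into a common {device, value} record."""
    if fmt == "json":
        data = json.loads(payload)
        return {"device": str(data["device"]), "value": float(data["value"])}
    if fmt == "csv":
        # Assumed column order for CSV messages: device,value
        device, value = next(csv.reader(io.StringIO(payload)))
        return {"device": device, "value": float(value)}
    # Unrecognized formats are kept as raw text rather than dropped
    return {"device": "unknown", "value": None, "raw": payload}


if __name__ == "__main__":
    print(normalize('{"device": "pump-1", "value": 3.5}', "json"))
    print(normalize("pump-2,7", "csv"))
    print(normalize("free-form operator note", "text"))
```

Keeping unrecognized payloads as raw text, instead of rejecting them, reflects the point above that not all streaming data fits a relational or structured mold.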
Of course, the move to a modern data architecture that supports real-time streaming capabilities doesn’t happen overnight, and for many organizations it is an incremental journey. The bottom line is that traditional data environments can no longer support the demands of today’s organizations, which require increasingly sophisticated applications.