Organizations are using event-driven architectures, Apache Kafka, Apache Pulsar, and other streaming technologies to handle the glut of event messages they must process on a regular basis.
There’s no question that organizations are streaming more real-time data than ever before. One industry survey adds some numbers that illuminate how much growth there has been. Six out of 10 organizations process an average of more than 100,000 messages every second through their streaming data solutions. About two out of 10 (21%) process more than 1 million messages per second, and 2% process 10 million messages or more every second.
What kinds of technologies are being employed to move all these messages? All organizations surveyed are using event-driven architectures, along with other streaming technologies. On average, the surveyed organizations use three different open-source streaming technologies. Apache Kafka is the most popular, used by 87% of organizations, while Apache Pulsar is used by 17%. Other popular open-source projects used for streaming data solutions by these organizations include Apache Cassandra, Apache ActiveMQ, and Apache Spark Streaming. Python and Java are the most prevalent languages employed.
Event-driven architectures are used, on average, for more than four purposes, the most important being data pipelines, messaging, microservices, data integration, and stream processing. Seven in 10 organizations (70%) build their own custom streaming data solutions, while 30% choose to source commercial applications or cloud services when available.
With real-time analytics on the rise, there’s no time to waste. Almost half (48%) of the respondents continuously analyze streaming data in-stream, in real time, before persisting it to a datastore or data lake. Also, 48% of all organizations said they include batch data in their streaming data analysis for contextual insights. However, only 16% of them perform the contextual analysis in-stream in real time, as the data arrives, while 32% perform contextual analysis only after the streaming data has been persisted. In other words, while 48% derive contextual insights from streaming and batch data, only 16% do this in real time, i.e., before the streaming data risks becoming stale.
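To make the distinction concrete, here is a minimal Python sketch of what in-stream contextual analysis might look like: each event is enriched with static (batch) reference data and acted on as it arrives, and only afterwards persisted. Everything in it is a hypothetical illustration rather than anything described in the survey: the `CUSTOMER_TIERS` lookup, the event fields, the priority rule, and the plain list standing in for a datastore are all assumptions, and a real deployment would read from a broker such as Kafka or Pulsar instead of an in-memory list.

```python
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]

# Hypothetical static ("batch") reference data, e.g. loaded from a warehouse.
CUSTOMER_TIERS: Dict[str, str] = {"c1": "gold", "c2": "standard"}


def enrich_in_stream(events: Iterable[Event],
                     reference: Dict[str, str]) -> Iterator[Event]:
    """Join each arriving event with static context, one event at a time."""
    for event in events:
        tier = reference.get(event["customer_id"], "unknown")
        # Assumed rule for illustration: large orders from gold customers
        # are flagged as priority.
        priority = tier == "gold" and event["amount"] > 100
        yield {**event, "tier": tier, "priority": priority}


def run_pipeline(events: Iterable[Event],
                 reference: Dict[str, str],
                 sink: List[Event]) -> List[Event]:
    """Act on contextual insight first, then persist the enriched event."""
    alerts: List[Event] = []
    for enriched in enrich_in_stream(events, reference):
        if enriched["priority"]:
            alerts.append(enriched)  # insight derived before persistence
        sink.append(enriched)        # the list stands in for a datastore
    return alerts


if __name__ == "__main__":
    incoming = [
        {"customer_id": "c1", "amount": 250.0},
        {"customer_id": "c2", "amount": 500.0},
        {"customer_id": "c9", "amount": 10.0},
    ]
    store: List[Event] = []
    for alert in run_pipeline(incoming, CUSTOMER_TIERS, store):
        print("priority event:", alert)
```

The alternative followed by the 32% group would amount to writing the raw events to the store first and running the join later as a query, which is simpler to operate but means the insight arrives only after the data has been persisted.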
Here’s the challenge: “Most event-driven data is only ephemerally useful,” the report’s authors point out. “That means its ability to generate insights, support human decision-making, or enable automated responses related to the ‘Now’ evaporates very quickly, after which it just turns into stale data. In fact, one of the biggest concerns data leaders have in collecting data is that no one in their enterprise will ever use all of that data, or understand its value, to drive business decisions.”
To be able to drive timely and contextual decisions, “organizations have to have the ability to understand real-time and static data at the same time, with no delays. They have to be able to generate insights for decision-making continuously, not in batches. This is the final frontier for streaming data.”