Businesses that seek to be real-time enterprises are increasingly turning to Apache Kafka to support data streaming and deliver significant value.
Every enterprise wants data analytics and the possibilities real-time data offers its business. But real-time data streaming hasn't been ready to go to the next level: business leaders are still learning to trust and depend on the data they receive from data-driven analytic models when making decisions. Everyone aspires to be data-driven, but few business leaders feel they're anywhere near that goal.
Kafka, the open-source tool that supports data streaming, makes real-time analytics a reality at all levels of the enterprise. To explore its potential, we recently sat down with Mark Palmer, senior vice president of analytics at TIBCO Software; Monica Cisneros, portfolio marketing manager at TIBCO; and Paul Varley, senior manager, product marketing at TIBCO, to talk about how this tool is bringing real-time data into today's organizations and how it can deliver significant value to the business. Here is a summary of our conversation.
Why Kafka?
RTInsights: What makes Kafka preferable to other forms of message streaming, such as Apache Pulsar or RabbitMQ?
Varley: Apache Kafka takes a different approach to message streaming compared to most pub/sub systems. It uses a log file abstraction to persist data and track consumption. This makes Kafka simpler than some other messaging systems and therefore easier to scale, at least in one location.
With Kafka, administrators configure a Time To Live (TTL) that dictates how long the event brokers retain messages. The TTL can be minutes, hours, days, weeks, or longer. Consumer applications do not need to receive messages immediately and can even replay entire message streams long after the broker received them. This is difficult if not impossible in many other messaging systems, making Kafka preferable whenever stream replay is required.
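For example, a consumer can rewind to the earliest offset the broker still retains and replay the stream from there. The following is a minimal sketch using the open-source kafka-python client; the broker address and the "orders" topic are hypothetical, and the retention window itself is governed by per-topic settings such as retention.ms:

    from kafka import KafkaConsumer, TopicPartition

    # Manage offsets manually rather than through a consumer group
    consumer = KafkaConsumer(
        bootstrap_servers="localhost:9092",  # hypothetical broker
        enable_auto_commit=False,
    )

    # Assign partition 0 of the hypothetical "orders" topic directly
    partition = TopicPartition("orders", 0)
    consumer.assign([partition])

    # Rewind to the earliest retained offset and replay the whole stream
    consumer.seek_to_beginning(partition)
    for message in consumer:
        print(message.offset, message.value)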
Apache Pulsar is similar to Kafka. It is also open source and provides many of the same capabilities. Pulsar does have advantages over Kafka, including built-in support for multi-tenancy and data replication. As a result, companies that need message distribution on a global scale will find Pulsar less complex to deploy and manage than Kafka. On the other hand, Kafka's open-source community is larger than Pulsar's, which could be considered an advantage for Kafka.
Democratize Kafka?
RTInsights: With the push to democratize Kafka, does that mean it will be a tool accessible to non-technical business users, or will it always require some level of technical expertise?
Cisneros: It really depends on what you want to do with it. We can make it as easy as a push of a button to get insights from Apache Kafka, zero code necessary, and no technical expertise at all.
Now, think about it in terms of solutions: the more customizable or unique you want your experience to be, the more it helps to have at least some technical knowledge or expertise. The more we learn about a system, the more we can do with it, and the more innovative and creative we can get with those solutions.
The most important thing is defining your needs and objectives and going from there. We will always be here to back you up in those endeavors by providing a platform that adapts to your needs, whatever they are.
Kafka capacity limits?
RTInsights: Are there limits on the enterprise messaging capacity of Kafka? How have companies dealt with any surges in capacity?
Varley: Yes. There are limits on all messaging solutions. Like many others, Kafka can scale both horizontally and vertically, but there are architectural choices to consider, especially around capacity and surges in capacity. Financial market data systems, for example, have had to address this for many years: microbursts in trading can generate large volumes of data that must be delivered with very low latency. Forethought about the architecture needed to handle microbursts is the key to providing a communications framework that can scale, whether it is based on Kafka, Pulsar, or any other messaging solution.
Apache Kafka allows topics to be partitioned across nodes, and nodes can be added as more throughput is needed. This is a great approach to scaling the solution as demand grows. However, it also creates a challenge for geo-replication and for data resiliency across multiple regions. If you need to replicate data across regions, you will need to invest in infrastructure that can support the maximum throughput you expect to see during a microburst.
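To make the partitioning idea concrete: partitions are set when a topic is created (they can be increased later), and the replication factor controls how many brokers hold a copy of each partition. Here is a minimal sketch using the kafka-python admin client, assuming a local broker and a hypothetical "trades" topic:

    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")  # hypothetical broker

    # Twelve partitions let up to twelve consumers in one group read in parallel;
    # replication_factor=3 keeps a copy of each partition on three brokers.
    admin.create_topics([
        NewTopic(name="trades", num_partitions=12, replication_factor=3)
    ])

Adding partitions and brokers raises throughput within a cluster; replicating that stream across regions is the separate, harder problem described above.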
Kafka with legacy systems & data silos
RTInsights: How does Kafka perform for organizations with a number of legacy systems or data silos? Can these be opened up effectively?
Cisneros: Data virtualization can help organizations break down data silos and bring together diverse, disparate data sources across the enterprise. This includes combining historical data with real-time streaming data from sources such as Kafka to centralize an organization's data and open up access to those who most need it.
Data virtualization abstracts access to system-of-record data: instead of duplicating it with ETL or a data lake, you leave it in place and access it as a single virtual store. Now, for the first time, TIBCO makes streaming data a first-class citizen in data virtualization. By connecting TIBCO Streaming to TIBCO Data Virtualization, you can connect 90 other real-time sources that impact the business, including Apache Kafka, weather feeds, IoT sensor data, and drones.
Kafka & AI/ML
RTInsights: In what ways is Kafka enhancing artificial intelligence (AI) or machine learning efforts? Is AI becoming one of the prime use cases for Kafka?
Palmer: The ubiquity of Kafka creates the opportunity to apply AI models to real-time data. First-generation AI trains on historical data. For stable, non-volatile supply chains, that is fine.
However, for sensitive, volatile supply chains, real-time data is critical. By attaching machine learning algorithms to streaming Kafka data, new opportunities emerge. Predictions update in real time as new information flows through Kafka topics. Analysts can analyze live views of the supply chain and react quickly.
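As a concrete illustration of that pattern (a generic sketch, not a description of TIBCO's products), a pre-trained model can score each event as it arrives on a topic and publish the prediction to a downstream topic. The model file, topic names, and feature fields below are all hypothetical:

    import json

    import joblib
    from kafka import KafkaConsumer, KafkaProducer

    # Hypothetical pre-trained model, e.g. a scikit-learn regressor
    model = joblib.load("demand_model.pkl")

    consumer = KafkaConsumer(
        "shipment-events",  # hypothetical input topic
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Score each event as it arrives and publish the prediction downstream,
    # so anything subscribed to "demand-forecasts" updates in real time
    for event in consumer:
        features = [[event.value["lead_time"], event.value["backlog"]]]
        prediction = float(model.predict(features)[0])
        producer.send("demand-forecasts",
                      {"event": event.value, "prediction": prediction})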
Kafka messages are like neurons that send supply chain sensory input. New streaming business intelligence (BI) tools, like our real-time analytics solutions, help decipher the meaning of those inputs, and new streaming data science tools update predictions accordingly.
Therefore, Kafka, Kafka analytics, and streaming data science make resilient, intelligent, responsive supply chains possible.
For more information on TIBCO solutions, please visit https://www.tibco.com/solutions/apache-kafka