Why the Time for Real-time Is Now, And What to Do About It


It’s not 2014 anymore. It’s 2024. Real-time analytics are available now. A decade of innovation has produced an Apache Pinot that is enterprise-ready and can even integrate directly with your Gen AI initiatives as a vector index datastore.

We are collectively experiencing a whirlwind convergence of technological evolution, strategic industry shifts, and unrelenting business demands driving the rapid adoption of real-time analytics. Organizations are being asked to run ever faster and at larger scales, while also being leaner and more cost-efficient. Rather than wait hours or days for answers, users want queries answered in seconds or milliseconds. Rather than just support internal customers, organizations want to open up data access to partners, customers, prospects, or even the general public. Oh, and you might also be asked to maintain, or even cut, your current operational budget, perhaps in half. On top of that, you need to implement and innovate right now, with goals set within the current fiscal year.

Let’s explore some of the common factors driving the transformation to real-time analytics and consider the most common paths forward.

Converging Factors Driving the Need for Real-Time Analytics

Technological Evolution

When the Pinot project first began at LinkedIn around 2014, the world of computing was radically different. The public cloud existed (indeed, it was already an $80 billion-a-year industry) but was not the behemoth of infrastructure available today. This year it was estimated to be a $675 billion industry, growing 20% year over year. Underlying that growth has been a multidimensional matrix of hardware advances, from chipsets (CPUs, GPUs, ASICs, and SoCs) to memory, storage systems, and networking. On top of that came a Cambrian explosion of software: databases, streaming data infrastructure, applications, programming languages, query languages, APIs, and microservices.

Objectively, the world we are developing for in 2024 is more complex than ever before. At the same time, organizations are demanding that their technology solutions appear simpler: simpler to onboard and implement, simpler to manage and afford. The only way to make something objectively more complex seem simpler is through sophistication. That means layers of abstraction and opinionated systems that mask underlying complexity and make good decisions for users, so they don't get bogged down in setup and operations. It means layers of great, well-thought-out user interface and user experience (UI/UX) design, and low-code or no-code services for easy adoption that don't preclude power users from getting access to your CLI or API. And behind it all, fully managed services, with teams of experts who take on the administrative tasks and operations that users do not consider core responsibilities or value-added efforts.

In the early cloud, many tools and services were simple “lift and shift” deployments. The same on-premises software was simply put on a public cloud server. In 2024 that simply doesn’t pass muster. An open source project like Apache Pinot requires far more to make it cloud-native: it needs a series of services, tools and infrastructure around and on top of it to make for superior capabilities, performance and developer experience.

Industry Shifts

When Federal Express was founded in 1971, it had a revolutionary vision for next-day delivery that changed how business worked across the country and, soon enough, around the world. Today, 30% of consumers expect same-day delivery. Amazon notes that same-day or next-day delivery reached 60% of Prime orders placed in 60 major U.S. markets. Consumers in the United States and worldwide are relying more than ever on delivery services. For example, Capital One Shopping notes that between 2017 and 2022, the number of packages received by the average American increased 73%.

For food delivery, the expectation is even higher: within-the-hour delivery, such as you get from DoorDash or Just Eat Takeaway (which operates as Grubhub in the United States). Increasingly, these services are branching out beyond restaurant-cooked food to deliver general groceries, pharmacy items, and convenience store products.

As for Internet services, you might be old enough to remember waiting 30 seconds (or more) in the 1990s just to connect to your favorite online service via dial-up modem. Today, always-connected mobile and web users abandon sites and services that do not load within 2-3 seconds. That often means that, in an end-to-end application, the database itself gets only a fraction of that time, often far less than a second, to process a complex query and return the right answer.

This is the nature of the on-demand economy. Whether for B2B or B2C services, organizations need to be able to deliver based on the market’s expectations.

See also: It’s the Right Time to Start a Real-time Data Business

Integrating Real-Time Analytics into Your Data Architecture

Another factor in 2024 is that this is a world where everyone has existing infrastructure. Hardly any organization is a true "greenfield" deployment. Instead, what you need to do is look at your infrastructure to see where historical systemic inefficiencies have crept in, and then decide how to obliterate them. Can you afford data updates only on a daily basis? How does that limit your business with your customers? In some use cases, data that is even a minute old is too long in the tooth for certain customer expectations. If so, then your batch processing architecture needs to be turned into real-time data streams.

In these cases, it's not about replacing your data warehouse. It's about offloading the workloads that a data warehouse was never designed to serve efficiently or cost-effectively. Or maybe it is offloading work from a search engine or OLTP database that has been pressed into service beyond its normal design and architecture in an attempt to deliver real-time analytics, a role these systems were never meant to play in the first place. Like using a hammer to pound in screws, the wrong tool for the job won't deliver the results you want, and it will require costly rework in the long run.

Moving to a purpose-built technology like Apache Pinot for real-time analytics, by contrast, can meet your SLAs, eliminate operational inefficiencies, and save a tremendous amount of cost.

LinkedIn created the Pinot project for the use case of "Who viewed my profile?" Existing systems simply didn't work for the problem they had at the scale they wanted to operate. Over time, Apache Pinot became foundational infrastructure for many other applications within LinkedIn, for recruiting, sales, marketing, and so on. LinkedIn also remains one of the key stakeholders of the Apache Pinot open source community, driving innovation and contributing back for the benefit of all users. As just one example, LinkedIn added support for theta sketches, along with related query enhancements, which cut its storage footprint by up to 88%.
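To give a flavor of what that looks like in practice, here is a minimal sketch of running a theta-sketch aggregation from Python using the pinotdb client. The table and column names (pageViews, viewerIdSketch, profileId) are hypothetical, and this assumes a Pinot broker running locally on its default port.

```python
# A minimal sketch: approximate distinct counts via Pinot theta sketches.
# Assumes `pip install pinotdb` and a Pinot broker at localhost:8099.
# Table name (pageViews) and column names are hypothetical.
from pinotdb import connect

conn = connect(host="localhost", port=8099, path="/query/sql", scheme="http")
curs = conn.cursor()

# DISTINCT_COUNT_THETA_SKETCH returns an approximate distinct count,
# trading a small accuracy loss for large savings in storage and compute.
curs.execute("""
    SELECT profileId,
           DISTINCT_COUNT_THETA_SKETCH(viewerIdSketch) AS approx_unique_viewers
    FROM pageViews
    GROUP BY profileId
    ORDER BY approx_unique_viewers DESC
    LIMIT 10
""")

for row in curs:
    print(row)
```

The appeal of the sketch-based approach is that distinct counts can be merged across segments and time windows without rescanning raw data, which is where the storage and compute savings come from.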

Uber also became an early adopter of Apache Pinot. By moving to a designed-for-purpose real-time analytics database, they finally achieved the performance they needed and saved $2 million annually in the process. Best of all, they didn't sacrifice performance; they improved it: page load times dropped from 14 seconds to less than 5 seconds.

Cisco WebEx also found that moving to Apache Pinot provided subsecond latencies in most cases, where their previous system had been timing out at greater than 30 seconds two-thirds of the time. Moreover, they were able to reduce their data storage from over 800 terabytes to 121 terabytes, greatly cutting their infrastructure spend.

Stripe also uses Apache Pinot across many functions, for both internal and external user-facing analytics. User-facing applications range from customer dashboards to billing and developer analytics to Sigma reports. Internally, they also use it for failure alerts, financial data reporting, risk monitoring, and access log security monitoring. Stripe depends on Apache Pinot for its Black Friday / Cyber Monday business; last year it tracked $18.6 billion in commerce and more than 300 million transactions in that critical period of annual business.

Real-time analytics has become foundational to some of the biggest brands in their respective industries. Once adopted, it doesn’t just stay in one niche of the business. It rapidly permeates across the organization as stakeholders see value and share their success internally. More teams, applications, and business units then seek to harness it for their own use cases.

The ease and speed of adoption of real-time analytics is driven by a rich ecosystem of integrations. Apache Pinot integrates directly with the broader data architecture you probably already have running right now. You can ingest data from your streaming data platforms, whether Apache Kafka, Confluent Cloud, Amazon Kinesis, Redpanda, or Apache Pulsar, or from stream processing engines like Apache Flink. You can also ingest directly from batch data sources such as data warehouses and transactional databases, Apache Spark, Hadoop or files in cloud object storage. Data engineers can even combine data from batch and real-time sources in Apache Pinot to create what is known as a “hybrid table.”
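As a concrete illustration of that integration story, here is a minimal sketch of registering a real-time table that ingests from a Kafka topic, by posting a table config to the Pinot controller's REST API. The controller address, table and schema names, and the Kafka topic are all hypothetical, and the config is trimmed to the essentials. A hybrid table would pair this REALTIME config with an OFFLINE config under the same table name.

```python
# A minimal sketch: create a REALTIME Pinot table that ingests from Kafka
# by POSTing a table config to the controller REST API (/tables).
# Controller address, table/schema names, and the Kafka topic are hypothetical.
import requests

CONTROLLER = "http://localhost:9000"  # default Pinot controller port

table_config = {
    "tableName": "orders",
    "tableType": "REALTIME",
    "segmentsConfig": {
        "schemaName": "orders",
        "timeColumnName": "orderTimeMillis",
        "replication": "1",
    },
    "tenants": {},
    "tableIndexConfig": {
        "loadMode": "MMAP",
        # streamConfigs tells Pinot where and how to consume the stream.
        "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "orders",
            "stream.kafka.broker.list": "localhost:9092",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.factory.class.name":
                "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name":
                "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
        },
    },
    "metadata": {},
}

# The matching schema must already exist (POST /schemas) before this call.
resp = requests.post(f"{CONTROLLER}/tables", json=table_config)
resp.raise_for_status()
print(resp.json())
```

Once the table is created, Pinot consumes from the topic continuously and newly arrived events become queryable within seconds, which is exactly the batch-to-streaming shift described above.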

Real-time analytics are available now. It's not 2014 anymore; it's 2024. A decade of innovation has produced an Apache Pinot that is enterprise-ready and can even integrate directly with your Gen AI initiatives as a vector index datastore. Users no longer have to endure painful workarounds because the right technology does not exist for this need. The time for real-time analytics is right now, today. In fact, I believe the real message to take into your organization today is: stay ahead or be left behind. While the cycle of innovation and adoption is faster than ever before, I see project after project successfully making the migration to real-time analytics with rapid turnaround times, sometimes in as little as a single calendar quarter. It takes the right project and use case to get started with, the right technology to adopt, and the right partner to work with you on the project.
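On the Gen AI point specifically, here is a minimal sketch of what a vector similarity lookup can look like, using the vector index support added in recent versions of Pinot, queried through the pinotdb Python client. The table (documents), the embedding column, and the query vector are all hypothetical placeholders; in a real application the embedding would come from your model.

```python
# A minimal sketch: approximate nearest-neighbor lookup against a Pinot
# table with a vector index. Table/column names and the query embedding
# are hypothetical; assumes a Pinot broker at localhost:8099.
from pinotdb import connect

conn = connect(host="localhost", port=8099, path="/query/sql", scheme="http")
curs = conn.cursor()

# Stand-in 4-dimensional embedding; a real one would come from your model.
query_vector = [0.12, -0.07, 0.33, 0.08]
vector_literal = ", ".join(str(v) for v in query_vector)

# VECTOR_SIMILARITY(column, query_vector, topK) filters to the topK rows
# most similar to the query vector using the table's vector index.
curs.execute(f"""
    SELECT docId, title
    FROM documents
    WHERE VECTOR_SIMILARITY(embedding, ARRAY[{vector_literal}], 5)
    LIMIT 5
""")

for row in curs:
    print(row)
```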


About Kishore Gopalakrishna

Kishore Gopalakrishna is the co-founder and CEO of StarTree, a venture-backed startup focused on Apache Pinot - the open source real-time distributed OLAP engine that he and StarTree's founding team developed at LinkedIn and Uber. Kishore is passionate about solving hard problems in distributed systems and has authored various projects in the space such as Apache Helix, a cluster management framework for building distributed systems; Espresso, a distributed document store; and ThirdEye, a platform for anomaly detection and root cause analysis at LinkedIn.
