Webinar: Survey finds that the increased interest in generative AI and predictive AI, as well as the need to support traditional analytical workloads, is leading to a massive increase in data sprawl across industries.
Just about every business is turning to artificial intelligence and advanced analytics to elevate its competitiveness and, accordingly, is rethinking its data and cloud strategies. This realignment was the subject of a recent webinar conducted by RTInsights in partnership with Ocient, which specializes in helping companies leverage large, complex workloads.
The challenge of containing and leveraging massively proliferating data “makes things expensive and complicated as enterprises tool up for their AI journeys,” said Shantan Kethireddy, co-founder and vice president of customer solutions at Ocient. In the webinar, he shared the results of Ocient’s latest survey of 500 IT and data leaders, which produced five key findings:
1) AI investments – and challenges – are creating data sprawl
2) An increased focus on data speed, security, and sustainable energy
3) Leaders can’t accurately anticipate analytics costs
4) Leaders are rethinking their cloud-only data and analytics infrastructures
5) Energy consumption and availability are reshaping large-scale data analytics
“All of these topics are interwoven,” said Kethireddy. “They point at sprawl and duplication of data across systems that are really growing over time.”
Many of today’s organizations, he pointed out, lack “proper data glossaries, or an understanding of the data lineage as well as proper oversight of how individual workloads for the various business units are driving monthly costs.”
With increased interest in generative AI and predictive AI, as well as supporting traditional analytical workloads, “we’re seeing a pretty massive increase of data sprawl across industries,” he observed. “They track with the realization among many of our customers that they’ve created a lot of different versions of the truth and silos of data which have different systems, both on-prem and in the cloud.”
Among the many customers Kethireddy works with, “100% say their data is growing. I hear about the need for consolidation to execute increasingly challenging priorities with data. They need to make sense of the lineage of the data, reduce the latency or data staleness, and reduce the management of systems that the most important people in their technical teams are spending a disproportionate amount of their time doing.”
The survey also finds a retrenchment from the cloud for handling AI or analytical workloads. “Not having chargeback models while also allowing self-service utilization, as well as this unforeseen data growth that we just talked about, has really unexpectedly increased cloud costs,” Kethireddy said. “The cloud has driven tremendous innovation, but as far as scaling businesses in adtech or telcos’ most compute-intensive workloads, on-premises is still the most viable option. Leaders have learned this the hard way.”
The escalating costs of cloud and applications often come as a surprise to both business and IT executives, with more than two-thirds of the survey respondents, 68%, incurring unexpected analytics spend. Sixty-four percent saw cloud costs go up more than planned, and 57% said systems integration costs were higher than expected. Another 54% indicated they were subjected to unanticipated data movement costs. “Everyone is keen on performance and scale, but for customers with very large-scale compute-intensive analytics needs, everything comes back to costs,” Kethireddy said.
The unexpected cost associated with these initiatives “is a major challenge in the cloud where you have compute metering, with costs each month,” he said. Adding to the element of surprise, executives and managers are often blindsided by unexpected systems and data expenses. “Oftentimes business units within an enterprise are only exposed to their systems, or their VMs, or their databases without any real observability or management or governance around utilization,” he explained. Typically, infrastructure teams pursue a reactive strategy, posing questions such as “Who ran that gargantuan test workload last month?” without the ability to predict demand.
Before companies can successfully leverage AI and advanced analytics, they urgently need to address the “runaway data movement and data pipeline challenges that are so common in enterprises,” he pointed out. “When you think about data movement and data pipelines, most customers have transactional systems or legacy environments that then feed data to downstream systems. Or they’re getting a firehose of data from a variety of sources that are coming from the cloud, and they can be batch or streaming data.”
These organizations then “take that data and transform or consume it by multiple business units using their own extract, transform, and load (ETL) solutions,” he said. “They can be completely different types of data. This is typically the first kind of deviation or loss of a unified source of truth for the data.” The ETL solutions that each group manages “have their own user acceptance testing or production environments, which means more copies of data,” he pointed out. “Then that data is fed to multiple systems, maybe for dashboarding or for more low-latency analytics. But it’s also fed to their systems, like OLAP systems or data lakes.”
If a data team “can’t get the data where it needs to go, they’re not going to be able to analyze it in an efficient, secure way,” he said. “Leaders have to think about scale in new ways. There are so many systems downstream that consume data. Scaling these environments as the data is growing in many cases by almost double-digit percentages year over year is becoming unwieldy.”
A proactive approach is to address these costs and silos through streamlining and simplification on a single common platform, Kethireddy urged, noting that Ocient’s approach is to “take the path to reducing the amount of hardware and cloud instances it takes to analyze compute-intensive workloads. We focus on minimizing costs associated with the system footprint and energy consumption.”
In addition, an optimal approach is to price by “the number of CPU cores or nodes, rather than the amount of compute consumed,” he explained, the latter being the standard practice among cloud infrastructure and application providers. “As workloads get more complex and CFOs need some predictable pricing to budget, you’ll see more leaders looking for such solutions.”