Running Spark on the mainframe can be advantageous because data is co-located. One use is fraud detection.
Think of the mainframe, and legacy systems come to mind – crunching numbers and storing transaction records by the score. It’s not necessarily at the forefront of cutting-edge digital business.
IBM has been working on changing that perception in recent months, transforming its big iron z Systems computer into a real-time analytics machine. The vehicle that is making this possible is Apache Spark, the in-memory computing engine for real-time data analytics.
IBM z/OS Platform for Apache Spark enables Spark, an open-source analytics framework, to run natively on the z/OS mainframe operating system. The offering enables enterprise users to analyze data in place on the system of origin, without the need to extract, transform and load (ETL) it, by breaking the tie between the analytics library and the underlying file system.
Apache Spark, often referred to as the “analytics operating system,” may have only been visibly on the scene for a couple of years, but it is already making its mark. According to a recent survey of 7,000 executives and professionals by Taneja Group, more than half of all respondents (54 percent) are already actively using Spark. Aside from the expected data processing/engineering/ETL workloads, which make up 55 percent of reported Spark use today, the top active Spark initiatives include real-time stream processing, exploratory data science, and the emergence of Spark for machine learning.
z Systems and Apache Spark
For many system environments, Spark has been valued mainly for its speeds and feeds, says Mythili Venkatakrishnan, senior technical staff member at IBM. However, analytics on the mainframe brings operational advantages well beyond mere speed. “The true value is co-location of the data,” she explains. “We can natively access all those system-of-record data sources – DB2, IMS, and all the stuff that clients have on their mainframe today.”
Spark performs analysis right on the mainframe, says Venkatakrishnan. Business users can now perform advanced analytics directly on z/OS data without that data having to be first moved or “prepared” through ETL, she adds. “When analysis requires insights from other sources, the analysis is simply federated by Spark, as opposed to data being moved. We don’t have to wait until all that [data is] moved to Hadoop for a series of ETL jobs that’s going to take quite a bit of time and cost more.”
While many Spark on mainframe projects are still in proof-of-concept stages, there are a range of potential applications. Typical Spark-enhanced applications being deployed on the mainframe include fraud pattern detection, Venkatakrishnan says. “They’re able to find those patterns much more quickly with real-time data.” Another Spark-on-mainframe use case, she continues, “is a real-time view of payments or claims that are going through the system. What’s coming in, how those payments are moving through the various applications, and what their status is.”
Targeted marketing is another application now seen on mainframes. “So identifying opportunities for large clusters to be able to better serve their clients through targeted offers, analyzing data that they own, as well as data they may not have within their organization today, in order to have more effective cross-sell [and] upsell,” she says. “We also have clients focused in IT analytics, or operational analytics, looking at a real-time view that’s coming from the system itself, for purposes such as capacity understanding, looking at real-time feeds of when certain thresholds are hit.”