An essential element in making business decisions today is making transactional and operational mainframe data more easily available for reporting and analytics, in ways that do not consume great resources.
Businesses today operate at blazing speed. They can no longer make decisions based on last week’s data. As such, getting access to data for reporting and analysis is critical. Certainly, there are technologies that can help. But frequently the essential element is how to make transactional and operational data on mainframes more easily available in ways that do not consume great resources.
Recently, RTInsights sat down with Mike Pickett, Vice President of Growth for StreamSets, a Software AG Company; Daniel Bierman, Principal Product Manager for CONNX at Software AG; and Nicole Ritchie, Director of Product Marketing at Software AG, to talk about these issues and more.
We discussed why there is such a great need today for access to mainframe data, how traditional methods of providing this access sometimes fall short, how StreamSets can help, and the benefits that can be realized once such access is available.
Here is a summary of our conversation.
RTInsights: Why is there such great interest in getting access to mainframe data, and especially why now?
Mike: Mainframes have always been at the core of business transactions. They’re extremely reliable, transact at a very high rate, and are very secure. What we’ve seen over the last decade is a transformation and a growing need for business analytics, all driven by the cloud.
You have these cloud analytics platforms (Snowflake, Databricks, Amazon, Google, and Microsoft) that line-of-business analysts and data scientists have been able to use to get access to the data in their business. They are traditional SaaS business systems, and they run all sorts of amazing analytics used to provide much better insight and find ways to optimize the business.
See also: Activate your Mainframe Data for Cloud Analytics
Now what we see happening is this next level of analytics moving deeper into the enterprise, where they want to start including data from their traditional enterprise systems. That includes systems that remain on-premises. And when you get down to the core of what’s powering a lot of businesses, it’s the mainframes, which hold an incredible wealth of operational and historical data that, when used in modern analytics, can help companies go to the next level: finding ways to optimize their business, spot fraud, identify trends, improve customer interactions, and more.
Nicole: I could add, just to echo what Mike says, that mainframes are not going away. In a recent survey, 90 percent of IT and business executives view their mainframe as a growth platform. They hold a wealth of information: a lot of historical information about customers, transactions, and business operations. It’s invaluable data for the business analysts who are looking at how they can improve their customer relationships and operations, as well as the products and services they offer.
As such, I believe that the inclusion of mainframe data in cloud analytics is important to operate efficiently in this hybrid world.
Mike: Nicole mentioned something there that I want to touch on. The way analytics is done today is very iterative. You start with a data set or a set of data points, and data scientists and analysts will work through that, come up with their findings, and then augment them by finding more data and by adding data from more sources in the enterprise. It’s an ongoing, never-ending process.
And this is why we’re just now at the point where analysts are looking at more and more of their critical data. And we believe there isn’t just one end to the mainframe data. These data sets can be quite extensive; you don’t just replicate the data off the mainframe and say, “There, it’s done.” It needs to be done in a very controlled way, but in a way that empowers people to easily understand what data is on the mainframe and allows them to access what’s needed without overloading or overwhelming the mainframe teams or the mainframe systems.
RTInsights: How has this traditionally been done, and what are some of the challenges with past approaches?
Daniel: Twenty-odd years ago with mainframes, you bolted reporting on for the user, and that ended up not being sufficient. The user would ask for additional information, which meant a programmer had to write a piece of code, generate a report, and pass it on to the user. And then, as Mike mentioned, it’s an iterative process. The request comes back, and you end up with these programmers struggling to keep up.
What was the next solution? Enter the data warehouse era, much of which started on the mainframe. The idea was, “Let’s just move the data out, transform it, and put it in a format that is reusable for the user so they can do their own thing.” That way, they’re not going to ask us every other day to change the program to extract the data. So, that was the next generation.
We’ve now moved past that because we’re in a more powerful world as far as computing and computing capabilities are concerned. We now get into the analytics side, which is, again, a sort of process where you look at what you have, analyze it, find out what’s missing, and determine what you need. You can now go and do that. It’s a convergence of things that used to be hard to do and that we now need to do quickly and efficiently, because the ability to process the information has improved.
One of the things that we saw in this middle phase was a time lag because the data warehouse would be loaded once a week. So, you would have to be happy with what you got last week and not what happened yesterday or what’s happening today. That data is just so hard to get to, and it’s not that it couldn’t be done; it was just an effort. So, a lot of the time, they said, “Well, we’ll just carry on without that information.”
Mike: I would say the number one approach was to do nothing. Companies would look at how mainframe data could be used. If it was for critical, must-have operational applications, they would say, “Fine, we’ll spend the money; we’ll implement a very specific set of high-end, costly tools that are designed to ensure that data can be replicated over to those other operational systems.” But that meant the data was exclusive to those applications or systems. It was not really going to be made available for general analytics. It was just too expensive and too hard to access.
So, those requests just went unaddressed, and people got used to it. Many times, we have encountered customers that would say, “Well, we have it, but we’re not going to get data off the mainframe just because it’s too hard, too expensive.” And so, people shied away from it.
What we’re opening up here is the potential for a new generation of analytics that includes core operational data that simply wasn’t available before. That’s the data that was too hard and too expensive to get at. Now we can get to it.
Nicole: In addition to that, I would add that one of the beauties of the new cloud data analytics platforms is that they provide a level of computing power and scale that traditional data analytics platforms never had before. On the analytics side, they are able to handle the larger data sets that are often held on the mainframe. So, there’s getting the mainframe data, and we’ve discussed a lot of those hurdles, but now, on the analytics side, you are able to handle that kind of volume.
Mike: If you look at it, mainframes have always been known as big iron, able to handle incredibly high volumes of transactions securely and accurately. There wasn’t any open system that could keep up with them, and there still isn’t. The cloud as an analytics platform is the equivalent of unlimited available compute. And because of the way these cloud platforms are designed, they can scale out. So, now we’re seeing the equivalent of the mainframe for analytics platforms.
RTInsights: How does StreamSets help?
Daniel: First of all, we enable real-time access. I don’t have to go and ask somebody to do something. With our Mainframe Data Collector, we make it possible for the data workers to see the data available to them, and they can then access that data. There’s no intermediary: “Oh, let me ask a programmer to write an extract to put it in a file, so I can FTP it, import it, and process it.” I can now directly get to the data and immediately make it part of a StreamSets pipeline.
Mike: A key nuance here is that other alternative technologies are all about, “I’ll move the data and put it somewhere else for someone to then look at it.” We give the users the ability to see what data is available, and we’re not moving it until they request it to be moved.
Daniel: So, immediately, it is in the StreamSets space, and from there, we have very specific StreamSets capabilities that deal with exposing the data to the data worker and with delivering it.
Mike: So, the Mainframe Data Collector lets people see what data is available on the mainframe that they can include.
For the data they select to include, StreamSets is able to create an intelligent pipeline, and this is what we do really well. There are over 50 processors that a data engineer can apply in the pipeline to help shape the data. It is a bit of transformation, but it’s not the traditional transformation that people consider in ETL, where there are heavy changes being made.
We’re shaping the data to conform to a standard format so that when it arrives at the destination, say Snowflake, Amazon, or a data lake, it’s available for analytics right away. And that really helps the analysts and data scientists because they have the assurance that they can start using the data right away and start including it in their models or reports.
It’s just ready to go. They don’t have to go through extensive curating processes that can take time and add complexity to what they’re doing. The other thing is, should they decide, at the origin, to include additional data or new types of data or to change data formats, the StreamSets pipelines can detect the changes and continue to ingest the full data set into the destination.
And then, at that point, we notify the user: “Hey, there’s additional data available to you. It’s your choice as to how and what you want to include,” all while the pipelines continue to operate and ingest. That frees up the whole data supply chain. People don’t get hung up trying to find and fix problems, or saying things like, “Hey, my reports are not working anymore. What’s going on? I haven’t had data in there for a week,” only for somebody to find out that the pipeline broke last Friday because something changed. We ensure the continuity of the pipeline regardless of what changes happen.
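To make that drift-handling behavior concrete, here is a minimal, purely illustrative Python sketch. It is not StreamSets code or its API; the class, field, and method names are hypothetical. It simply shows the idea described above: when new fields appear at the origin, ingestion of the full record continues, and the new fields are flagged for the data team rather than breaking the load.

```python
# Illustrative sketch only -- not the StreamSets API or product code.
# It mimics the behavior described above: when new fields appear at the
# origin, ingestion continues and the change is reported, not fatal.
from typing import Any, Dict, Iterable, List, Set


class DriftTolerantLoader:
    """Loads records into a destination table, growing the column set as needed."""

    def __init__(self, known_columns: Iterable[str]) -> None:
        self.known_columns: Set[str] = set(known_columns)
        self.rows: List[Dict[str, Any]] = []      # stand-in for the destination table
        self.new_columns_seen: Set[str] = set()   # what we will notify the user about

    def ingest(self, record: Dict[str, Any]) -> None:
        # Detect fields the destination has not seen before (schema drift).
        drifted = set(record) - self.known_columns
        if drifted:
            self.new_columns_seen |= drifted
            self.known_columns |= drifted          # keep ingesting the full record
        # Write the record, filling columns missing from this record with None.
        self.rows.append({col: record.get(col) for col in sorted(self.known_columns)})

    def notify(self) -> None:
        # Tell the user new data is available; it is their choice whether to use it.
        if self.new_columns_seen:
            print(f"New fields available from the origin: {sorted(self.new_columns_seen)}")


# Usage: the second record carries an extra field; loading continues and the change is flagged.
loader = DriftTolerantLoader(known_columns=["account_id", "amount"])
loader.ingest({"account_id": "A1", "amount": 100})
loader.ingest({"account_id": "A2", "amount": 250, "branch_code": "NY-02"})
loader.notify()
```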
RTInsights: What are the benefits in general of this approach?
Daniel: The way our Mainframe Data Collector is implemented is minimally invasive. Depending on the data, there’s very little that needs to be installed on the mainframe. It is not a huge consumer of mainframe resources, and it’s no more than writing a program to go and read the data, except now you can read it at any given time. And it deals with all the intricacies of the mainframe, like data formats. We deal with all of that.
So, what’s the benefit of this approach? First of all, getting that data out into the StreamSets pipeline is minimally invasive. We’re not breaking any of the security rules that exist in the mainframe environment. And we’re constantly delivering the data to the pipeline.
Mike: There are benefits for central IT, business analysts, data scientists, and more.
If you look at the middle phase, where the data’s moving through central IT to the cloud data platforms, central IT is typically the team responsible for ensuring the pipelines are running. Because of the resiliency of StreamSets pipelines, they continue to run. That work is offloaded from central IT. They can spend their time working on new requests rather than doing triage and trying to fix where pipelines might have broken and data is not being delivered.
For the business analysts and the data scientists, they are now able to work with a much better view of their business operations. They’re able to work with larger data sets that can give them new insights to improve business operations. For example, in manufacturing, this can be things like how to optimize their supply chain or how to optimize their manufacturing systems and equipment.
For reporting, consider a business analyst doing reporting on governance, risk, and compliance in financial services. These governance, risk, and compliance regulations are global requirements. You likely have regulations that are different across pretty much every country, and they’re changing all the time. With this type of solution, analysts are able to quickly see what data is available to them that may not have been necessary in previous reporting but is now becoming necessary.
With the StreamSets Mainframe Data Collector, they can quickly and easily include this new data in their reports. So, they’re able to get on with their work, meeting regulatory reporting and compliance deadlines with fewer problems and more confidence that what they’re reporting is accurate.
For teams looking at customer interactions, they can have a better, more complete view of what their customer’s buying trends have been. This helps them spot trends and think of ways that they can cross-sell or upsell new products.
RTInsights: Can you discuss some common use cases and successes?
Mike: I talked about the regular reporting, but this delves deeper into not just external reporting. It can involve internal reporting as well. Analysts, especially in large companies, need to report stats, progress, and things like that. A lot of this data is coming from operational systems, and a lot of the operational systems or operational transactions are happening on the mainframe. So, one use case is to support operational reporting.
Another use case involves analysis. For example, a company might want to explore selling into new markets. They would need to understand what their current success has been. To do that, there’s a mixture of data they would need. One would be external data about the market, the market opportunity, and the characteristics of that market. The other would be their own internal data about current sales and operations, much of which sits on the mainframe.
For manufacturing, mainframes run a large part of the manufacturing lines and the supply chain. One common scenario of importance now in manufacturing is how to improve and optimize the supply chain. We are hearing from companies, many companies, in many industries that COVID broke the supply chain. They have to reanalyze their supply chain, reanalyze their suppliers, set new expectations, and more. The data that’s needed is on the mainframe. So, they need a way to find the data to do this research into their supply chains.
Daniel: Supply chains and manufacturing have been using the technology for a long time. Now, I’ve seen many cases with IoT and our ability to gather information about what’s happening on a plant floor where they are producing materials or machines. A lot of the time, that data feeds into these mainframe systems, including things like when a part was ordered, when an engineer was sent out, and what the frequency was.
Mike: Fraud is another example. There are two types. There is much attention these days to catching fraud in real time. That’s not necessarily what we’d be looking at. The other type of fraud is increasingly done over a very long period of time. You need to get to the transactions and look for anomalies over years. This is increasingly what companies are starting to do. And a huge amount of that data can be stored on the mainframe.
Nicole: There is also customer 360. Transactional data is what the mainframe is rich in. So when a customer calls the customer care center, if they have that central resource that can capture who the customer has been talking to and about what, what their most recent orders are, and what their historic transactions are, they’re better able to provide a 360-degree view of that customer’s relationship with the business and provide well-rounded support that really meets the customer’s needs.
Mike: A lot of initiatives today are providing customers with self-service access to their buying history, to their support calls, and more. And increasingly, cloud data platforms like Snowflake are the way that companies store that information and then make that available through their self-service portals.
Our solution can help companies make information available in a very easy and intuitive way. A company could provide customers with information about their interactions. The issue is: how do we get that data into a common location, say Snowflake, and then make it available to them? Mainframe transactions and interactions are going to be one of the things they can include.
Nicole: And I’ll add onto that: with StreamSets and the StreamSets Mainframe Data Collector, the beauty of the solution is that it allows the business analysts and the data engineers to treat the mainframe as if it were any other relational data source. The data is in a format that they understand and can use for their analytics. It really simplifies everything from the standpoint of the analytics person.
Mike: I think that emphasizes and shows, again, that data and analytics projects are iterative and cyclical. They just keep going deeper and deeper and deeper. But in order to do that, the data has to be accessible and intuitive: easy to work with but not overly expensive. And the other thing is that governance and security need to be part of the equation. This is something that we bring together as a solution.
Want to learn more? Visit StreamSets’ blog and read, “Mainframe Data Is Critical for Cloud Analytics Success—But Getting to It Isn’t Easy.”