StreamSets and Amazon Web Services have teamed up to bring mainframe data into the cloud so that it can be analyzed more easily.
Data analysts are used to working at the speed of cloud using data in their SaaS applications. They hit a speed bump when they try to make use of enterprise data on mainframe systems centrally managed by IT.
RTInsights recently sat down with Mike Pickett, Vice President of Product Growth at StreamSets, and Graham Hainbach, Principal GTM Data Migration & Modernization Specialist, WWSO D2M Team, at Amazon Web Services (AWS), to talk about the challenge of working with mainframe data for analytics and how the two companies are working to resolve the issue.
Here is a summary of our conversation.
RTInsights: Why is there so much interest in leveraging cloud services to aid in the reporting and analysis of data on legacy systems?
Pickett: One of the reasons cloud is so powerful is it’s always on and immediately on. There’s no infrastructure to manage. When you extrapolate that out and start looking at what business data analysts and data scientists are doing, they want to just get into their data and start working with it. What the cloud provides is an always-on, easy-to-use, easy-to-try platform for them to work with.
Contrast that with the prior era of on-premises data warehouses and Hadoop/Big Data environments. They would have to go through all sorts of justification and inspection in order to get the Central IT teams to provision infrastructure. They would often be forced to work with a set of tools that were selected for prior projects…regardless of how suitable they were for their specific needs or approach. This was suboptimal in many ways, especially time to insight. With the cloud, once they get their data, they can start working at their own pace and explore the data for insights. They can work as fast as ideas come into their head. Much greater control of their efforts and much better time to insight.
Hainbach: Another reason for the interest in cloud is time to market. As Mike outlined, we don’t have to go through the provisioning process. Being a legacy IT person myself, I remember that if I was looking for additional hardware because I needed more disk space, it could take up to three months to go through the provisioning, acquisition, purchase order, and negotiation processes. Once acquired, we’d have to actually provision that hardware in the data center. In total, all of this can be a very costly and time-consuming endeavor.
With the cloud, we now have the ability to turn those services on almost instantaneously. We can also control our spend in the cloud. We might have a research and development project. We can use a very small instance that only costs pennies per half hour of usage all the way up through multimillion-dollar complex deployments. That capability is something that a regular data center doesn’t have. The analytics folks who are looking at complex datasets need that capability, and they’re embracing and adopting it today.
Pickett: They can also try different tools. They can go to the AWS Marketplace and try a tool. If it’s not working, not a big deal. It’s either free to try before you buy, or the consumption-based, pay-as-you-go pricing is very low. Also, they can easily match the cost of the tools and the overall project to the business outcome they’re shooting for or have achieved. That was difficult or impossible to really nail down in the on-prem era.
What we see going on is the practitioners rapidly realizing that tools can dramatically improve their efficiency and productivity. They are able to easily try tools and find the one that fits their needs the best or fits their desires the best. They can focus on delivering business outcomes rather than trying to manage and wrestle with technology. That’s good because business outcomes are what they want to work on. It’s how they help a company make more money and find new routes to market and things like that. How do they help them find efficiencies, save money, or spot risk and prevent loss slippage?
Hainbach: Additionally, one of the top concerns analysts and data scientists have always had is how do I get to my data? Up until about two or three years ago, 50% of the job for data analysts and data scientists was trying to identify the relevant data sets and get access to those data sets. Now that we’ve got tools available from AWS and from partners like StreamSets, with its Mainframe Collector, we’ve completely simplified that process and made it secure.
RTInsights: What are the challenges in leveraging the cloud for this work?
Pickett: At StreamSets, we talk about unlocking data without ceding control. Fortune-class companies still have a lot of data in self-managed systems. This can be decades of historic transactions or, when mainframes are involved, the day-to-day core business transactions. There are good reasons for it. The data might be held there for regulatory compliance, location requirements, or things like that. So, the data is managed by central IT.
Line-of-business analysts might have access to the data within their own SaaS applications, but when it comes to adding some of the data that’s unique to their enterprise, that data often resides on on-premises, IT-managed systems. So, they have to work with central IT to get access to this data.
The challenge for central IT teams is that they need to find tools that help them securely access and move the data into the cloud, into Amazon, efficiently. That is what StreamSets helps them do.
Hainbach: I’d also add that there are companies that have fully embraced the mainframe, continue to embrace the mainframe, have mainframe developers, and their core systems are going to stay on the mainframe. They’re not looking for the mainframe to go away.
That being said, there are sister divisions inside the enterprise that are looking for access to that data, and that’s where the cloud, StreamSets, and the Mainframe Collector create a synergy or the ability to move that data simply and securely across different divisions and into the cloud. We provide access to those data sets without consuming mainframe MIPS. Why is that important? The cloud is a lot cheaper and has the ability to scale versus running processes up on the mainframe with its very complex batch processes. That’s why it makes complete sense to integrate these two environments.
RTInsights: What’s needed to overcome the challenges?
Pickett: There are multiple aspects of this. Mainframe data is encoded differently, it’s shaped differently, and it can be very opaque for a modern data or business analyst to understand. It’s well understood by the mainframe programmer and the mainframe operators, but those are the people who are immersed in the mainframe. There needs to be an ability for the people who want the data, the data consumers, to understand what’s on the mainframe and be able to explore it on their own. They would normally have to go through a process of requesting something and reviewing it. This routine is all too familiar for many data consumers. They get an extract file and are told it’s all that they need. But after doing some exploration and analysis, they get a hunch there are other datasets that are likely available but not in the file given to them. When they go back and explain what they’re looking for and why they think the file is not complete, they may encounter the “You didn’t ask exactly for that originally” reply.
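To make the "encoded differently" point above concrete, here is a minimal Python sketch of what an analyst faces when handed raw mainframe bytes. It assumes a hypothetical 14-byte record layout (a 10-byte EBCDIC name field followed by a 4-byte packed-decimal, or COMP-3, balance with two decimal places) and the common cp037 code page. The layout, field names, and helper functions are illustrative assumptions only, not how StreamSets or AWS process the data.

```python
from decimal import Decimal


def decode_ebcdic(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC text and strip the trailing pad spaces."""
    return raw.decode(codepage).rstrip()


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field.

    Each byte holds two decimal digits, except the last byte, whose
    low nibble is the sign (0xD or 0xB = negative; others treated as positive).
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F

    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    # Shift the implied decimal point `scale` places to the left.
    return Decimal(value).scaleb(-scale)


def parse_record(record: bytes) -> dict:
    """Split the hypothetical fixed-width record into named fields."""
    return {
        "name": decode_ebcdic(record[0:10]),
        "balance": unpack_comp3(record[10:14], scale=2),
    }


# Hypothetical sample record: "SMITH" padded with EBCDIC spaces (0x40),
# followed by the packed-decimal digits 1234567 with a positive sign (0xC).
sample = bytes.fromhex("e2d4c9e3c8" + "4040404040" + "1234567c")
parsed = parse_record(sample)
print(parsed["name"], parsed["balance"])
```

Read as ASCII or UTF-8, the same bytes are unreadable, which is one reason an extract file alone is hard for a data consumer to explore without knowing the record layout.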
A key part of StreamSets Mainframe Collector is a data dictionary for mainframe data. We provide a very intuitive, easy-to-use environment for the data consumers to explore the data sets that are on the mainframe and identify the data sets they want to move off. Once they’ve identified what they need moved over, they can then have a very fact-based information-rich conversation with the mainframe team along the lines of, “These are the data sets I want, and here’s how often I’d like to get it. Can we figure out what’s the best time to schedule jobs to move it off?”
This is a breakthrough in the market. It is in contrast to traditional methods, which require a bit of a leap of faith that people fully understand and appreciate what is being asked for by the consumers. In that case, you’re dealing with another human on the other side, the mainframe operator, who has to make assumptions. So, until now, it was very difficult, if not impossible, for data consumers to explore the data on the mainframe without first moving all of it to some location where they could form their own full picture of what they wanted, rather than being limited by what somebody assumed they wanted.
Hainbach: I’ll also add that the data on the mainframe is intellectual property belonging to enterprise customers. If we talk about VSAM structures, there are different storage mechanisms and storage silos that need to be brought together to create a bigger or better picture of the data that’s available. That’s what StreamSets and the catalog do for the data analysts.
Before, those conversations were really just conversations and what-ifs. The data analysts would get data dumps from the mainframe team, and then the data analysts would say, “Well, that’s not really what I wanted. Can I have this data set joined to a DB2 table joined to a VSAM file?”
Now we have that capability, and that capability is unique to the market. It basically opens up that data so that we can move it over to the cloud and explore it in a lot cheaper fashion.
RTInsights: You covered a bit of this, but how do StreamSets and AWS help?
Pickett: StreamSets provides the data dictionary for the mainframe data and the ability to easily move it off the mainframe. We provide a way to move it off continuously whenever the data is updated or just needs to be synchronized. We can move it off and move it into Amazon. Amazon provides the platform, tools, and services that data analysts and business analysts need to work with their data.
Hainbach: Once we have that data in S3 or in Redshift, we’ve got different tools that let you explore that data in different fashions and even apply AI and ML models to that data. And again, all of this is done in a very cost-effective fashion. We don’t consume large numbers of MIPS on the mainframe.
RTInsights: What are the benefits for joint customers?
Hainbach: Amazon has identified a need in the market as part of our analytics and database sales plays. We see a need for the StreamSets capability to bring the mainframe data into the cloud to then perform analytics. We’ve got partnerships with IBM, as well.
Some customers are looking to migrate away from the mainframe. Some of them are looking to integrate their mainframe into their existing multi-cloud environment. That capability is best served by a partner like StreamSets with its mainframe technologies. So, we’re very happy to put together a set of campaigns and joint customers where we can show success for these capabilities.
Pickett: As companies integrated and analyzed data that was in their SaaS applications, they got very used to a speed at which they could operate. What StreamSets and Amazon are doing together allows them to continue to operate at that speed. We’re breaking through the barriers and eliminating the obstacles that have traditionally held teams up.
That means companies are able to include more of their strategic enterprise data in their planning predictions. And it allows them to make better decisions more confidently. So, they have a greater opportunity to control and guide the business, exploit market opportunities, and be more profitable and competitive.
Additional Resources:
Data on Demand: Sharing in the Legacy Information Feast (Webinar)