Mike Pickett, Vice President of Growth for StreamSets (now part of Software AG), talks about mainframe data access for reporting and analytics.
As businesses digitally transform and seek to be data-driven, they need fast access to relevant data for reporting and analytics. For many businesses, much of that essential data lives on their mainframes. That poses a problem: access is tightly controlled and requires help from mainframe team members who are already busy and protective of their systems.
RTInsights recently sat down with Mike Pickett, the Vice President of Growth for StreamSets (now part of Software AG), to talk about mainframe data access for reporting and analytics. We discussed the challenges businesses face, how StreamSets can help, and the benefits businesses can realize once mainframe data access is simplified.
Here is a summary of our conversation.
RTInsights: What have been the barriers to providing mainframe data for reporting and analytics?
StreamSets: The barriers can be grouped into three categories.
First are technical barriers. The data is shaped and formatted differently than it is on the other compute systems where today’s analysts tend to work, which are increasingly in the cloud. It starts with encoding: mainframe data is encoded in EBCDIC, while other systems use ASCII. So, the data needs to be translated before it can be used on any system running Unix, Windows, or macOS.
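As an illustration, Python’s standard codecs include several EBCDIC code pages, so a minimal translation sketch looks like the following. The cp037 code page (US/Canada EBCDIC) is an assumption for the example; the correct code page depends on the mainframe’s configuration.

```python
# Minimal sketch: translating EBCDIC bytes for an ASCII-based system.
# cp037 is one of several EBCDIC code pages shipped with Python's codecs;
# a real mainframe may use a different one (e.g., cp500, cp1047).
ebcdic_bytes = "HELLO".encode("cp037")   # simulate mainframe-encoded data
print(ebcdic_bytes)                      # raw EBCDIC bytes -- not b"HELLO"

text = ebcdic_bytes.decode("cp037")      # translate for open systems
print(text)                              # -> HELLO
```

The same bytes read as gibberish if interpreted as ASCII, which is why translation must happen before any downstream tool touches the data.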
Then there is the challenge of how the data is structured. Mainframes run custom applications where the data is shaped to fit the application. The data often resides in Db2, in VSAM or QSAM files, or in non-relational databases like IMS DB or Adabas.
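For example, a VSAM or QSAM file typically holds fixed-width records whose layout is defined by a COBOL copybook rather than a relational schema. The layout, field names, and code page below are hypothetical, purely to illustrate why the data must be reshaped before analysts can use it:

```python
# Hypothetical copybook layout: CUST-ID PIC X(6), NAME PIC X(12), BALANCE PIC 9(7).
# A mainframe record is just a run of EBCDIC bytes; the application, not the
# file, knows where each field begins and ends.
record = ("A00123" + "JANE DOE    " + "0004500").encode("cp037")

cust_id = record[0:6].decode("cp037")
name = record[6:18].decode("cp037").rstrip()
balance = int(record[18:25].decode("cp037"))  # unsigned display-format numeric field

print(cust_id, name, balance)  # -> A00123 JANE DOE 4500
```

Without the layout, the record is an opaque byte string; with it, the fields can be mapped into the rows and columns analytics tools expect.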
The second issue involves processes, or people. Those managing and supporting mainframe applications are often a small group with limited time, and their skills differ from those of analytics teams. Because of the critical nature of mainframes, they also have very protective mindsets: they need to ensure that the mainframe remains extraordinarily secure and that nothing interrupts operational uptime or performance. The nature of reporting and analytics projects, in contrast, is highly iterative.
Trying to match these two teams together is a challenge. One (the mainframe group) is short on time and protective. The other (the analytics and reporting team) is looking for quick access to expanded data sets.
And then there is cost. Mainframe vendors and software suppliers commonly charge based on consumption: when workload on the mainframe increases, clients pay higher fees for virtually all of its components, hardware and software alike.
Where does that come in? Pre-processing and transforming the data so that it is consumable on open systems (for reporting and analytics) can drive up usage, and any job that increases the mainframe’s workload can have cascading cost implications.
So, when you add those three things up, mainframes, while supporting mission-critical business operations, often become the last system that those working on reporting and analytics efforts are able to access.
See also: Unleashing the Value of Mainframe Data for Use in Modern Data Analytics and Reporting Pipelines
RTInsights: How does StreamSets address and eliminate these issues?
StreamSets: Our solution, by design, is oriented towards improving the collaboration between the teams that need the data and the teams that have the data, the mainframe experts.
Unique to our offering is a data dictionary, which securely represents the data available on the mainframe. Users who need data can preview what’s available to them on the mainframe before they ask the mainframe experts to extract it and move it over into their systems.
We provide an infrastructure that allows each team to focus on its core efforts. Once it is clearly identified what data is wanted, the teams can communicate and collaborate on how and when to move that data off the mainframe.
The way this works is by using the StreamSets Mainframe Collector, which is installed on a Windows server with reasonable proximity to the mainframe. The server is used to perform query translation (SQL to the access method of the data source) and data translation (EBCDIC to ASCII) so that mainframe impact is minimized.
More importantly, it holds a data dictionary of the data assets available on the mainframe. The data dictionary adapts and extends the mainframe security framework so sensitive data remains secure, but it also presents the data assets in virtualized, relational views.
That is important because it gives those who need to use the data the ability to explore the data assets in a format and with the tools they are familiar with before they ask for the data to be systematically moved off the mainframe into their cloud environment. Our tagline for this is “Explore before you do more.”
That is a huge productivity boost for everyone in the data supply chain. Without it, it’s like being in a restaurant that doesn’t have a menu: you know there is good food there, but the back and forth between the diner, the waiter, and the chef is a frustrating, slow process with more than a few wrong results. Our solution lets mainframe teams create a menu of what’s available, lets analytics teams explore and sample data sets, and then lets them set up operational pipelines once they have determined exactly what they need.
With regard to the pipelines, we’re able to set up pipelines that move the data continuously as it is updated on the mainframe. We call that “set it and forget it.” So, the analysts consuming the data know that whenever they’re working in their environment of choice, say Snowflake or Databricks, the data is reliable and up to date. They don’t need to ask anybody to run another pull to get them the most recent information.
Another way to look at it is that our focus is on team productivity and efficiency versus high performance and low-latency data replication. The cost of that latter approach puts mainframe data out of reach for broader enterprise reporting and analytics projects. Our approach eliminates the issue.
RTInsights: What changes in organizations once such access is simplified?
StreamSets: The changes can be quite significant. In today’s data-driven world, the way successful projects work is to bring new and additional data to existing processes and projects rather than trying to create an entirely new project.
So, by opening access to the mainframe data so that users can understand what’s there, they’re able to focus on their data initiatives and operate much faster.
The data initiatives many businesses undertake look across their corporate data assets to identify new ways to grow the business. Often, that identifies opportunities to cross-sell or upsell products to customers.
Access to the data can help them look for ways to optimize and improve operations. That can be things like optimizing the supply chain or inventory. They can look for ways to improve compliance and regulatory reporting. And for risk and fraud, access to the data can help them spot anomalies in their business transactions that could indicate that fraud or waste might be occurring in their business.
Additionally, people have more holistic visibility into their operations, which means they’re able to make better, more informed decisions. They are able to find more ways to optimize their operations or to spot new sales and marketing opportunities. Where the data is relevant to regulatory reporting, this can mean reducing corporate risk and improving governance. It’s good for everyone.
RTInsights: Does simplified access expand the use of that data to new analytics and reporting application areas?
StreamSets: In all cases where a company is able to include more of their enterprise in their business decisions, they can make better decisions with greater accuracy. They can make decisions with greater confidence, and they can make decisions faster. That’s what successful companies are striving for. They want to improve how they’re using data to be more competitive, more efficient, or reduce exposure to risk. That is not possible without simplified access to mainframe data.
We are hearing from data and analytics teams, and even some mainframe teams, that unless it can be 100% justified, the data on the mainframe isn’t available for use. It’s a catch-22: you can’t have the data unless you can show it’s absolutely critical, but you can’t show it’s critical until you can first see and understand what’s there. We remove that barrier, enabling expanded use of mainframe data for reporting and analytics.
There’s another scenario that simplified access addresses. It has to do with the frequency of the data supply.
Historically, enterprise reporting, especially at public companies, was done quarterly. More and more companies realize that if they can track and measure their business more frequently, say monthly or even weekly, they can spot trends much sooner. That gives them a chance to seize opportunities or respond to challenges more quickly, ensuring better business continuity and operations.
We have talked to a data analytics team that is only able to get a file once a month. They’re looking for ways to optimize their business, but they get only one shot a month. They always have update and change requests, but they can act on them only twelve times a year. If they miss their monthly window, they wait until the next one.
The mainframe team feels that should be adequate, but they also don’t have the bandwidth to support all the change requests. Ideally, the data users are looking for weekly to daily results, but they’re stuck with monthly. The easy access we provide allows more frequent access without burdening the mainframe team.
RTInsights: What are some use cases that you see in the market for your solution?
StreamSets: The big one is improving access, visibility, and insights for analytics teams. These are the people who are constantly iterating and adding new data sets to their projects. It can be operational reporting where, once teams feel they have clarity and control of one aspect of their business, they want to dig in and expand what they’re looking at.
Regulatory reporting is another use case. Regulatory reporting is far from static or slow moving. The rules are constantly changing. Demand for greater transparency is growing.
Additionally, companies in several industries offer good examples of how our solution is used.
A manufacturing company with an inventory management system on its mainframe is looking for ways to optimize its suppliers, optimize its inventory, and decrease the amount of time that it has to sit on inventory before it is consumed in its systems. The needed data is on the mainframe.
The mainframe team wants to be able to do this on its own. The business analysts working on corporate optimization sit in a line of business, and they are continually looking to pull more data from a wider variety of systems and sources across the mainframe to optimize the business. They want to do this at a faster, more iterative rate.
Another company’s business is data services. Today, their analytics team is looking to spot trends in customer purchases. They get a data report once a month that is supposed to help them identify areas of optimization and new sales opportunities, which means they can inspect and analyze that data only 12 times a year. They’re looking to cut their access time to weeks, if not days, versus the 30-day window they’re in today.