New IBM tools advance the state of DataOps by automating manual tasks that would have to be performed by a data engineer.
IBM today extended its Cloud Pak for Data portfolio to include an instance specifically designed to automate DataOps processes associated with building advanced analytics applications infused with artificial intelligence (AI).
The offering builds on an initiative IBM launched last year to create Cloud Paks, which package middleware technologies as sets of Docker containers so that IT teams can deploy and consume them more easily, says Jay Limburn, a distinguished engineer and director of offering management for IBM Data & AI.
Cloud Paks are essentially tools and applications that IBM bundles on top of microservices that provide the equivalent of an application server. Because those microservices are based on containers, a Cloud Pak can run on any public cloud or in an on-premises IT environment.
“It’s a full stack platform,” says Limburn.
The latest additions to the Cloud Pak series include:
StoredIQ InstaScan, which is a new unstructured data management and privacy tool that identifies risk hot spots in data sources and prioritizes potential fixes and remediations. That capability reduces the time and effort associated with meeting compliance requirements while at the same time enabling organizations to conduct periodic risk assessment tests. Organizations can also employ StoredIQ InstaScan to define policies that ensure data sources are accurate.
InfoSphere DataStage, an extract, transform and load (ETL) tool, has been updated to include a Change Data Capture feature that continuously captures data changes and automatically transforms them. The ETL tool can also identify assets from a data catalog and then automatically generate jobs based on the attributes of those assets. A collaboration feature that makes it easier for business users and data engineers to share data and insights has also been added.
An update to Watson Knowledge Catalog (WKC), which adds data quality and governance capabilities for policy enforcement. WKC provides access to a variety of third-party data, such as socio-economic data, that can be combined with enterprise data in a single enterprise catalog.
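For readers unfamiliar with change data capture, the general idea mentioned for DataStage above can be illustrated with a minimal Python sketch. This is not IBM's implementation or API. It assumes a simple snapshot comparison (production CDC tools typically read a database's transaction log instead), and every name in it, including ChangeEvent, capture_changes, transform and apply_changes, is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical structure, for illustration only)."""
    op: str                 # "insert", "update", or "delete"
    key: Any                # primary key of the affected row
    before: Optional[Row]   # row contents before the change, if any
    after: Optional[Row]    # row contents after the change, if any


def capture_changes(previous: Dict[Any, Row],
                    current: Dict[Any, Row]) -> Iterator[ChangeEvent]:
    """Compare two keyed snapshots and yield only the rows that changed."""
    for key, row in current.items():
        if key not in previous:
            yield ChangeEvent("insert", key, None, row)
        elif previous[key] != row:
            yield ChangeEvent("update", key, previous[key], row)
    for key, row in previous.items():
        if key not in current:
            yield ChangeEvent("delete", key, row, None)


def transform(event: ChangeEvent) -> ChangeEvent:
    """Assumed transformation step: trim and lower-case the email field."""
    if event.after is None:
        return event  # deletes carry no new data to transform
    after = dict(event.after)
    if isinstance(after.get("email"), str):
        after["email"] = after["email"].strip().lower()
    return ChangeEvent(event.op, event.key, event.before, after)


def apply_changes(target: Dict[Any, Row],
                  events: Iterable[ChangeEvent]) -> None:
    """Replay the transformed events against a downstream copy of the data."""
    for event in events:
        if event.op == "delete":
            target.pop(event.key, None)
        else:
            target[event.key] = event.after


# Example data: row 1 is modified, row 2 is removed, row 3 is new.
previous = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"}}
current = {1: {"email": " A2@Example.com "}, 3: {"email": "c@example.com"}}
target = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"}}

events = [transform(e) for e in capture_changes(previous, current)]
apply_changes(target, events)
```

Only the three changed rows travel through the pipeline, rather than the full data set being reloaded. That incremental behavior is the main appeal of the approach as data volumes grow.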
Collectively, these tools advance the state of DataOps by automating tasks that a data engineer would otherwise have to perform manually, says Limburn. To make matters more challenging, as AI models continue to be introduced into enterprise applications, the amount of data that needs to be managed is also increasing at exponential rates. By automating the creation and management of data pipelines, which IBM now refers to as the first rung of the AI ladder, organizations will be able to derive more value from investments in data scientists faster, adds Limburn.
Of course, it’s not just AI that is increasing the pressure on DataOps. As the number of applications being built continues to accelerate, DataOps teams not only need to create more data pipelines than ever, they also need to ensure the data being accessed doesn’t run afoul of any number of compliance regulations. In fact, without increased reliance on automation, there is simply no way DataOps teams will be able to keep pace with the rate at which DevOps teams are now building and deploying applications.