The framework can help organizations accelerate the move to DataOps by providing reference architectures that simplify adoption.
WekaIO has launched a framework made up of customizable reference architectures (RAs) and software development kits (SDKs) that make it simpler to build DataOps pipelines on the Weka File System (FS), which can be deployed on multiple third-party platforms.
Weka FS is optimized specifically for platforms based on NVMe architecture that are often employed to drive advanced analytics applications in real time. The framework was developed in collaboration with partners such as NVIDIA, Mellanox, and other vendors participating in the Weka Innovation Network (WIN), says Shailesh Manjrekar, head of artificial intelligence (AI) and strategic alliances for WekaIO.
Weka FS is usually deployed on IT infrastructure that organizations acquire from those partners, notes Manjrekar. It delivers more than 73 GB/sec of bandwidth to a single GPU client, along with built-in tools to address versioning, explainability, reproducibility, governance, compliance, in-line encryption, and data protection.
One of the major use cases for Weka FS is providing the throughput required by AI applications, says Manjrekar. AI data pipelines differ from most existing applications in that they require a massive amount of bandwidth to ingest data. There are also unique challenges stemming from mixed read/write handling for extract, transform, load (ETL) tasks, the ultra-low latency requirements of the inference engines that run AI models, and the need for a single namespace for the entire data pipeline.
Weka FS is built on top of a Kubernetes orchestration engine, which Manjrekar says makes the file system highly portable across platforms inside and outside the cloud. Weka FS can also be deployed on top of any object-based storage system, notes Manjrekar.
With more organizations aggregating massive amounts of data to drive AI application development, IT organizations are embracing DataOps best practices to make provisioning and accessing data more agile. The goal is not only to increase performance but also to make large amounts of data available as quickly as possible. Most AI applications are only going to be as good as the data they access, so interest in modernizing storage systems is on the rise.
“Data is the new source code,” says Manjrekar.
Naturally, embracing DataOps requires not just access to modern IT platforms but also major changes to the culture of most internal IT organizations. The framework provided by WekaIO is intended to help organizations accelerate that transition by providing reference architectures that simplify adoption and SDKs that make Weka FS more extensible.
WekaIO is not the only vendor focusing on DataOps these days. Competition to provide the platforms on which next-generation AI applications will be trained and deployed is already fierce. The challenge many IT organizations now face is finding the funding needed to acquire those platforms during an economic downturn brought on by the COVID-19 pandemic. The paradox, however, is that the only way most organizations will be able to survive and thrive during the downturn is to rely more on AI to automate processes at a time when the number of employees available to manage those processes has most likely been sharply reduced.