With the accelerated movement of data to the cloud, the need to ensure data security as efficiently as possible has never been as important as it is now.
COVID-19 has turned the definition of “normal” upside down in our daily lives, and it hasn’t spared the way we work. Even before the pandemic, migrating to the cloud was one of the leading technology trends: enterprises moved their data and analytical workloads to the cloud to take advantage of its lower cost, greater flexibility, and ready access to advanced analytical tools. As a result, data science emerged as one of the primary drivers of enterprise cloud migration.
The data infrastructure required to support data scientists and analysts in the cloud has been evolving to reflect this change. Data science is increasingly a team sport, and enterprises need a variety of personas to operationalize and derive value from their data science investments. It is therefore impractical to build infrastructure around a single isolated user. When each member of a data science team spins up their own cluster, the result is a higher management burden, ineffective collaboration, suboptimal use of the enterprise’s shared infrastructure, and correspondingly higher costs.
Before and After the New Normal
Pre-pandemic, the cloud infrastructure model emerging to support the collaborative nature of data science was one of shared usage: a data science environment, or cluster, that a number of data scientists can use at the same time. These clusters are sometimes called high-concurrency clusters. In this scenario, a scheduler assigns resources among users and optimizes the execution of their jobs on the cluster. Before COVID hit, data science teams, especially in regulated industries such as banking, finance, and insurance, performed their tasks in the same physical location. Teams working on a project would share their input data, model predictions, and features, building models collaboratively on a shared cluster from data and insights derived from one another’s work. It was therefore conceivable for these shared clusters to have less-than-airtight security and access controls in place, because they were used only by a small number of employees working within the boundaries of the company’s firewall.
That was then. In the “new normal,” data scientists and analysts, like the rest of us, have been working from home, accessing their shared clusters remotely from beyond the company firewall. Needless to say, the need to authenticate remote users and to have robust access control mechanisms in place has become paramount. First and foremost, companies need to ensure that the people and systems requesting access to their environments are actually who they claim to be. A number of protocols are available for authenticating users, including LDAP, SAML, OAuth, and OpenID. Once a user’s identity is verified, companies need an access control platform that lets administrators define and manage policies restricting each user to only the data he or she is authorized to access.
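The protocols named above differ in their flows, but each ultimately verifies a credential the user presents. As an illustration only, and not any particular vendor’s implementation, the sketch below verifies an HMAC-signed session token in pure Python; the secret key and token format are invented for the example, whereas a real deployment would delegate authentication to an identity provider via one of the protocols above.

```python
import hashlib
import hmac

# Hypothetical shared secret for the sketch; real systems use keys issued
# and rotated by an identity provider, never a hard-coded value.
SECRET_KEY = b"demo-secret"

def sign_token(user_id: str) -> str:
    """Issue a token of the form '<user_id>.<hex signature>'."""
    sig = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"

def verify_token(token: str):
    """Return the user id if the signature checks out, else None."""
    user_id, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison guards against timing attacks.
    return user_id if hmac.compare_digest(sig, expected) else None
```

A token that has been tampered with fails verification and the request is rejected before any data access is considered.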
An important consideration here is having a single platform that administers access control policies across public clouds and their services, as well as the third-party cloud-native services, such as Databricks and Snowflake, that companies subscribe to in order to support their analytical workloads. The richer the platform’s functionality for administering access at finer grains of data, the easier it is for infrastructure administrators to grant access to the precise slices of data a user needs to do his or her job.
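To make “finer grains of data” concrete, here is a minimal deny-by-default policy check in Python. The column-level grant model, dataset names, and user names are assumptions for illustration; real platforms add row-level filters, data masking, and a centrally managed policy store.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """One grant: a user may read specific columns of a dataset."""
    user: str
    dataset: str
    columns: frozenset

# Hypothetical policy set; a central platform would manage grants like
# these uniformly across cloud and third-party analytical services.
POLICIES = [
    Policy("analyst1", "claims", frozenset({"claim_id", "amount"})),
]

def can_read(user: str, dataset: str, column: str) -> bool:
    """Deny by default: access requires an explicit matching grant."""
    return any(
        p.user == user and p.dataset == dataset and column in p.columns
        for p in POLICIES
    )
```

The deny-by-default design is the key point: a user sees only the slices of data an administrator has explicitly granted, which is exactly the fine-grained control the paragraph above calls for.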
The pandemic has highlighted the need for organizations to have strong access controls in place. To continue operating, companies need to onboard new users to their cloud environments and manage their entitlements with minimal overhead. Rich, robust policy controls that reduce the risk of inappropriate usage make this achievable.
Compliance is Still King
As businesses struggle to operate in the COVID era, complying with privacy regulations remains a priority. Administrators need to design workflows that comply seamlessly with regulations such as CCPA and GDPR for data stored in their cloud environments, and to monitor and audit access patterns. The ability to monitor access patterns in real time and generate audit reports for internal and external auditors becomes critical when users access enterprise data remotely. To keep pace with privacy and industry regulations, administrators should consider a platform that can automatically scan and profile sensitive data with high performance and fidelity, at extreme scale.
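A rough sketch of what such monitoring might look like: the code below tallies access events per user for an audit report and raises an alert whenever a dataset tagged as sensitive is read. The event format and the set of sensitive datasets are assumptions for the example, not a description of any real product.

```python
from collections import Counter

# Hypothetical classification results; in practice these tags would come
# from the platform's own sensitive-data scans.
SENSITIVE_DATASETS = {"claims"}

def audit(events):
    """Summarize (user, dataset) access events.

    Returns per-user-per-dataset counts for audit reporting, plus
    alerts for every read of a sensitive dataset.
    """
    counts = Counter()
    alerts = []
    for user, dataset in events:
        counts[(user, dataset)] += 1
        if dataset in SENSITIVE_DATASETS:
            alerts.append(f"{user} accessed sensitive dataset '{dataset}'")
    return counts, alerts
```

In a real deployment the same logic would run continuously against streaming access logs, feeding both the real-time alerts and the auditor-facing reports described above.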
To fulfill this requirement, a data access control platform needs to be versatile, leveraging a number of techniques, including sophisticated rules, patterns, dictionaries, algorithms, and machine learning models, to scan and classify data. Pre-built reports, the ability to quickly generate custom ones, and alerts when sensitive data is accessed or moved are all platform capabilities that give infrastructure teams instant visibility into their data assets.
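As a small illustration of combining these techniques, the following Python sketch classifies a column by matching sample values against regex patterns and the column name against a dictionary of sensitive terms. The two rules and the dictionary entries are toy assumptions; a production scanner would layer on many more rules plus the ML-based classifiers mentioned above.

```python
import re

# Toy pattern rules keyed by the sensitivity tag they detect.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

# Toy dictionary of column names considered sensitive by convention.
DICTIONARY = {"salary", "diagnosis"}

def classify(column_name, sample_values):
    """Return the set of sensitivity tags found for one column."""
    tags = set()
    if column_name.lower() in DICTIONARY:
        tags.add("dictionary:" + column_name.lower())
    for tag, pattern in PATTERNS.items():
        if any(pattern.search(str(v)) for v in sample_values):
            tags.add("pattern:" + tag)
    return tags
```

Tags produced this way are what downstream reporting and alerting key off: a column tagged `pattern:ssn` can automatically trigger the access alerts and compliance reports described earlier.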
It’s clear that the business environment created by COVID has placed even more constraints on the expertise and already limited resources of cloud infrastructure teams. To manage this increased demand, these platform teams must be more vigilant in managing and securing the data stored in the cloud. It is no longer feasible to follow the de facto path of relying on whatever individual data security tools each cloud vendor provides. At the same time, the amount of data moving to and through the cloud has increased with COVID, along with the corresponding attack surface. In a digital economy driven by COVID-19, the risk of losing customer trust to a security incident is higher than ever before.
When resources are scarce and expertise is in high demand, enterprises must consider the most effective options for eliminating the barriers to securing data in the cloud, and must quickly yet strategically weigh the implications of moving to the cloud to support the demands of a remote workforce. With the accelerated movement of data to the cloud, the need to ensure data security as efficiently as possible has never been as important as it is now.