There’s gold to find in the big data forest, but most companies have no map and no crew.
A new research report from TDWI, titled “Data Science and Big Data: Enterprise Paths to Success,” outlines the state of big data and data science: In short, it’s getting bigger and more difficult. On a scale from 1 to 5, with 5 meaning “completely satisfied” with the current data management strategy, only 3 percent of respondents answered 5.
Roughly 43 percent were right in the middle, and nearly 40 percent offered a 1 or 2.
Part of that dissatisfaction might be because of the sheer amount of data being collected. Twenty percent of the survey respondents are trying to work with 10-100 terabytes, and 17 percent have anywhere from 100 terabytes to more than a petabyte. Most of this data is structured data right now, but companies understand the need to quickly figure out plans for integrating that reliable data with the more unpredictable new inputs. And Hadoop is the big data platform of choice, generally—30 percent of all respondents use Hadoop on-premises today, but for those managing more than 10TB of data, that jumps to 50 percent.
Among the types of data being managed, some are growing far more rapidly than others. Text/content data from emails, call center notes, and claims is growing extremely fast, as is external social media text data.
While most of the respondents are using data science to make traditional reporting and analysis queries, a solid 53 percent are also using it for visual analytics. Predictive analytics is rising quickly as well: the rapidly growing collection of text/content data from emails, call centers, and social media will likely create the foundation necessary to better understand how customers will react to a new product or a response from customer service.
Citizen data scientist
The data scientist has existed for quite some time now, but that role has recently become much more complex as companies try to convert their big data assets into real value. In the past, data scientists have been predictive modeling professionals—part computer scientist, part statistician, part mathematician, and part business analyst.
That role is changing for a number of reasons, one of which is the advent of what Fern Halper, a VP and senior research director for advanced analytics at TDWI, is calling the “citizen data scientist.” These people are the “next generation of statistical explorers” who are generally self-taught and want self-service access to the tools and data they need to make decisions. Being business users, they tend to not have formal training in statistics, but are taking advantage of easy-to-use analytics platforms.
The big question: finding big data value
A majority of companies are using data science to generate more accurate business insights, followed by better understanding customers, predicting behavior, and improving business practices/processes. Even with the diversity of desired outcomes, there is no single, predictable path to success using big data and data science. According to the report, companies need to work diligently to solve some of the biggest problems before they can start to see that positive return.
Perhaps the most dire, according to TDWI, is the training gap: simply put, data science skills are difficult to come by, and there’s far more demand than supply right now. Companies that hope to get the edge on their competition will likely need to accept that in-house training and self-learning are where they need to focus their attention, along with sending employees outside the organization to receive training from certified instructors. (For the IoT especially, another challenge we’ve reported on is device and data integration.)
Helping employees learn more about the practices of data science is important, but equally important is educating the entire organization, the C-suite in particular, about what data science is. Without top-down understanding of and interest in the value of the practice, companies will struggle to gather the necessary resources, be those training hours, new infrastructure, or investment in new analytical tools.
To this end, many of the survey respondents reported success in building small proofs of concept. These proofs use real problems the business is facing to showcase the value of data science. If they can show a fast return on investment, all the better.
Best practices
The report ends with 12 best practices for refining data science and big data. First and foremost is getting data in order, not much of a surprise to those who are knee-deep in the practice. Phased approaches to implementing new systems are recommended, as is ensuring that key players have the necessary training before embarking on a new process. TDWI also recommends that businesses use multiple analytics methods, for example predictive analytics and text mining or graph analysis, and take advantage of both the cloud and new open-source technologies.
One trend noted elsewhere is the use of data platforms and big data-as-a-service to do a lot of the heavy lifting when it comes to analyzing big data. The subject will be tackled at the Data Platforms 2017 conference.
Given the challenges inherent in analyzing big data, and other worries, such as workers’ fears that a machine learning algorithm will make their jobs obsolete, 2017 won’t be an easy year for data science. But companies that do it right, through education, collaboration, and agility, will be able to quickly leave proofs of concept behind in favor of genuine ROI.