Researchers have developed a rigorous neural net algorithm designed to extract essential features of big data sets in one pass.
In one of the latest advances in data processing, researchers have created a new algorithm that can be applied directly to even the largest datasets, extracting salient features in one pass instead of many. This could change how we approach big data analytics.
The research builds on the way humans perceive patterns in data and extends those abilities. Humans are good at synthesizing information, and we have built machines that take that pattern-finding far further. The new algorithm processes extraordinary amounts of data, far more than human pattern recognition could ever handle.
A big step in processing
Processing large datasets means extracting their most salient statistical properties to find insights within the mass of data. Most feature-extraction algorithms can’t be applied directly to a dataset of that size, but a subfield of machine learning is devoted to finding algorithms that can.
Reza Oftadeh, a doctoral student in the Department of Computer Science and Engineering at Texas A&M University, has developed a rigorous algorithm designed to extract essential features of data sets in one pass. Along with his advisor, Dr. Dylan Shell, Oftadeh believes this could significantly reduce the processing load for massive datasets and allow more types of organizations to glean valuable insights.
The algorithm is a useful machine-learning tool because it extracts features and orders them directly from most salient to least. “There are many ad hoc ways to extract these features using machine-learning algorithms, but we now have a fully rigorous theoretical proof that our model can find and extract these prominent features from the data simultaneously, doing so in one pass of the algorithm,” said Oftadeh in a university announcement.
The impact of reducing big data complexity
As humans generate more data, the processing power required to extract value from it keeps some organizations from getting the features they need. In some fields, medicine being a prime example, extremely large datasets are necessary, but team members with the expertise to use the data to its full potential aren’t always available.
Analyzing massive datasets is a complicated, time-consuming process for human programmers, so artificial neural networks (ANNs) are often used. As one of the main tools of machine learning, ANNs are designed to simulate how the human brain analyzes and processes information. They are most commonly used to identify the features that best represent the data and to classify the data into different categories based on those features.
Right now, models must be run repeatedly to extract the most important features from massive datasets. If you have hundreds of thousands of dimensions, algorithms may not be able to run enough times to extract what’s needed. Even worse, research teams may not have the skills needed to capture the insights necessary for advancement.
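The article doesn’t name which multi-pass method those models use, but a classical deflation scheme illustrates the pattern: each salient direction requires its own separate extraction sweep over the data, so the cost grows with every additional feature. The function name and toy data below are purely illustrative.

```python
import numpy as np

def top_directions_by_deflation(X, k, iters=100):
    """Extract the k most salient directions, one full extraction run per feature."""
    X = X - X.mean(axis=0)
    C = X.T @ X / len(X)                 # covariance of the (centered) data
    directions = []
    for _ in range(k):                   # one separate run per feature
        v = np.random.default_rng(0).normal(size=C.shape[0])
        for _ in range(iters):           # power iteration for the current top direction
            v = C @ v
            v /= np.linalg.norm(v)
        directions.append(v)
        C = C - np.outer(v, v) * (v @ C @ v)   # deflate: remove what was just found
    return np.stack(directions)

# Small usage example on synthetic data.
X = np.random.default_rng(1).normal(size=(500, 10)) * np.linspace(2.0, 0.5, 10)
print(top_directions_by_deflation(X, k=3).shape)   # (3, 10)
```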
To make a smarter algorithm, the researchers propose adding a new cost function to the network, one that pinpoints the features and orders them directly by their relative importance. Once incorporated, their method yields more efficient processing that can be fed bigger datasets to perform classic data analysis.
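The announcement doesn’t spell out the cost function itself, so the following is only a minimal sketch of one way such an ordering term can work, assuming a linear autoencoder trained with a “nested” reconstruction loss in which every prefix of the latent code must reconstruct the data on its own. All names and settings here are illustrative, not the researchers’ implementation.

```python
import numpy as np

# Hedged sketch: a linear autoencoder whose loss sums the reconstruction
# error over nested prefixes of the latent code (first 1 feature, first 2,
# ..., all k). Because earlier dimensions must reconstruct the data on
# their own, the most informative direction is pushed into latent dim 0,
# the next into dim 1, and so on -- the features come out already ordered.
# This illustrates the idea, not the authors' exact cost function.

rng = np.random.default_rng(0)

# Toy data: 1,000 samples in 20 dimensions with a few dominant directions.
n, d, k = 1000, 20, 5
X = rng.normal(size=(n, d)) @ np.diag(np.linspace(3.0, 0.1, d))
X -= X.mean(axis=0)

W = rng.normal(scale=0.1, size=(d, k))   # encoder weights
V = rng.normal(scale=0.1, size=(k, d))   # decoder weights
lr = 1e-2

for step in range(2000):
    Z = X @ W                            # latent codes, shape (n, k)
    gW, gV = np.zeros_like(W), np.zeros_like(V)
    for j in range(1, k + 1):            # nested prefixes of the latent code
        R = Z[:, :j] @ V[:j, :] - X      # reconstruction error using only the first j features
        gV[:j, :] += Z[:, :j].T @ R / n
        gW[:, :j] += X.T @ (R @ V[:j, :].T) / n
    W -= lr * gW
    V -= lr * gV

# With this balanced setup, latent dim 0 typically captures the most
# variance, dim 1 the next, and so on -- the ordering falls out of the
# cost function rather than from a separate post-hoc sorting step.
print(np.argsort(-(X @ W).var(axis=0)))  # roughly [0 1 2 3 4]
```

The key design choice is that the cost itself encodes the ordering, so nothing has to be re-run once per feature, which is what makes a single pass plausible.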
By reducing large-data complexity, Oftadeh’s algorithm could open the door to bigger, better training projects and a much larger return. Right now, the algorithm can be applied to one-dimensional data samples, but the team’s end goal is to expand it into a unified framework that yields other algorithms capable of extracting salient features with even fewer specifications.
The work is a collaborative effort. Others instrumental in the algorithm’s creation include Jiayi Shen, a doctoral student in the Department of Computer Science and Engineering; Dr. Zhangyang “Atlas” Wang, assistant professor in the electrical and computer engineering department at UT Austin; and Dr. Boris Hanin, an assistant professor of mathematics at Princeton University.