Loop AI Labs explains how its cognitive computing platform outperforms traditional natural language processing techniques when applied to dark data.
Approximately 80 percent of digital information kept by businesses is “dark,” meaning it is not being analyzed. Such information consists of various financial and corporate documents, emails, and customer support forum messages.
Finding insights from such data would be time-consuming and expensive if left to an army of human workers. Loop AI Labs, however, has developed a “Cognitive Computing Platform” to accomplish the task.
The platform consists of hardware and software that can automatically interpret natural language and obtain a structured representation of relationships in such large stores of text. The Loop Learning Appliance and the Loop Reasoning Appliance use machine-learning algorithms to learn language and process concepts from source data.
According to Loop AI Labs, the system excels at turning unstructured data (such as social media posts, electronic medical records, and web pages) into structured representations, in a way similar to how people learn. Sentiment analysis is also used. Under a partnership with Nvidia, businesses can also rent GPUs to analyze their data on premises rather than sending it to the cloud.
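Loop AI Labs has not published the internals of its platform, but the general shape of "unstructured text in, structured representation plus sentiment out" can be sketched. The toy below is purely illustrative, with a made-up lexicon and made-up ticket text; it is not Loop's implementation:

```python
from collections import defaultdict
from itertools import combinations

# Toy sentiment lexicon -- a hypothetical stand-in, not Loop's model.
SENTIMENT = {"great": 1, "love": 1, "broken": -1, "refund": -1}

def structure(messages):
    """Reduce raw messages to a crude structured representation:
    concept co-occurrence counts plus an aggregate sentiment score."""
    cooccur = defaultdict(int)
    sentiment = 0
    for msg in messages:
        tokens = sorted(set(msg.lower().split()))
        for a, b in combinations(tokens, 2):
            cooccur[(a, b)] += 1            # which concepts appear together
        sentiment += sum(SENTIMENT.get(t, 0) for t in tokens)
    return dict(cooccur), sentiment

tickets = ["my tv remote is broken", "love the new tv wifi feature"]
relations, score = structure(tickets)
print(relations[("tv", "wifi")], score)     # -> 1 0
```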
In the video below, Patrick Ehlen, chief scientist, Loop AI Labs, explains how the system works:
Transcript:
Interview by Adrian Bowles, RTInsights executive analyst and founder of Storm Insights.
Adrian: We’re here today with Patrick, the Chief Scientist at Loop AI. It’s great to meet you. I’ve been following Loop for a little while and I’m trying to get up to speed. One of the things that interests me as I start to research the whole machine learning area is that there are so many companies out there right now struggling to differentiate themselves, but you seem to have taken a pretty cool approach. Can you take us through it a little bit, in terms of what the company is doing and what you’re offering?
Patrick: Sure. Our aim is to do what we call human-capacity cognitive computing. By human capacity, I’m singling out the things that make the human communication system distinct from what you might see in the animal kingdom. Humans have a facility with three different things that you don’t typically see in other communication systems. One is multi-modal communication: we can use our hands and gestures, and I could take a broom handle and have it mean something in a conversation that could substitute for a verbal symbol. We’re very good at that. We also have this facility with symbol recursion, where we’re able to take a symbol and fold it into a function, and take the output of that function and fold it into another function, which gives us a very rich and complex syntactic capability, and also semantic capability. You don’t see this anywhere else in the animal kingdom.
Because of that, we have what’s called discrete infinity, which is just the infinite use of finite means, right? We only have the letters A through Z and a certain number of words in our vocabularies, and yet there’s an infinite number of possible sentences we can generate out of that. The thing with machines is that they generally make a finite use of finite means. What we’re trying to do is break machines out of the confines they currently have with language understanding systems and introduce these multi-modal and recursive capabilities into a cognitive computing platform that will be able to understand language the way that we do.
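A toy recursive grammar makes "infinite use of finite means" concrete: a handful of symbols and rules, but an unbounded set of possible sentences, because rules can fold their output back into other rules. The grammar below is invented for illustration and has nothing to do with Loop's system:

```python
import random

# A finite set of symbols and rules, yet it can generate an
# unbounded number of distinct sentences via recursion.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],  # NP recurses through VP
    "VP": [["V", "NP"], ["V"]],
    "N":  [["dog"], ["cat"], ["broom"]],
    "V":  [["saw"], ["chased"]],
}

def generate(symbol="S"):
    """Expand a symbol by recursively folding sub-symbols into it."""
    if symbol not in GRAMMAR:               # terminal word
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])
    return [word for part in expansion for word in generate(part)]

print(" ".join(generate()))  # e.g. "the dog that chased the cat saw the broom"
```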
We approach that using deep learning and a number of other techniques that we developed over time. Bart Peintner, our CTO, and I worked together on some very big artificial intelligence projects sponsored by DARPA and ONR, big organizational projects that took a lot of effort and a lot of teams; it was the equivalent of a Human Genome Project of artificial intelligence. When we started doing what we’re doing here, we looked at all of the things that we could do and what other people were doing, and as opposed to the Human Genome Project, we really wanted to take a Craig Venter approach: see how we can streamline things down and get the same bang as those large projects, but with a lot less buck. We started doing research in 2012 and really thinking about these things, and over the past couple of years we developed the cognitive computing platform that we’re using now.
We focused on shining light on what we call dark data. There are lots of companies out there that have this dark data. You have documents, financial documents or corporate documents or whatever. You’ve got emails from people in and out of the company. You’ve got customer support forums and emails and tickets and that sort of thing. There’s lots of language and text data out there that companies have, and they know there’s good stuff in there, but either you’ve got to pay somebody to read it, or you have to use some kind of natural language processing approach to try to understand what’s going on in your data, or it just has to remain dark. What we’re trying to do is take that data, provide a structured representation of it, and hand that to your data science team to use for the things they want to do.
The thing that distinguishes us is that we’re really able to understand that data within its own context. A lot of natural language processing techniques start with some model of the world and of a particular language; they make a bunch of assumptions about what things are supposed to mean in the context of that data, and then they try to get at what you’re after starting from all those assumptions. Our system starts from scratch, and it doesn’t start with any assumptions about what your data looks like. It doesn’t even make assumptions about what language you’re using, because there are companies out there that have documents that switch quickly from one language to another. It might go from English to Tagalog and back again in the middle of a sentence, and you don’t want to make assumptions about what language that’s in or what language you expect it to switch into. You want a system that can actually just learn that kind of thing as it goes along.
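Loop hasn’t said how its language-independent learning works, but one common way to avoid language assumptions is to compute statistics over raw character sequences instead of a language-specific vocabulary. A minimal sketch, with made-up mixed English/Tagalog input:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Slice raw text into character n-grams -- no tokenizer,
    no vocabulary, no assumption about which language this is."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Mixed English/Tagalog input is handled identically to monolingual text.
stream = "please reset my password paki-reset ang aking password"
counts = Counter(char_ngrams(stream))
print(counts.most_common(3))  # frequent fragments emerge from the data itself
```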
The real distinguishing factor with us is that we have something that is language independent and doesn’t depend on certain assumptions you would make about the data. The other side of this is that, because we don’t make those assumptions, we do need a lot more data than you might need if you were using natural language processing. The companies we tend to be working with right now are Fortune 1000 companies that already have a lot of this data, and we can bring them a system and say, “Put this in your data center.” We have a GPU appliance that people can rent from us and put in their data center, and then they plug in their own data, hopefully lots of data, and from that begin to learn what’s in it.
When we first started, we thought that we would do a cloud-based approach, perhaps have our own farm of GPU servers that would do all the distributed processing, and people could just feed their data into it. When we got out there into the market, we discovered that lots of these larger companies have a lot of this data and really want to know what’s in it, but they do not want to send it into the cloud. They would rather keep that data in their own data centers. Because of that, we started talking with Nvidia, and we got a partnership going with them where we have the GPU appliance solution that we can provide to people. They license it and can keep it onsite for as long as they need to use it.
Adrian: Your technical approach, in terms of the way you’re handling language processing versus traditional NLP, what’s the unit … because if you’re mixing languages, that’s cool.
Patrick: This harkens back to what we were talking about in the beginning, the human-capacity understanding: you have to have this recursive system, which means symbols are going to be folded in with other symbols. You might have certain atomic elements, but for the most part, what’s going into your representational space might be at different levels of the hierarchy.
The main idea here is for us to be able to learn things as data is coming in, as the system is going along. I like to think of it like this: let’s say you don’t know anything about gardening and you decide you want to learn about it, so you pick up a book that you ordered on Amazon and you start reading. As you’re going through, certain concepts present themselves to you. In the beginning, you might not be able to distinguish those concepts very well. An evergreen is an evergreen, but you don’t know the difference between a fir and a conifer and that kind of thing. As you acquire more information, these distinctions become clearer to you. You’re able to separate out concepts, the space gets larger, and things become separated in that space. That’s, more or less, a high-level view of how it’s done.
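One way to picture concepts "becoming separated in the space" as data accrues: represent each word by the contexts it occurs in, and watch a crude similarity drop as distinguishing contexts arrive. This is a hypothetical toy, not Loop’s actual representation:

```python
from collections import defaultdict

def context_vectors(sentences):
    """Represent each word by counts of the words it co-occurs with."""
    vecs = defaultdict(lambda: defaultdict(int))
    for s in sentences:
        words = s.split()
        for w in words:
            for c in words:
                if c != w:
                    vecs[w][c] += 1
    return vecs

def similarity(u, v):
    """Shared context mass relative to total mass (1.0 = identical)."""
    shared = sum(min(n, v.get(k, 0)) for k, n in u.items())
    return 2 * shared / (sum(u.values()) + sum(v.values()))

early = ["fir is an evergreen tree", "conifer is an evergreen tree"]
later = early + ["fir needles are flat and soft", "conifer cones hang downward"]

for corpus in (early, later):
    v = context_vectors(corpus)
    print(similarity(v["fir"], v["conifer"]))   # 1.0, then 0.5
# Each concept keeps accumulating its own contexts, so the two
# representations drift apart as more data comes in.
```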
One of the distinguishing features of this approach is that as more data comes in, it’s learning and changing its ideas about what the world is and what things mean in that world. For example, there was a time when TVs didn’t have WiFi. If you’re analyzing consumer electronics data and suddenly TVs and WiFi start to be talked about within the same context, you slowly learn that WiFi can now be a feature of a television. That’s the approach we have: learning as the world changes and changing the model as things change. Another important aspect is that it does this in an unsupervised way, so you don’t have people coming in and making labels, going through all the labor of saying, all right, we want to label this as a television given these features, and that sort of thing.
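The TV-and-WiFi example can be mimicked with nothing more than a running, unlabeled co-occurrence count that updates as text streams in. Again, a hypothetical sketch with invented messages, not Loop’s method:

```python
from collections import Counter
from itertools import combinations

pair_counts = Counter()

def observe(message):
    """Unsupervised update: no labels, just count terms seen together."""
    terms = sorted(set(message.lower().split()))
    pair_counts.update(combinations(terms, 2))

# Early data: "tv" and "wifi" never share a context.
observe("this tv has a great picture")
observe("my laptop wifi keeps dropping")
print(pair_counts[("tv", "wifi")])   # 0

# Later data arrives; the association emerges from the stream itself.
observe("new smart tv with builtin wifi streaming")
observe("tv wifi setup was easy")
print(pair_counts[("tv", "wifi")])   # 2
```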
Those are two big selling points for us, and the fact that it’s unsupervised means we’ve streamlined this whole process. We don’t have to have six or 12 months of lead time for annotation and consulting with experts and that sort of thing.