Getting a handle on Big Data with stream processing often means pulling data from disparate sources, then picking and choosing what you want to see. One lesson learned: know what you need.
Companies are expected to analyze an increasing amount of data from real-time event streams, Internet of Things devices, and machine data, according to a forthcoming TDWI “Best Practices” report for Big Data.
The TDWI report, to be published in the fall, covers emerging technologies and methods, including new sources of Big Data, vendor tools, team structures, and development methods. TDWI asked 332 respondents about their current sources of data from emerging technologies and methods, and what they expect to be using three years from now. Over that stretch, the percentage of respondents using real-time event streams grows from 23 percent to 36 percent; for IoT devices, from 18 percent to 39 percent; and for machine data, from 27 percent to 33 percent.
Respondents also expect to increasingly use stream processing and stream mining (see chart), which overlap to some extent with complex event processing. The big idea with real-time streams is to make sense of data and react to it quickly, for example to spot network attacks, monitor systems, or catch fraud.
“More recently, advanced analytics is being done actually in the stream,” said Fern Halper, TDWI’s research director of advanced analytics, during an Aug. 12 webinar. That is as opposed to storing the data outside the stream and then running the analytics. “For example, a Fourier transform might be done in the stream. I believe that vendors are working on developing advanced algorithms that could work in a window or developing ways for more advanced analytics to work over the stream.”
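To make that concrete, here is a minimal Python sketch of in-stream, windowed analytics: a Fourier transform computed over a sliding window of samples as they arrive, rather than after the data has been stored. The stream source, the window size, and the NumPy-based approach are illustrative assumptions, not details from the webinar.

```python
from collections import deque

import numpy as np

WINDOW_SIZE = 256  # samples per analysis window (illustrative)

def spectra_over_stream(samples):
    """Run a Fourier transform over a sliding window of a live stream."""
    window = deque(maxlen=WINDOW_SIZE)
    for sample in samples:
        window.append(sample)
        if len(window) == WINDOW_SIZE:
            # Analyze only the most recent WINDOW_SIZE samples,
            # without ever persisting the full stream.
            yield np.abs(np.fft.rfft(np.asarray(window)))

# Usage: any iterable of numeric samples stands in for the live stream.
stream = np.sin(2 * np.pi * 0.05 * np.arange(1024))
for spectrum in spectra_over_stream(stream):
    dominant_bin = int(np.argmax(spectrum))  # react to the spectrum here
```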
Use Cases for Stream Processing
One use case for real-time stream processing is health care, where patients in intensive care may be monitored for blood pressure, heart rate, and other vital signs. The streams could be analyzed together, with an alert sent out if one reading, or a combination of readings, falls below a certain level. Such a system could also be blended with historical data about at-risk patients to form a data model.
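A minimal Python sketch of that kind of combined-stream alerting follows; the vital-sign names, thresholds, and combination rule are hypothetical placeholders, not anything from the report.

```python
# Hypothetical thresholds; a real system would tune these per patient,
# ideally from a model built on historical data about at-risk patients.
THRESHOLDS = {"systolic_bp": 90, "heart_rate": 50, "spo2": 92}

def alert(message):
    print("ALERT:", message)  # stand-in for a paging/notification system

def check_vitals(reading):
    """Flag a reading when one vital sign, or a risky combination, is low."""
    low = sorted(k for k, limit in THRESHOLDS.items() if reading[k] < limit)
    if low:
        alert(f"Below threshold: {low} in {reading}")
    # Combination rule: each value is borderline, but together they are risky.
    elif reading["systolic_bp"] < 100 and reading["spo2"] < 94:
        alert(f"Combined BP/SpO2 pattern: {reading}")

check_vitals({"systolic_bp": 95, "heart_rate": 72, "spo2": 93})
```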
Another use case: the energy industry. An oil rig, for example, measures pressure, temperature, and other factors. With historical data from the rig, a model might be built to predict when the rig might fail, Halper said. As data flows through the system, it would be scored against the model, and if there’s a high probability of failure, the machinery would be removed from service.
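Sketched in Python, that scoring loop might look like the following. The feature names, the 0.8 cutoff, and the scikit-learn-style predict_proba interface are assumptions for illustration, not the system Halper described.

```python
FAILURE_CUTOFF = 0.8  # illustrative probability threshold

def score_stream(readings, model):
    """Score each incoming rig reading against a pre-trained failure model."""
    for reading in readings:
        features = [[reading["pressure"], reading["temperature"]]]
        # predict_proba follows the scikit-learn convention:
        # column 1 holds the probability of the "failure" class.
        prob_failure = model.predict_proba(features)[0][1]
        if prob_failure >= FAILURE_CUTOFF:
            # High probability of failure: pull the equipment from service.
            yield ("remove_from_service", reading, prob_failure)
```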
Breaking Down Big Data Barriers
But what happens when data comes from disparate sources and in different formats, and a company needs a tool to make sense of it all? That’s exactly the issue that faced Tangoe, a global company that provides software to manage expenses for mobile-device, Internet, and network connections.
Tangoe, which calls its services “lifecycle connection management,” processes 1.2 million invoices per month. Its customers include financial giants such as Bank of America and Visa; tech companies such as Oracle; and retailers such as Target. It deals with 3,100 different telecom carriers, 1,800 different bill formats, and 165 call-center agents across the globe. It went looking for a system to analyze Big Data from sources such as TCP/IP data streams, call detail records, invoice data from telecom carriers, and mobile devices.
It also wanted the system to handle split billing for personal versus business use, cost allocation to different departments, productivity analysis, and security and threat detection.
“It’s a lot of data that flows through the system … and we’re trying to get some analysis to help our customers understand how to allocate it and how to manage the cost for it,” said Jaan Leemet, senior vice president of advanced technology for Tangoe.
One challenge in selecting a system was that there are limits on the data that can be captured from mobile devices such as iPhones and Android phones, which expose only certain functions through application programming interfaces (APIs). “In fact, some smaller IoT devices are designed to be low power, low cost, small footprint and may provide almost no facilities for integration,” Leemet said. “Rather than trying to access the data through these APIs, we’ve been looking at the data itself as it flows by in the stream.”
Another challenge was the sheer volume of data produced by millions of devices, and the need to act quickly on transactions that signaled security threats. “When we start looking at patterns in the stream, we have an opportunity to detect anomalous activity such as data leakage and even odd application activity,” Leemet said.
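Here is a minimal Python sketch of one such in-stream pattern check, flagging possible data leakage as an unusually large spike in outbound traffic against a rolling baseline. The z-score rule and the cutoff of 3 are illustrative assumptions, not Tangoe’s method.

```python
from collections import deque
import statistics

def leakage_alerts(outbound_bytes, history_len=100, z_cutoff=3.0):
    """Yield transfers far above the rolling baseline of outbound volume."""
    history = deque(maxlen=history_len)
    for count in outbound_bytes:
        if len(history) >= 30:  # wait until a minimal baseline exists
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history) or 1.0  # avoid divide-by-zero
            if (count - mean) / stdev > z_cutoff:
                yield count  # anomalously large transfer: possible leakage
        history.append(count)

# Usage: steady small transfers, then one suspicious spike.
traffic = [120, 130, 110, 125] * 15 + [90_000]
print(list(leakage_alerts(traffic)))  # -> [90000]
```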
Stream Processing Benefits
Tangoe ended up choosing SAP’s HANA platform as its primary stream processing engine. Tangoe liked the flexibility of the system, which let the company consume data from sources such as call-detail records, data streams, and invoice data. And while other companies may need to archive the complete data (for example, in a Hadoop data lake), Tangoe consumed only what it needed and jettisoned the rest.
In general, “what’s really nice with the stream is, as it flies by, you can pick and choose what you want,” Leemet said. A lot of the overhead in storing and treating data is “minimized.”
There is such a thing, after all, as too much data. If you’re monitoring patients in an ICU, a five-minute data window is “probably too long,” Halper said. In financial markets, analysis often has to be done in 15- to 30-second windows. But capturing temperature every second in a smart home? “That would be overkill,” Halper said.
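In most stream frameworks, the window length is simply a parameter of the job. As a rough Python illustration, here is a tumbling-window grouper whose duration can be tuned to the use case; the helper and the durations in the usage line are illustrative, not taken from any particular product.

```python
def tumbling_windows(events, length_s):
    """Group (timestamp, value) events into fixed-length time windows."""
    batch, window_end = [], None
    for ts, value in events:
        if window_end is None:
            window_end = ts + length_s
        elif ts >= window_end:
            yield batch  # emit the finished window, then start the next one
            batch, window_end = [], ts + length_s
        batch.append((ts, value))
    if batch:
        yield batch  # emit the final, possibly partial, window

# Usage: per-second readings grouped into 5-second windows; markets might
# use 15 to 30 seconds, and a smart home far longer.
for batch in tumbling_windows(((t, 98.6) for t in range(12)), length_s=5):
    print(len(batch), "readings in window")
```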