A lot of tomorrow’s analytics will be done locally at the “edge,” or in a public or private cloud. Here’s what will drive where your analysis will happen.
Some say everything — all data and applications — will go to the cloud. Others, such as market research firm IDC, say that around 40 percent of data will be stored, managed, analyzed, and kept right where it was produced, at the edge. So where's the truth? Everywhere, actually. Analytics will be done locally at the edge, in the data center, or in a public, private, or hybrid cloud.
The same dynamic played out in computing. The distributed computing model ushered in by PCs didn't eliminate the need for centralized servers. Likewise, the arrival of the cloud probably didn't prompt you to swap your laptop for a dumb terminal.
Where you do your analytics will ultimately depend on the following five factors:
#1: Speed — Do you need the information now, as in right now, or can it wait? The quicker you need answers, the less likely you are to send the work to the cloud. Remember "The Great American Eclipse" earlier this summer? In California, the eclipse removed about 6 gigawatts of solar capacity from the grid (enough to power more than four million homes) and then rapidly returned it within minutes, according to CAISO, the operator responsible for 80 percent of the state's power. CAISO received power data from the grid's generators every four seconds so it could catch fluctuations before they caused problems.
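To make the latency point concrete, here is a minimal sketch of the kind of check an edge node could run on four-second telemetry so that a response never waits on a round trip to the cloud. The window size, threshold, and response hook are illustrative assumptions, not details of CAISO's actual systems.

```python
from collections import deque

WINDOW = 15              # roughly one minute of samples at a four-second cadence
MAX_DEVIATION_MW = 500   # illustrative threshold, not a real CAISO limit

recent = deque(maxlen=WINDOW)

def trigger_local_response(current_mw: float, baseline_mw: float) -> None:
    # Placeholder: in a real grid this might dispatch reserves or shed load.
    # The point is that it happens locally, without a cloud round trip.
    print(f"Deviation: {current_mw:.0f} MW vs. baseline {baseline_mw:.0f} MW")

def on_reading(megawatts: float) -> None:
    """Handle one four-second telemetry sample at the edge."""
    if recent:
        baseline = sum(recent) / len(recent)
        if abs(megawatts - baseline) > MAX_DEVIATION_MW:
            trigger_local_response(megawatts, baseline)
    recent.append(megawatts)

# Example: a sudden drop in generation, like the one the eclipse caused
for reading in [26_000, 25_900, 25_950, 19_800]:
    on_reading(reading)
```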
#2: Reliability and Safety — If Twitter goes down for an hour, it prompts jokes. If water service across the Eastern Seaboard were suddenly disrupted with no explanation, angry calls and panic would ensue.
Oil companies and mining operations, for instance, are adopting cloud technologies for deep analytics, that is, analyzing existing processes for potential cost savings or optimization. It makes sense: you can spin up thousands of servers at once to tackle massive computing problems, and the answers aren't needed urgently.
"Live" operations, however, remain local. Imagine trying to control a driverless 60-ton truck via the cloud, or manage undersea drilling operations remotely. Unanticipated downtime can quickly amount to millions of dollars in repair costs and lost revenue, and companies also have to helicopter in IT support, along with beds for the additional personnel. The risks and costs are simply too astronomical not to keep these operations at the edge.
#3: Bandwidth and Bandwidth Cost — If a torrential amount of data is being generated and you don't need all of it to make a sound analysis, then just send summary data (a sketch of this pattern appears below). A "smart factory" might track 50,000 sensors and generate several petabytes a day; even a standard office building will generate 250 GB or more. Rather than trying to analyze that data in the cloud or control thermostats remotely, many of these jobs will be cheaper and more easily accomplished locally. (A number of smart lighting companies have already moved from cloud-based to distributed control.)
Similarly, a wind farm can consist of hundreds of turbines, each generating vast amounts of data: revolutions per minute, wind direction and speed, the position of the blades relative to the wind. It's impractical to bear the cost of transmitting all of that data to the cloud just to monitor the overall health of the farm.
Wikibon cites a case study of a remote wind farm with security cameras and other sensors. With 200 miles between the wind farm and the cloud, and an assumed 95 percent reduction in traffic from edge computing, the total cost of management and processing over three years dropped from $81,000 to $29,000. The analyst firm believes the Internet of Things will develop along an edge-plus-cloud design for the "vast majority" of sensor implementations. In this use case and others like it, sensing locally and transmitting only summary data to the cloud is the more cost-effective approach.
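As a rough sketch of that "sense locally, transmit summaries" pattern, the snippet below buffers raw turbine readings at an edge gateway and ships only a small per-interval summary to the cloud. The field names, interval, and upload callback are assumptions made for illustration; they are not drawn from the Wikibon study.

```python
import statistics
import time

def summarize(samples: list[dict]) -> dict:
    """Collapse a buffer of raw turbine samples into one summary record."""
    rpm = [s["rpm"] for s in samples]
    wind = [s["wind_speed"] for s in samples]
    return {
        "samples": len(samples),
        "rpm_mean": round(statistics.mean(rpm), 1),
        "rpm_max": max(rpm),
        "wind_speed_mean": round(statistics.mean(wind), 1),
        "reported_at": time.time(),
    }

def edge_loop(sensor_stream, upload_to_cloud, interval_s: float = 60.0) -> None:
    """Buffer raw samples locally; transmit only a summary each interval."""
    buffer, deadline = [], time.time() + interval_s
    for sample in sensor_stream:                # raw data never leaves the site
        buffer.append(sample)
        if time.time() >= deadline and buffer:
            upload_to_cloud(summarize(buffer))  # one small record per interval
            buffer, deadline = [], time.time() + interval_s

# Example with a canned stream and a print-based "uploader"
readings = [{"rpm": 14 + i % 3, "wind_speed": 8.5} for i in range(10)]
edge_loop(iter(readings), upload_to_cloud=print, interval_s=0.0)
```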
#4: Location of Your Challenge — Ask yourself who needs the data: the engineers at the plant, or a whole slew of different parties, organizational departments, and geographically dispersed stakeholders? Is the problem at hand something awry with one particular production line, or with the design of the production line overall? If it's a local problem, store and analyze the data locally.
The automotive research group at RWTH Aachen in Germany is developing techniques to "correct" problems in EV batteries during production. Researcher Christoph Lienemann believes in-situ monitoring could cut production costs by 20 percent and reduce the share of scrapped cells from 10 percent to 2 percent.
Local successes, of course, can then be replicated and shared across the enterprise.
#5: Complexity of Your Challenge — This is the ultimate factor. Are you examining a few data streams to solve an immediate problem, such as optimizing a conveyor belt in a factory, or are you comparing thousands of production lines across multiple facilities? Are you looking at a patient's vital signs to determine a course of treatment, or are you developing a new therapeutic that requires studying millions of different proteins? Humans are often underrated as a business intelligence platform: our ability to draw conclusions far exceeds what most machines can do. We just can't scale well past a certain point.
Depending on the nature of the analytics in question, many of the factors I've discussed will commingle and overlap, and some may take higher priority than others. In the end, as with many things data-related, the answer to where all the world's data and applications will live isn't so clear-cut. Rather, it will be determined by speed, reliability and safety, bandwidth and its cost, and the location and complexity of the problem. The truth is out there; in fact, it's everywhere, and not always where you might expect to find it.