Businesses are increasingly moving to a hybrid cloud big data architecture. Here’s why:
An increasing number of businesses are moving their data operations into the cloud, attracted by low costs and easy scalability, but that doesn’t mean on-premise data centers are disappearing.
“When people think about the cloud, they’re usually thinking about public or private clouds in terms of deployment models,” said Fern Halper, TDWI director of research for advanced analytics, during a recent webinar. A cloud deployment usually operates on a pay-for-service model, though in the private cloud, the deployment is dedicated to a single organization.
Reasons for Moving to the Cloud
According to a forthcoming TDWI best practices report on cloud analytics, 42 percent of respondents are using the cloud now for analytics and 36 percent are planning to do so. Approximately 11 percent said they would never use the cloud for analytics.
“When we first started asking this question, four years ago, we got about 25 percent saying they would never use the cloud for analytics,” Halper noted.
“The reality is that many organizations are going to have data on their premises as well as in the cloud and they’re going to have to deal with that through some sort of hybrid model,” Halper said. “This is where things are moving and it’s a model whose time has come.”
Businesses primarily said they were considering the cloud for scalability, cost advantages, and flexibility. Organizations that are considering moving to a hybrid cloud, however, should plan wisely for data integration among different systems, infrastructure monitoring, and security, Halper said.
Cloud Analytics Architectures
One example of a hybrid cloud architecture involves a customer using the Internet of Things, in which data from a fleet of trucks might flow from sensors into a gateway, which sends the data to the public cloud, where it is processed and filtered. Some of the data might then go to a data center, where it could be analyzed to build a model for predictive maintenance. That model would then be deployed back to the gateway device.
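To make the flow in the truck-fleet example concrete, here is a minimal Python sketch of its three stages: filtering in the public cloud, model building in the data center, and scoring on the gateway. Everything in it is an illustrative assumption rather than anything described in the webinar: the sensor field engine_temp_c, the sample readings, the function names, and the simple "mean plus margin" threshold that stands in for a real predictive maintenance model.

```python
from statistics import mean

# Hypothetical sensor readings from a truck fleet (illustrative values only).
READINGS = [
    {"truck": "T1", "engine_temp_c": 88.0},
    {"truck": "T1", "engine_temp_c": None},   # a dropped reading
    {"truck": "T2", "engine_temp_c": 92.0},
    {"truck": "T2", "engine_temp_c": 131.0},
    {"truck": "T3", "engine_temp_c": 90.0},
]


def cloud_filter(readings):
    """Public cloud stage: discard readings with no temperature value."""
    return [r for r in readings if r["engine_temp_c"] is not None]


def train_threshold_model(readings, margin_c=20.0):
    """Data center stage: a stand-in 'model' that is just mean temperature
    plus a margin. A real system would fit a proper predictive model."""
    return mean(r["engine_temp_c"] for r in readings) + margin_c


def gateway_needs_maintenance(reading, threshold_c):
    """Gateway stage: apply the deployed threshold to a new reading."""
    return reading["engine_temp_c"] > threshold_c


filtered = cloud_filter(READINGS)
threshold = train_threshold_model(filtered)
hot_truck_flagged = gateway_needs_maintenance(
    {"truck": "T2", "engine_temp_c": 131.0}, threshold
)
normal_truck_flagged = gateway_needs_maintenance(
    {"truck": "T3", "engine_temp_c": 90.0}, threshold
)
print(threshold, hot_truck_flagged, normal_truck_flagged)
```

The point of the split is that the heavy work (filtering and model building) happens where storage and compute are available, while the gateway only needs the small deployed result, here a single number, to score new readings locally.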
Another example of a hybrid cloud architecture could involve integrating data from a cloud-based customer relationship management system with a corporation’s on-premise data or data from other cloud services, whether public or private.
Yet another example is Netflix. The company uses Amazon Web Services to keep track of users, their preferences, and what they click on, according to an article in Network World.
As Netflix explained, Amazon Web Services S3, a cloud-based system, has high durability and availability, and can sustain concurrent loss of data in two facilities. Also, S3 can protect against inadvertent data loss, is elastic, and provides practically “unlimited” size. “We grew our data warehouse organically from a few hundred terabytes to petabytes without having to provision any storage resources in advance,” Netflix developers explained.
Netflix began moving to the cloud in 2008, when a corruption in its database meant it could not ship DVDs to customers, costing the company millions in revenue per day.
“We were just beginning to deploy our streaming service and realized a similar outage would be totally devastating to the customers and to the business. That’s when we decided to move away from vertical data centers,” said Yury Izrailevsky, director, cloud platform engineering at Netflix, in a 2013 presentation.
Izrailevsky also stressed that the cloud supports global expansion and reduces costs. Netflix currently has about 47 million subscribers and operates across several continents, but adding data centers to support new customers in different regions could take months and cost tens of millions of dollars. Use of the cloud has also reduced Netflix’s cost per streaming start by 87 percent.
Netflix, however, also uses a Teradata cloud database to support complex query-based analytics. Netflix slams that database with 70 or more concurrent queries and sophisticated integration and processing at extremely high speeds, according to Teradata.
Netflix uses data about users, their preferences, and movie ratings, as well as data about pauses, rewinds, fast-forwards, and popular movies, to power its recommendation engine.