A cloud migration can benefit from an observability solution that includes multi-cloud monitoring and application observability, as well as high scalability.
As companies migrate applications to the cloud, they often find that legacy management and monitoring tools are of limited use. These tools frequently do not scale well, and they are siloed, which makes it hard to share their insights across different environments and with other tools. What’s needed is a solution with several characteristics, none of which are common in tools that were designed for on-premises applications and services. Increasingly, that means a solution that offers data-driven observability.
When a cloud migration is carried out, it is not just a matter of monitoring or measuring the performance of a single cloud service. Typically, organizations have multi-cloud and hybrid cloud environments to manage.
Additionally, because any cloud application or service has many interdependent parts, it is often hard to pinpoint the root cause of performance degradation. It is harder still to know what needs adjustment to optimize cloud performance.
When outages occur, organizations face similar problems. The complexity of the cloud environment normally masks the source of the problem. Two recent cloud outages illustrate just how varied the source of a problem can be.
In October, Facebook and many of its offerings, including Instagram, Messenger, WhatsApp, and OculusVR, suffered a six-hour outage. The outage was Facebook’s largest since 2019, when the site was down for more than 24 hours.
The cause was a routing protocol configuration issue. After identifying it and restoring services, the company discussed the root cause in a blog post, noting:
“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”
More recently, Amazon suffered an outage that lasted several hours. The outage impacted everything from sites selling Adele concert tickets to Netflix videos to Disney amusement parks. The company released a statement about the source of the problem, noting:
“…an automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network. This resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks. These delays increased latency and errors for services communicating between these networks, resulting in even more connection attempts and retries. This led to persistent congestion and performance issues on the devices connecting the two networks.”
What’s an enterprise to do?
If Facebook and Amazon can be hit with significant outages, how can an enterprise reduce the risk of problems during its own cloud migration?
While there is no way to guarantee that even optimized cloud applications and services will not suffer performance degradation or outages, there are some key monitoring and observability features that can help.
Specifically, what’s needed is an observability solution that delivers data insights to keep a cloud application or service running. Any such solution must include multi-cloud monitoring and application observability, as well as high scalability.
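To make the idea of multi-cloud monitoring a little more concrete, here is a minimal Python sketch of one small piece of it: pulling latency samples from several environments into a single view and flagging services whose 95th-percentile latency has drifted well above a baseline. Everything in it is an assumption for illustration only, including the `LatencySample` record, the environment labels, the `degraded_services` function, and the 1.5x tolerance. It is not the design of any particular observability product.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class LatencySample:
    """One latency measurement (hypothetical record shape)."""
    environment: str   # illustrative label, e.g. "cloud-a", "cloud-b", "on-prem"
    service: str
    latency_ms: float


def p95(values):
    """Return the 95th percentile; fall back to the max for a single sample."""
    if len(values) < 2:
        return max(values)
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return quantiles(values, n=20, method="inclusive")[-1]


def degraded_services(samples, baseline_ms, tolerance=1.5):
    """Group samples by (environment, service) and flag slow groups.

    baseline_ms maps a service name to its expected p95 latency.
    A group is flagged when its observed p95 exceeds baseline * tolerance.
    Services without a baseline are skipped rather than guessed at.
    """
    grouped = defaultdict(list)
    for sample in samples:
        grouped[(sample.environment, sample.service)].append(sample.latency_ms)

    flagged = {}
    for (environment, service), values in grouped.items():
        baseline = baseline_ms.get(service)
        if baseline is None:
            continue
        observed = p95(values)
        if observed > baseline * tolerance:
            flagged[(environment, service)] = observed
    return flagged


if __name__ == "__main__":
    # Synthetic data: the same service running in two environments.
    samples = [LatencySample("cloud-a", "checkout", 100.0) for _ in range(20)]
    samples += [LatencySample("cloud-b", "checkout", 400.0) for _ in range(20)]
    for key, observed in degraded_services(samples, {"checkout": 100.0}).items():
        print(key, observed)
```

Grouping by environment and service in one pass is the point of the sketch: the same service can look healthy in one cloud and degraded in another, which a siloed, per-environment tool would not show side by side.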