Managing data gravity through proper planning prevents poor performance.
Top-notch real estate agents know how to ask the right questions before a home search begins. What are you looking for? How many people need their own room? Do you have a budding percussionist in the family who needs a spot to bang the drums without driving everybody else nuts? How much guest space do you need? These questions help ensure the home will meet your needs and allow for growth when, for example, Uncle Lou unexpectedly shows up and asks to stay for a few weeks.
What does this have to do with data gravity? In the words of Alexander Graham Bell, “Before anything else, preparation is the key to success.” And so it goes with data gravity.
Proper planning’s potential
In two previous and related blogs, we described the very real challenge of reducing the pull of data gravity. Keeping data for future use isn't inherently bad, but when it isn't managed properly, data builds up over time and creates latency and rising costs, especially when you try to extract value from that data while processing new data at the same time. In this blog, we will look at ways to plan properly so that data gravity never grows to unmanageable levels, which in turn helps prevent poor performance and unnecessary expense.
Data gravity planning must also ask the right questions. What kinds of data are we going to keep at this new location? What do we expect from that data? Who needs answers, and how fast do they need them? How much data can, and should, be housed at this location? What is the penalty for slower answers, and what is the cost of managing, protecting, copying, and storing that growing volume of data?
Preventing poor performance
Let’s look at an example. A healthcare organization wants to open a new remote clinic. The clinic needs to analyze the images and tests it runs on location and deliver patient results quickly. Even in this highly oversimplified scenario, two categorically different types of data need to be collected and managed:
+ Transactional information, such as name, insurance details, health history, and co-pay
+ Testing information, such as X-rays, MRIs, and lab tests
From a data management perspective, the remote clinic will not keep all of this data on location, and the different types of data carry different levels of value. All of it is governed by HIPAA regulations, but the transactional information can usually be reproduced readily, while the testing information cannot be reproduced as easily. Organizational data management practices differ, but the remote clinic’s data will most likely be protected in some fashion and then securely copied to another location.
In this scenario, decisions about how data will flow need to be made before the remote clinic is up and operational. For example (see the sketch after this list):
- Which types of data (e.g., transactional, testing, raw data, analyzed data) should be saved and in which ways?
- How long (e.g., days, months, years) should the data be stored?
- Where (e.g., onsite, offsite) should the data be stored?
- How (e.g., tape library, disk, cloud) should the data be stored?
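One way to keep these decisions from getting lost after go-live is to write them down as an explicit, reviewable policy. Below is a minimal sketch in Python; the `RetentionRule` and `StorageTier` names, the retention periods, and the storage tiers are illustrative assumptions for the clinic scenario, not prescribed values from HIPAA, Dell, or any product.

```python
from dataclasses import dataclass
from enum import Enum


class StorageTier(Enum):
    ONSITE_DISK = "onsite disk"
    OFFSITE_CLOUD = "offsite cloud"
    OFFSITE_TAPE = "offsite tape library"


@dataclass
class RetentionRule:
    data_type: str          # e.g., "transactional" or "testing"
    keep_onsite_days: int   # how long the clinic holds a local copy
    retain_years: int       # total retention period
    primary_tier: StorageTier
    archive_tier: StorageTier


# Illustrative rules only; real retention periods come from regulation
# and the organization's own data governance policies.
POLICY = [
    RetentionRule("transactional", keep_onsite_days=30, retain_years=7,
                  primary_tier=StorageTier.ONSITE_DISK,
                  archive_tier=StorageTier.OFFSITE_CLOUD),
    RetentionRule("testing", keep_onsite_days=90, retain_years=10,
                  primary_tier=StorageTier.ONSITE_DISK,
                  archive_tier=StorageTier.OFFSITE_TAPE),
]

for rule in POLICY:
    print(f"{rule.data_type}: onsite for {rule.keep_onsite_days} days, "
          f"retained {rule.retain_years} years, "
          f"archived to {rule.archive_tier.value}")
```

Whether it lives in code, a runbook, or a governance document, capturing the policy in one place makes it easy to revisit as data volumes grow.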
Powering forward
Making these decisions in the planning phase helps mitigate data gravity and avoid costly performance drops and financial penalties later. Concepts such as capacity planning and long-term lifecycle management are vital to proper, proactive planning for the effective collection, use, and storage of data, and these strategies must be articulated before data collection begins. Just like in our house metaphor…
When we start to hit the limit of what is practical inside our house, we must make some tough decisions. Can we expand what we have by building out? Do we need to rent a storage locker and move things to another location? Maybe we need to call in a hoarding specialist to help figure out a plan for reducing the clutter. Data planning is similar: regardless of the options chosen, resolving issues after implementation is costly, can take significant time to fix, and can negatively affect the bottom line and the customer experience.
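Capacity planning is one way to see those limits coming before you hit them, and it can start as a simple back-of-the-envelope projection. The sketch below models how quickly the hypothetical clinic’s imaging data might accumulate; every input is an assumption chosen for illustration, not a benchmark.

```python
# Back-of-the-envelope capacity projection for the hypothetical clinic.
# All inputs are illustrative assumptions, not benchmarks or quotas.
STUDIES_PER_DAY = 40          # imaging studies the clinic runs daily
AVG_STUDY_SIZE_GB = 0.3       # average size per study (X-ray, MRI, etc.)
ONSITE_CAPACITY_TB = 20       # storage provisioned at the clinic
ANNUAL_GROWTH_RATE = 0.15     # expected yearly growth in study volume

daily_gb = STUDIES_PER_DAY * AVG_STUDY_SIZE_GB
total_tb = 0.0
for year in range(1, 6):
    total_tb += daily_gb * 365 / 1024          # add this year's data
    status = "OK" if total_tb <= ONSITE_CAPACITY_TB else "over capacity"
    print(f"Year {year}: ~{total_tb:.1f} TB accumulated onsite ({status})")
    daily_gb *= 1 + ANNUAL_GROWTH_RATE         # volume grows each year
```

With these assumed numbers, the clinic’s onsite storage is exhausted within a few years, which is exactly the kind of outcome you want to discover in the planning phase rather than in production.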
Data gravity is a serious issue that is growing in magnitude and reach. As the five P’s remind us, “proper planning prevents poor performance.” Across organizations and industries, IT capacity planning practices are crucial for defining goals and putting practices in place that protect data processing speed and minimize cost, before it’s too late to do so.
To learn more, visit Dell.com/Edge