The big data versus fast data debate is not cut and dried. Successful organizations need a combination of both approaches.
Our world currently runs on big data. The amount of data created reached 64.2 zettabytes in 2020, almost 30 zettabytes more than earlier projections, and is expected to grow to more than 180 zettabytes by 2025. It shows no signs of stopping.
But with 68% of all collected data going unleveraged, there is a huge opportunity in turning raw data into something that creates immediate business value. The current methodology of storing everything and hoping for the best is not scalable, as many organizations now drowning in data with no real way to act on it are realizing. Succeeding in the modern business landscape means breaking the mold of database thinking.
In this article, we explore how switching from a big data approach to a fast data approach will enable enterprises to transition into the next era of computing, and how the edge is a key component of becoming data-efficient.
See also: A Return to Small Data for 2021
Big data vs. fast data
Big data
Big data analytics has been around for a long time; Roger Mougalas coined the phrase in 2005, around the same time Hadoop was created. Obviously, we have come a long way since then in terms of technological prowess and the amount of data being created, but the process is inherently the same: data is ingested and stored for future analysis.
This approach has worked, and will continue to work, very well in situations where the data has something interesting to say and can be analyzed by AI/ML algorithms to find patterns and trends (such as historical financial market information). But when a sensor spews out the same information 99% of the time, there is no need to store all of that data. Analysis must still be done to ensure nothing is missed, however, and this is where fast data can take the reins.
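To make that concrete, consider a temperature sensor that reports the same value almost every time. A minimal sketch in Python (a hypothetical deadband filter, not any particular product's API) shows how every reading can still be inspected while only meaningful changes are kept:

```python
def deadband_filter(readings, tolerance=0.5):
    """Yield only readings that differ from the last reported
    value by more than `tolerance`; everything else is dropped.

    Every reading is still inspected, so nothing is missed,
    but repetitive values are never stored.
    """
    last_reported = None
    for value in readings:
        if last_reported is None or abs(value - last_reported) > tolerance:
            last_reported = value
            yield value  # an interesting change worth keeping

# A sensor that reports roughly 21.0 C almost every time:
sensor = [21.0, 21.0, 21.1, 21.0, 25.3, 21.0, 21.0]
print(list(deadband_filter(sensor)))  # [21.0, 25.3, 21.0]
```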
Fast data
Fast data contrasts with big data in that, instead of collecting as much data as possible, it focuses on turning raw streaming data into immediately actionable events. When working with IoT sensors and devices that constantly create new data, this method of analysis is vastly superior to storing everything in a database: only the important anomalous data points need to be acted on, while everything else can be discarded.
By utilizing edge computing to pre-filter and process streaming data at the source, organizations can scale much faster because they avoid moving everything to the cloud; only data that is useful for future analysis is kept. A fast data approach built on edge computing also provides the benefits of distributed processing (by moving compute power closer to, or onto, the edge devices themselves), allowing for massive reductions in latency and bandwidth consumption, as the sketch below illustrates.
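Here is a self-contained simulation of that pattern. The code is hypothetical: send_to_cloud stands in for whatever uplink a real edge device would use (an MQTT publish or HTTP POST, for example), and the 99%-steady sensor mirrors the scenario described above:

```python
import random

def send_to_cloud(reading):
    """Hypothetical uplink; in practice this would be an MQTT
    publish or HTTP POST to a cloud ingestion endpoint."""
    pass

def run_edge_node(readings, tolerance=0.5):
    """Pre-filter a raw sensor stream on the edge device and
    forward only the readings that carry new information."""
    last_reported = None
    forwarded = total = 0
    for value in readings:
        total += 1
        if last_reported is None or abs(value - last_reported) > tolerance:
            last_reported = value
            forwarded += 1
            send_to_cloud(value)
    print(f"forwarded {forwarded}/{total} readings "
          f"({100 * (1 - forwarded / total):.1f}% bandwidth saved)")

# Simulate a mostly steady sensor with occasional spikes.
stream = [21.0 if random.random() < 0.99 else 30.0 for _ in range(10_000)]
run_edge_node(stream)
```

Because each spike and its return to normal are the only readings forwarded, the uplink carries a few percent of the raw stream while the cloud still sees every event that matters.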
The best of both worlds
Of course, as with most things in life, the big data versus fast data debate is not completely cut and dried. For organizations to succeed in the modern age, a combination of both approaches is necessary. Big data is extremely helpful for finding hidden trends after the fact, while fast data is better suited to responding to events as they happen. In fact, the patterns and trained AI models that big data analytics unearths can be executed by fast data pipelines, putting them to use in an operationally relevant way.
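As a toy sketch of that handoff, assume a baseline "model" learned offline from historical records (here just a three-sigma threshold computed from a stand-in batch, in place of a real trained AI model) being executed against the live stream:

```python
import statistics

# "Big data" side: learn what normal looks like from historical
# records (stand-in values here; in practice this is a batch job
# over a data lake or warehouse).
history = [20.8, 21.1, 21.0, 20.9, 21.2, 21.0, 20.9, 21.1]
mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_anomalous(value, sigmas=3.0):
    """'Fast data' side: score each live event against the
    offline-trained baseline as it arrives."""
    return abs(value - mean) > sigmas * stdev

live_stream = [21.0, 21.1, 27.4, 20.9]
for value in live_stream:
    if is_anomalous(value):
        print(f"actionable event: {value}")  # respond immediately
```

The batch analysis runs as often as the business needs; the streaming side simply executes whatever baseline the batch side last produced.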
By blending these two approaches, organizations are better equipped to adapt their applications and processes to constantly changing market conditions, both as situations unfold and after the fact.
Take advantage of distributed computing
By migrating business operations from running solely in a data center or the cloud to analyzing, filtering, and acting on data at the edge, far more value and insight can be extracted from raw streaming data. Unfortunately, this is easier said than done, and many organizations struggle to run their operations at the edge.
But there is an easier way, as tech industry veteran and Vantiq CTO Paul Butterworth lays out in the whitepaper Distribution and Federation in Real-Time, Event-Driven Business Applications. Join Paul as he discusses the growing need for edge computing, what a distributed model is, and how to manage running your operations at the edge.