When a log management platform is engineered with best practices, enterprises can finally afford to log everything.
Today, IT and security teams want it all: they want to log everything so they can optimize observability and enhance system availability and security. Observability helps teams decrease mean time to recovery (MTTR), increase mean time between failures (MTBF), and improve an organization’s security posture.
Unfortunately, the cost of traditional log management platforms makes comprehensive logging impractical in today’s IT environments. As a result, enterprises are forced to make unhappy tradeoffs when it comes to observability and budgets.
But it doesn’t have to be that way.
If log management providers build their platforms to address these limitations, they can deliver technology designed to reduce the cost and complexity of large-scale log management. Those cost differences are enough to make “logging everything” practical from a business perspective. In fact, not only can purposeful engineering reduce costs, but it can also improve performance.
Here are some best practices for log management platform design that can lower cost and improve performance.
Best Practice #1: Engineering for Cost-Efficient Infrastructure
Infrastructure for ingesting, storing, and processing logs is one of the biggest cost drivers for log management tools. Therefore, engineering and fine-tuning the infrastructure for log management can yield significant cost savings. Let’s consider the details.
Compression
This one is simple math. Better compression ratios mean less data saved to disk, and that means faster searches and lower storage costs. Today, advanced compression algorithms, and the way some modern log management solutions use them, typically yield compression ratios of around 15:1, with some results in excess of 30:1.
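As a rough illustration of the math, the sketch below measures a compression ratio on synthetic log lines using Python's standard `zlib` module. The sample data and the `compression_ratio` helper are assumptions made for this example; real ratios depend heavily on the log format and on the algorithm a given platform uses, so the number printed here should not be read as representative.

```python
import zlib


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Return the original size divided by the compressed size."""
    compressed = zlib.compress(raw, level)
    return len(raw) / len(compressed)


# Synthetic, highly repetitive log lines (an assumption for illustration only).
sample_logs = "\n".join(
    f"2024-01-01T00:00:{i % 60:02d}Z INFO web-1 GET /api/items/{i % 50} 200 12ms"
    for i in range(10_000)
).encode("utf-8")

ratio = compression_ratio(sample_logs)
print(f"compression ratio: {ratio:.1f}:1")
```

Repetitive, structured logs tend to compress well, which is why the ratio translates so directly into storage savings.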
Bucket storage
Bucket storage, like AWS S3 or MinIO, makes storing logs cheaper, more scalable, and faster than an alternative approach like file storage. In addition to lower storage costs, bucket storage can enable modern log management platforms to have essentially unlimited retention.
Near-limitless ingest
A traditional log management platform’s cost structure places financial constraints on what gets logged. If you need to log more, you are charged more. These ingest fees penalize teams for ingesting more data, and that doesn’t seem to make sense for log management platforms. Teams that want to “log everything” at scale shouldn’t be discouraged from doing so.
Best Practice #2: Engineering for Performant Features
Of course, log management platforms can’t completely sacrifice performance to achieve cost efficiencies. An affordable platform that doesn’t let you parse logs effectively isn’t a good investment; it’s just cheap.
Fortunately, what makes a modern log management platform more affordable also makes it more performant. Case in point: compression reduces your storage costs and increases search speeds.
So, what features and performance benchmarks matter when it comes to log management and observability?
Reduced ingest lag
How long it takes to get data to the log management platform is important, but so is the additional time between that data arriving and it becoming searchable by analysts. Insights generated from log data are one of the best ways to understand a system’s health. The closer to real-time those insights can be presented, the better-positioned teams are to understand the actual state of their systems and networks.
Query speed
In general, free-text search queries should return in the sub-second range, even in petabyte-scale deployments. This has further interesting repercussions when scaling for concurrency: if your queries are 100x faster, the number of queries running at the same time will often drop by the same factor, because each one finishes sooner.
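One way to reason about the link between query speed and concurrency is Little's law, which says the average number of requests in flight equals the arrival rate multiplied by the time each request takes. The sketch below applies it to queries; the function name and the example numbers (20 queries per second, 5 seconds versus 0.05 seconds per query) are assumptions chosen purely for illustration.

```python
def average_concurrency(queries_per_second: float, seconds_per_query: float) -> float:
    """Little's law: average in-flight queries = arrival rate x time per query."""
    return queries_per_second * seconds_per_query


# Same query arrival rate, with queries that are 100x faster in the second case.
slow = average_concurrency(queries_per_second=20, seconds_per_query=5.0)
fast = average_concurrency(queries_per_second=20, seconds_per_query=0.05)

print(f"slow queries in flight: {slow:.0f}")
print(f"fast queries in flight: {fast:.0f}")
```

With the arrival rate held constant, the average number of queries in flight shrinks in proportion to the speedup, which is the effect described above.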
Reliability and fault tolerance
Enterprise-grade log management platforms need enterprise-grade uptime. Because SLAs matter, log management systems should be fault tolerant and highly available for production use cases.
Security
Log data often contains sensitive information. Therefore, log management platforms must prioritize security. Platforms should offer encryption of data in transit and at rest using modern standards (AES-256 and TLS 1.2 or later), support for enterprise-grade authentication protocols and multi-factor authentication (MFA), and granular role-based access control (RBAC).
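For the in-transit half of that requirement, here is a minimal sketch of how a log shipper written in Python might refuse anything older than TLS 1.2, using the standard `ssl` module. The `make_client_context` helper is a hypothetical name for this example, and encryption at rest, authentication, and RBAC are outside its scope.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects versions below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
print(ctx.minimum_version.name, ctx.verify_mode.name, ctx.check_hostname)
```

A context like this can be passed to the standard library's socket or HTTP clients when sending logs to an ingest endpoint.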
Indexing (or not)
How a platform handles indexing can also significantly influence free-text search performance. Index-free design has thus far proven to be the fastest and most scalable approach to log management.
To enable effective index-free searches, platforms need a mechanism to efficiently ingest and filter logs. Techniques like tagging, constraining searches to a specified timeframe, and using Bloom filters all help index-free design outperform searches that depend on extensive indexing.
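To make those three techniques concrete, the following is a simplified sketch of how they might combine to skip most stored data before any text is scanned. Every name here (`BloomFilter`, `Segment`, `search`), the segment layout, and the filter sizes are assumptions for illustration; this is not a description of how any particular platform implements index-free search.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, token: str):
        # Derive several bit positions by hashing the token with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token: str) -> None:
        for pos in self._positions(token):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, token: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(token)
        )


@dataclass
class Segment:
    """A block of log lines covering one time window and carrying one tag."""

    start: int
    end: int
    tag: str
    lines: list = field(default_factory=list)
    bloom: BloomFilter = field(default_factory=BloomFilter)

    def append(self, line: str) -> None:
        self.lines.append(line)
        for token in line.split():
            self.bloom.add(token)


def search(segments, token: str, start: int, end: int, tag: str) -> list:
    """Scan only segments that pass the time, tag, and Bloom filter checks."""
    hits = []
    for seg in segments:
        if seg.end < start or seg.start > end:
            continue  # outside the requested timeframe
        if seg.tag != tag:
            continue  # wrong tag
        if not seg.bloom.might_contain(token):
            continue  # token is definitely absent from this segment
        hits.extend(line for line in seg.lines if token in line.split())
    return hits


web = Segment(start=1000, end=1059, tag="web")
web.append("GET /login 200")
web.append("GET /checkout 500 timeout")
db = Segment(start=1000, end=1059, tag="db")
db.append("slow query timeout")
old = Segment(start=940, end=999, tag="web")
old.append("GET /checkout 500 timeout")

segments = [web, db, old]
results = search(segments, "timeout", start=1000, end=1059, tag="web")
print(results)
```

Each check is cheap compared with scanning raw text, so the brute-force scan at the end only touches segments that could plausibly contain a match. Because a Bloom filter can return false positives, the final scan still verifies every line.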
Best Practice #3: Providing Log Management at Scale
Often, the cost of log management accelerates when your company starts to scale. Most providers can give you a reasonably affordable — or even free — solution when log ingestion is limited to megabytes or gigabytes per day. However, as teams move into the range of hundreds of terabytes per day, log management can get expensive fast. What about when your scale hits one petabyte (PB) per day?
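To put one petabyte per day in perspective, here is a back-of-the-envelope calculation. It assumes decimal units (1 PB = 1,000 TB), a perfectly even ingest rate over the day, and the 15:1 compression ratio mentioned earlier; the `daily_footprint` function is a hypothetical helper, and real deployments will see bursts and different ratios.

```python
def daily_footprint(ingest_tb_per_day: float, ratio: float) -> dict:
    """Estimate sustained ingest rate and compressed storage added per day."""
    seconds_per_day = 24 * 60 * 60
    ingest_gb_per_second = ingest_tb_per_day * 1000 / seconds_per_day
    stored_tb_per_day = ingest_tb_per_day / ratio
    return {
        "ingest_gb_per_second": ingest_gb_per_second,
        "stored_tb_per_day": stored_tb_per_day,
    }


# 1 PB/day expressed as 1,000 TB/day, compressed at an assumed 15:1.
footprint = daily_footprint(ingest_tb_per_day=1000, ratio=15)
print(f"sustained ingest: {footprint['ingest_gb_per_second']:.1f} GB/s")
print(f"stored per day:   {footprint['stored_tb_per_day']:.1f} TB")
```

Under these assumptions, the platform has to absorb more than 11 GB every second around the clock, while compression brings the daily storage growth down to roughly 67 TB.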
Cost aside, log management platforms that aren’t purposefully engineered also become impractical to use if you truly want to “log everything.” Purposeful engineering keeps scale in mind while remaining cost-efficient and performant.
In addition to techniques like compression, data streaming, and index-free search, a modern log management platform adds effective cluster management, fault-tolerant distributed systems, and more efficient use of the underlying compute and storage resources.
If you’re building your own infrastructure, providers specializing in scalable distributed systems can help you get the complexities right.
Purposeful Engineering Makes Logging Everything Possible
Many providers can give enterprises a log management platform that is affordable, but only to a point. Once an enterprise needs to scale, costs begin to skyrocket and performance begins to suffer.
However, when a log management platform is engineered with best practices — designed with cost-efficient infrastructure and performant features at scale — enterprises can finally afford to log everything. When the engineering behind your platform is done right, observability shouldn’t be expensive.