Reduce downtime and move from reactive to proactive monitoring.
May 4, 2020
After StatsD and Graphite weren’t able to meet their needs for metrics and monitoring, engineers at SoundCloud developed Prometheus, an open source event monitoring and alerting tool. Because it’s easy to deploy and get started with, and on the surface seems free, it has become a popular part of many DevOps teams' observability stacks.
As an environment scales, so does the complexity of the Prometheus deployment. Many teams inevitably put more pressure on Prometheus than it was designed to handle. In fact, the Prometheus documentation states it stores data only for a short period of time and was not designed to do otherwise. These expanded use cases and expectations stretch Prometheus and require careful consideration for scaling. Ultimately, Prometheus wasn’t designed to answer questions like these:
While this scalability problem doesn’t arise when Prometheus is monitoring small or simple deployments, the lack of visibility and unified data adds an extra cost when attempting to use Prometheus as a monitoring source of truth for distributed applications.
Many DevOps teams recognize these difficulties and opt to augment their monitoring with a purpose-built solution. Sumo Logic simplifies the challenges of managing Prometheus at scale, including data aggregation, long-term data retention, and log and event correlation, in a unified service.
Prometheus servers provide persistent local storage by default, but that storage was not designed for distributed metrics storage across multiple nodes.
Sumo Logic greatly simplifies the process of scaling out a Prometheus deployment. By seamlessly aggregating Prometheus metrics data, Sumo Logic eliminates data silos and allows for global views of the entire cluster.
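To make the idea of a global view concrete, here is a minimal Python sketch of what merging results from several independent Prometheus servers involves. It is not how Sumo Logic aggregates data; it only illustrates the problem. It assumes each server's answer to the same instant query has already been fetched from the Prometheus HTTP API (`/api/v1/query`) and parsed into a dict. The function name `merge_instant_vectors` and the sample data are hypothetical.

```python
from collections import defaultdict


def merge_instant_vectors(responses, group_label):
    """Sum instant-vector samples from several Prometheus servers.

    `responses` is a list of parsed JSON bodies, one per server, in the
    shape returned by the Prometheus HTTP API for an instant query.
    Samples are grouped by the value of `group_label`.
    """
    totals = defaultdict(float)
    for response in responses:
        # Skip servers that reported an error instead of data.
        if response.get("status") != "success":
            continue
        data = response.get("data", {})
        if data.get("resultType") != "vector":
            continue
        for sample in data.get("result", []):
            key = sample["metric"].get(group_label, "")
            # Each value is a [timestamp, "string value"] pair.
            _timestamp, raw_value = sample["value"]
            totals[key] += float(raw_value)
    return dict(totals)


# Hypothetical responses from two servers scraping different nodes.
server_a = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "api"}, "value": [1588550400, "10"]},
            {"metric": {"job": "web"}, "value": [1588550400, "2.5"]},
        ],
    },
}
server_b = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "api"}, "value": [1588550400, "5"]},
        ],
    },
}

if __name__ == "__main__":
    print(merge_instant_vectors([server_a, server_b], "job"))
```

Even this small sketch has to decide how to treat failed servers and mismatched result types, which hints at why stitching per-server data together by hand becomes harder as a deployment grows.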
Aggregate data enables:
By default, Prometheus stores data only for a short time and isn't designed to do otherwise. According to the Prometheus docs, “Note that a limitation of the local storage is that it is not clustered or replicated. Thus, it is not arbitrarily scalable or durable in the face of disk or node outages and should be treated as you would any other kind of single node database. Using RAID for disk availability, snapshots for backups, capacity planning, etc, is recommended for improved durability.”
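The capacity planning mentioned in that quote can be roughed out with the rule of thumb from the Prometheus storage documentation: needed disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample (the docs cite around 1 to 2 bytes per sample on average). The Python sketch below applies that formula. The function name and the example ingestion rate are assumptions for illustration, and the 15-day figure matches the default of the `--storage.tsdb.retention.time` flag.

```python
def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough local-storage estimate for a single Prometheus server.

    Applies: retention_seconds * samples_per_second * bytes_per_sample.
    The default of 2 bytes per sample is the upper end of the average
    range quoted in the Prometheus docs; real usage varies.
    """
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical workload: 100,000 samples/s kept for 15 days and for 1 year.
    for days in (15, 365):
        gigabytes = estimate_disk_bytes(days, 100_000) / 1e9
        print(f"{days} days: ~{gigabytes:.1f} GB")
```

Because the estimate grows linearly with retention, keeping a year of data on one node needs roughly 24 times the disk of the 15-day default, all of it on storage that is neither clustered nor replicated.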
Sumo Logic takes care of long-term storage of Prometheus metrics, enabling:
To effectively tie metrics, events, and logs together, the monitoring agent needs to collect and store the events. Prometheus on its own does not collect or store events; it handles only metrics.
Visibility of one without the other provides you with incomplete data; you need both to troubleshoot application issues quickly and efficiently.
Running Prometheus in a highly available and scalable way requires significant investment and engineering talent. Once your environment reaches a certain size, you’ll need to dedicate employees and systems to running Prometheus rather than innovating on your product. Only a handful of companies can afford to put resources towards managing support systems instead of projects that contribute to their core business.
These solutions tend to run into significant challenges when used for medium and large environments. Metrics matter most during business-critical moments, such as troubleshooting significant issues, and organizations can’t afford to have them unavailable.
Sumo Logic’s scalability has been proven by thousands of customers who rely on Sumo Logic for operational insight into their logs and metrics. The multi-tenant architecture can ingest and analyze petabytes of metrics, logs, and event data; the solution also scales on demand to support rapid and elastic growth.
Prometheus is a great solution for collecting performance metrics data. For analytics on production deployments, however, you need reliability and scalability that Prometheus simply wasn’t built to provide. Augmenting Prometheus with Sumo Logic will provide greater value in the long term while giving your teams better performance and observability throughout your stack.
Scale your Prometheus monitoring with Sumo Logic. Sign up for a free trial.