January 26, 2023
Observability has become one of the most important areas of your application and infrastructure landscape, and the market has an abundance of tools that seem to do what you need. In reality, however, most products, especially the leading open source tools, were created to solve a single problem extremely well and later added supporting functionality to become more complete solutions. That non-core functionality is rarely best of breed. Prometheus and Grafana are two examples.
The one-sentence description right from the source: "The Grafana project was started by Torkel Ödegaard in 2014 and ... allows you to query, visualize and alert on metrics and logs no matter where they are stored."
Essentially, Grafana is a tool whose purpose is to compile and visualize data through dashboards from the data sources available throughout an organization. From these dashboards, it handles a basic alerting functionality that generates visual alarms. Grafana works best with time-series data, which is what most monitoring and observability platforms produce and store in databases like Graphite, Elastic, or Prometheus's native repository. Through the use of plug-ins, Grafana can also pull data directly from a wide variety of data sources from public cloud providers' monitoring solutions, including Google's Stackdriver and AWS CloudWatch, to SQL databases, like MariaDB and PostgreSQL.
The most commonly mentioned competitor to Grafana is Kibana, from the Elasticsearch ecosystem. Kibana and Grafana share the goal of making it easy to visualize and alert on the data available to them, and data availability is where Kibana is weakest: it supports only Elasticsearch as a data source, while Grafana is not limited to one source.
One thing Kibana is better at than Grafana is its search capabilities, which makes sense, as it is the tool that Elastic uses in its commercial offering. Extensive search and event correlation are features that only commercial offerings have the time and resources to do well.
Grafana is not a data store of its own, so it has no native capability to aggregate data from multiple sources. As a result, its ability to correlate across multiple data types is limited.
Getting started with Grafana can be as easy as running a single Docker container and connecting to the Grafana Dashboard.
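As a minimal sketch of that single-container start, the Python snippet below only assembles and prints a `docker run` command; it does not execute anything. It assumes the public `grafana/grafana` image and Grafana's default HTTP port of 3000. The helper name and the container name `grafana-demo` are made up for illustration.

```python
import shlex


def grafana_docker_command(host_port=3000, name="grafana-demo"):
    """Build the argv for starting one Grafana container (hypothetical helper)."""
    return [
        "docker", "run",
        "-d",                       # run detached, in the background
        "--name", name,             # illustrative container name
        "-p", f"{host_port}:3000",  # map a host port to Grafana's default port 3000
        "grafana/grafana",          # assumed image: the public Grafana image
    ]


# Print the command so it can be reviewed or pasted into a shell.
print(shlex.join(grafana_docker_command()))
```

Under these assumptions, the Grafana UI would then be reachable in a browser on the chosen host port, for example http://localhost:3000.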
Directly from the source, "Prometheus is an open-source systems monitoring and alerting toolkit..."
That statement is accurate but does not really convey the scale at which Prometheus has caught on as the de facto open-source tool for gathering metrics and generating basic alerts in today's cloud-based world. This is especially true in the Kubernetes universe, where Prometheus data is king.
Prometheus serves metrics from its own datastore, which holds the time-series data it collects from the targets it monitors. Prometheus also has an extensive series of plugins available that allow it to expose data to various external solutions, and to import data from any number of other data sources, including multiple public cloud-monitoring solutions. AWS even recommends Prometheus for its EKS (Kubernetes) offering, over its own CloudWatch service.
Prometheus monitoring runs into limitations around data and metric management as it scales. If, for example, a Kubernetes instance needs to scale down the road, the choice of how to scale Prometheus has a large impact on how that instance's data is stored, and it can make it difficult to aggregate the data back into a holistic view of Kubernetes monitoring. There are two primary approaches once the standard configuration hits its limits: run a series of worker nodes that shard the data to handle the volume, or segment Prometheus into multiple independent instances. The segmented scenario requires extra tooling to get that holistic view back, such as federating the instances through a "global" instance, whereas commercial solutions like Sumo Logic handle scalability behind the scenes.
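To make the sharding trade-off concrete, here is a small Python sketch. It assigns scrape targets to shards with a stable hash, then shows the recombination step that is needed once each shard holds only part of the data. This is a generic illustration, not Prometheus's own sharding algorithm. The function names, the pod addresses, and the shard count of 3 are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def shard_for(target, shard_count):
    """Map a scrape target to a shard index using a stable hash."""
    digest = hashlib.sha256(target.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % shard_count


def assign_targets(targets, shard_count):
    """Group targets by the shard (Prometheus instance) that would scrape them."""
    shards = defaultdict(list)
    for target in targets:
        shards[shard_for(target, shard_count)].append(target)
    return dict(shards)


def global_view(per_shard_totals):
    """Sum per-shard totals, the job a 'global' instance would otherwise do."""
    combined = defaultdict(float)
    for totals in per_shard_totals:
        for metric, value in totals.items():
            combined[metric] += value
    return dict(combined)


# Hypothetical pod addresses, for illustration only.
targets = [f"10.0.0.{i}:9100" for i in range(1, 9)]
assignment = assign_targets(targets, shard_count=3)
for shard, members in sorted(assignment.items()):
    print(shard, members)
```

Each target lands on exactly one shard, so no single instance can answer a cluster-wide question by itself. Something has to play the role of `global_view`, whether that is a federating instance or an external service.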
As a side note, the worker-node deployment model, for all its inherent deployment complexity, also resolves a data persistence issue: Prometheus prefers to use local storage. In most Prometheus deployments, that preference means that if a node has a fatal crash, all the current and historic data on that node is lost.
Prometheus has basic visualization capabilities that can be used if you want to expose a small handful of metrics to see basic trending, but almost all organizations expose the data to a more powerful visualization suite.
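External tools usually read this data through the Prometheus HTTP API. The Python sketch below builds a URL for the instant-query endpoint (`/api/v1/query`) and parses a response in the documented vector shape. The base URL `http://localhost:9090`, the `up{job="demo"}` query, and the sample response are assumptions for illustration. The sample is hand-written and is not output from a real server.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a URL for Prometheus's instant-query endpoint, /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(response_text):
    """Turn an instant-vector response into (labels, value) pairs."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {labels}, "value": [timestamp, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


# Hand-written sample in the documented response shape; not real output.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "demo"}, "value": [1674691200, "1"]},
        ],
    },
})

print(instant_query_url("http://localhost:9090", 'up{job="demo"}'))
print(parse_vector(sample))
```

A visualization suite does essentially this on every dashboard refresh, which is why it needs little more than the address of the Prometheus server.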
Prometheus can also be run using a Docker container. It is not usually deployed standalone, however. It is most often used within a Kubernetes cluster and is deployed using a Helm chart or managed by an operator. Those two deployment methods take care of a lot of the complexities inherent in running in a Kubernetes cluster and let Prometheus stick to what it is best at, which is exposing and gathering the metrics from pods in the cluster.
As much as we like to have a single solution to solve every problem, the more complex the infrastructure, the more complex the toolset to support it usually becomes. This is the case with Grafana and Prometheus.
Prometheus primarily focuses on metrics, not log data. It is great at exposing standard and custom metrics from an application it is monitoring. When it is deployed in a Kubernetes cluster, it can discover any pod that is running and persist any time-series data the application has exposed to its data store. Grafana, on the other hand, cannot define what data is exposed and captured.
When Grafana has access to an aggregated data set, its visualization tool makes it relatively easy to see multiple metrics across multiple application stacks on the same screen, in a Grafana dashboard that you can save and refer back to often. A Prometheus dashboard can visualize individual metrics as graphs but does not have the same flexibility or extensibility as Grafana. Prometheus even links to Grafana in its own visualization documentation, an acknowledgment of those limitations.
Together they make a very powerful combination that covers data collection, basic alerting, and visualization.
Using Grafana with Prometheus is only a few clicks away: simply click "Add New" under data sources in the Grafana console, and enter the connection information for the Prometheus instance whose data you want to access.
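The same step can also be scripted against Grafana's HTTP API instead of clicked through. The Python sketch below only builds the request for the data source endpoint (`POST /api/datasources`) and does not send it. The Grafana and Prometheus URLs, the data source name, and the token placeholder are assumptions for illustration. The field values shown are a reasonable starting point rather than a complete configuration.

```python
import json
from urllib.request import Request


def prometheus_datasource_payload(name, prometheus_url):
    """Describe a Prometheus data source in the shape Grafana's API accepts."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",   # Grafana's backend, not the browser, queries Prometheus
        "isDefault": False,
    }


def build_create_request(grafana_url, api_token, payload):
    """Build (but do not send) the POST request that creates the data source."""
    return Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


# Placeholder values; substitute real addresses and a real token before sending.
payload = prometheus_datasource_payload("Prometheus (demo)", "http://prometheus:9090")
request = build_create_request("http://localhost:3000", "<api-token>", payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, which is left out here so the sketch has no side effects.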
Both Prometheus and Grafana are built around time-series data - with Prometheus primarily on the gathering side and Grafana on the reporting side. Both tools are open-source, are widely available with lots of community support, and are more than capable of meeting the needs of enterprises, large and small.
There are two real caveats. The first is the level of expertise required when building a tool that monitors logs and metrics data out of other open source tools. It will be very much a DIY experience, including when leveraging any of Grafana's premade dashboards. The dashboards need more than basic product expertise to import successfully.
This is where observability software-as-a-service solutions really show their value. They improve the time-to-value by having premade dashboards readily available, and only a few clicks away - in addition to having a truly centralized data store that consolidates data from all parts of your infrastructure, not one data store per cluster.
The second caveat is the lack of complex alerting logic and even basic event tracking, which can both be accomplished by most commercial offerings. Even the team behind Grafana has a commercial offering with more capabilities around these features.