
July 28, 2021 By Sumo Logic

How to monitor Cassandra database clusters

Apache Cassandra is an open-source distributed NoSQL database management system, originally released by Facebook in 2008. It is designed to handle vast amounts of data with high availability and no single point of failure. It is a wide-column store, meaning that it organizes related data into columns, and columns are grouped into “column families.” The benefit is that you can manage data sets that simply won’t fit on one machine.
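For illustration, here is a minimal sketch of what such a table looks like in CQL (in modern CQL terms a column family is simply a table; the keyspace and table names below are made up):

# Hypothetical example: each partition (one city) holds a wide row of users.
# Run on any node where cqlsh is available; credentials may be required
# depending on how the node was set up.
cqlsh -e "
  CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
  CREATE TABLE IF NOT EXISTS demo.users_by_city (
    city    text,
    user_id uuid,
    name    text,
    email   text,
    PRIMARY KEY (city, user_id)
  );"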

In this article we look at how to monitor Cassandra database clusters. We start with the basic architecture of a Cassandra cluster and the most important metrics to gather. Then we walk step by step through setting up a monitoring stack with Jolokia, Telegraf and Sumo Logic collectors and dashboards: everything you need to monitor Cassandra databases.

Cassandra Architecture

Apache Cassandra is designed to work as a massively scalable, available and reliable data store. To achieve those goals it operates in a distributed cluster environment where you install and join groups of Cassandra instances together. A single Cassandra instance is called a node, and you can run one via Docker or install it on a single machine. If you add more nodes and pair them together, they form a cluster.

Cassandra uses a peer-to-peer architecture: each node connects to all other nodes without requiring a master node. To join an existing cluster, a node only needs a list of peers (seeds) to contact.
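As a rough sketch, assuming the bitnami/cassandra image and its CASSANDRA_SEEDS and CASSANDRA_CLUSTER_NAME environment variables (check the image documentation for the exact names), a second node could join an existing one like this:

# Start a second node and point it at an existing node as its seed.
# cassandra-1, cassandra-net and demo-cluster are placeholder names.
docker network create cassandra-net        # skip if the network already exists
docker run -d --name cassandra-2 --network cassandra-net \
  -e CASSANDRA_CLUSTER_NAME=demo-cluster \
  -e CASSANDRA_SEEDS=cassandra-1 \
  bitnami/cassandra:latest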

When you connect many nodes together, you can further categorize them into racks and data centers. A rack corresponds to a subset of nodes that reside in the same physical location and share resources like a network switch, power supply, etc. Even if you don’t have physical infrastructure, you can still place co-located nodes in racks. A data center represents a group of nodes that are either geographically separated or serve different workloads.

This topology makes a Cassandra cluster resilient to single points of failure.

Cassandra Cluster topology

In the illustration above we have two data centers, DC1 and DC2, with seven nodes and two racks. Although this configuration looks simple, there is quite a bit of plumbing required to make it work. Next, let’s look at the most important metrics to collect when monitoring Cassandra clusters.
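On a running cluster you can inspect this topology with nodetool, which ships with Cassandra; the output below is illustrative only:

nodetool status
# Datacenter: DC1
# ==============
# --  Address    Load      Tokens  Owns    Host ID        Rack
# UN  10.0.0.11  1.21 GiB  256     33.3%   0a1b2c3d-...   rack1
# UN  10.0.0.12  1.19 GiB  256     33.4%   4e5f6a7b-...   rack2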

Cassandra Performance Metrics to Consider

Cassandra is a Java application, so inevitably you will have to understand how to monitor JVM-based applications. The main categories of metrics you need to consider are:

  • JVM Metrics: These are metrics related to the JVM execution environment on which Apache Cassandra is running.

  • OS Metrics: These relate to the operating system and the hardware running the services.

  • Cassandra Metrics: These are specific to the database application and describe how the system and its parts are performing.

The complete list of Cassandra metrics is referenced on the official page. Interpreting those metrics and their impact on performance or behavior requires some expertise in the system. Fortunately, with third-party monitoring tools like Sumo Logic you can get the most important ones presented in a single dashboard. Let’s get started on how to monitor Cassandra clusters.
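If you just want to spot-check some of these metrics from the command line first, the nodetool subcommands below (bundled with Cassandra) touch the JVM and Cassandra categories; run them on any node, prefixing with docker exec <container> if the node runs in a container:

nodetool gcstats      # JVM: garbage-collection pause statistics
nodetool tpstats      # Cassandra: thread pools and pending/blocked tasks
nodetool tablestats   # Cassandra: per-table latencies, SSTable counts and more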

Get Started with Sumo Logic Monitoring for Cassandra Clusters

Here are the steps to configure monitoring for Cassandra database clusters. The steps are deliberately kept simple; there is further customization you can perform for each use case. Note that these metrics are collected per node, so ideally you want to aggregate them via external monitoring.

The general steps to monitor Cassandra clusters with Sumo Logic are outlined below:

  1. Install and configure the Jolokia agent on each node.

  2. Install and configure Telegraf collector on each node.

  3. Configure a Hosted Collector in Sumo Logic.

  4. Configure Sumo Logic output plugin with Telegraf and start collecting metrics.

  5. Install the Cassandra App to view the logs in Sumo Logic.

Let's go through the steps one by one:

Step 1: Install and configure the Jolokia agent on each node

By default, Cassandra metrics are managed using the Dropwizard Metrics Library. This creates a registry of metrics like counters and gauges, and reports them via JMX. In order to consume these JMX metrics we need to install an agent that exposes them to collectors. Using Jolokia is a convenient solution as it acts as a JMX-HTTP bridge, so you can view them in your browser.

Cassandra distributions do not use Jolokia by default, so you need to install and configure it when you start a new Cassandra node.

Here is a Dockerfile that performs this setup:

Dockerfile

FROM bitnami/cassandra:latest

# Jolokia agent version and bind address
ENV JOLOKIA_VERSION=1.6.2
ENV JOLOKIA_HOST=0.0.0.0
ENV CASSANDRA_LIB=/opt/bitnami/cassandra/lib

USER root
WORKDIR $CASSANDRA_LIB

# Download the Jolokia JVM agent into Cassandra's lib directory
RUN curl -L https://search.maven.org/remot...${JOLOKIA_VERSION}/jolokia-jvm-${JOLOKIA_VERSION}-agent.jar > jolokia-jvm-${JOLOKIA_VERSION}-agent.jar

# Attach the agent to the JVM and expose it on port 8778
ENV JVM_OPTS="$JVM_OPTS -javaagent:/opt/bitnami/cassandra/lib/jolokia-jvm-${JOLOKIA_VERSION}-agent.jar=port=8778,host=${JOLOKIA_HOST}"

USER 1001
EXPOSE 7000 9042 8778

ENTRYPOINT [ "/opt/bitnami/scripts/cassandra/entrypoint.sh" ]
CMD [ "/opt/bitnami/scripts/cassandra/run.sh" ]

We use the latest bitnami/cassandra image and add a couple of steps before running the default entrypoint: download the Jolokia agent and add the agent configuration to the JVM_OPTS environment variable.
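Assuming the Dockerfile above is saved in the current directory, build the image first (the cassandra tag is just a local name, chosen to match the docker run command below):

docker build -t cassandra .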

If you create a container from this image you can verify that it works correctly:

$ docker run --name cassandra -p 8778:8778 -p 7000:7000 cassandra

Once the node is ready, head to the localhost:8778/jolokia endpoint and review, for example, the memory metrics:

Cassandra memory metrics

These metrics are exposed per node, so you can use the same Docker image on any other node and it will expose its metrics via HTTP in the same way.
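You can also query the same data from the command line. For example, this reads the JVM heap usage through Jolokia's read endpoint (python3 is only used for pretty-printing and is optional):

curl -s http://localhost:8778/jolokia/read/java.lang:type=Memory/HeapMemoryUsage | python3 -m json.tool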

Step 2: Install and configure Telegraf collector on each node

The JMX metrics are now exposed on each Cassandra node, but we need to aggregate them and send them to the Sumo Logic endpoint. To achieve that, we can use Telegraf, a plugin-driven server agent for collecting and reporting metrics.

First, install the Telegraf binary from the downloads page and verify that it works:

❯ telegraf --version

Telegraf 1.19.0

Telegraf works by accepting a configuration file that declares the input sources (the metrics to collect) and the output destinations (the monitoring sinks to send them to).

In our example we start from the default telegraf.conf for a Cassandra node. Download the configuration and edit it so that it specifies the connection parameters for the Jolokia agent:

telegraf.conf

[[inputs.jolokia2_agent]]
  urls = ["http://0.0.0.0:8778/jolokia"]
  name_prefix = "java_"

  [inputs.jolokia2_agent.tags]
    environment = "prod"
    component = "database"
    db_system = "cassandra"
    db_cluster = "cassandra_on_premise"
    dc = "IDC1"

  [[inputs.jolokia2_agent.metric]]
    name  = "Memory"
    mbean = "java.lang:type=Memory"

  [[inputs.jolokia2_agent.metric]]
    name  = "GarbageCollector"
    mbean = "java.lang:name=*,type=GarbageCollector"
    tag_keys = ["name"]
    field_prefix = "$1_"

[[inputs.jolokia2_agent]]
  urls = ["http://0.0.0.0:8778/jolokia"]
  name_prefix = "cassandra_"

  [inputs.jolokia2_agent.tags]
    environment = "prod"
    component = "database"
    db_system = "cassandra"
    db_cluster = "cassandra_on_premise"
    dc = "IDC1"

[[outputs.sumologic]]
  url = "<URL>"
  data_format = "prometheus"

  [outputs.sumologic.tagpass]
    db_cluster = ["cassandra_on_premise"]

A few sections of this config need the correct endpoints for your environment. The jolokia2_agent urls should point to the local Jolokia agents; if you have more than one endpoint, add them to the list as well.
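Before wiring up the output, you can dry-run the input side: with the --test flag, Telegraf gathers metrics once and prints them to stdout without sending anything to the outputs.

telegraf --config telegraf.conf --test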

The last piece of configuration is the output url of the Sumo Logic hosted collector. Follow the next steps to generate and retrieve that endpoint.

Step 3: Configure a hosted collector in Sumo Logic

In the previous step we declared a Telegraf configuration for collecting metrics from a Cassandra node, but we still need a URL to send the output to Sumo Logic. To get one, you need to configure a hosted collector.

Follow the steps as outlined in this tutorial. When successful, you should be able to see the hosted collector on the sumologic.com collection page:

Sumo Logic Hosted Collector

A hosted collector does not collect metrics by default. You will need to define a source for the collector, which will gather all the monitoring metrics under its namespace; once that is done, you can start sending metrics to Sumo Logic. We will cover this step next.

Step 4: Configure Sumo Logic output plugin with Telegraf and start collecting metrics

Once you have created a hosted collector, add a Source to it. Follow the steps as outlined in this tutorial. By the end of this process you should have a unique URL for the source collection. Use that URL in the telegraf.conf setup we defined earlier:

[[outputs.sumologic]]
  url = "https://endpoint1.collection.e...<BASE_64>"
  data_format = "prometheus"

  [outputs.sumologic.tagpass]
    db_cluster = ["cassandra_on_premise"]
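As an optional refinement, you can keep the collector URL out of the configuration file: Telegraf expands ${VAR} references in its config from the environment. A sketch, where SUMO_HTTP_SOURCE_URL is a made-up variable name:

export SUMO_HTTP_SOURCE_URL="<your hosted collector source URL>"
# ...and in telegraf.conf:
#   url = "${SUMO_HTTP_SOURCE_URL}"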

Now you can start the Telegraf agent.

❯ telegraf --debug --config telegraf.conf

You should be able to see collection logs for the Sumo Logic output:

...
2021-07-05T14:28:58Z D! [agent] Connecting outputs
2021-07-05T14:28:58Z D! [agent] Attempting connection to [outputs.sumologic]
2021-07-05T14:28:58Z D! [agent] Successfully connected to outputs.sumologic
2021-07-05T14:28:58Z D! [agent] Starting service inputs
2021-07-05T14:29:02Z D! [outputs.sumologic] Wrote batch of 1000 metrics in 1.635236207s
2021-07-05T14:29:02Z D! [outputs.sumologic] Buffer fullness: 1033 / 10000 metrics
2021-07-05T14:29:02Z D! [outputs.sumologic] Wrote batch of 1000 metrics in 445.319418ms
2021-07-05T14:29:02Z D! [outputs.sumologic] Buffer fullness: 33 / 10000 metrics
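Once the debug run looks good, you will probably want Telegraf to run as a service rather than in the foreground. Assuming Telegraf was installed from the official deb/rpm packages, which ship a systemd unit that reads /etc/telegraf/telegraf.conf, that looks roughly like this:

sudo cp telegraf.conf /etc/telegraf/telegraf.conf
sudo systemctl enable --now telegraf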

You can also check that the Cassandra agent has been registered on the collection/status page:

Sumo Logic Cassandra Collection Status Page

Step 5: Install the Cassandra App to view the logs in Sumo Logic

Currently we collect the metrics and logs, but it would be even better if we had a dedicated dashboard for Cassandra-related searches and event monitoring. Sumo Logic bundles this capability in its App Catalog. Follow the steps outlined in this tutorial to set it up. Once completed, you can access the dashboards from the Library -> Personal -> Cassandra folder.

Next Steps with Cassandra Cluster Monitoring

This article provided a brief introduction to Cassandra architecture and its most important monitoring metrics, followed by a step-by-step tutorial for setting up a Sumo Logic dashboard to monitor Cassandra logs and metrics.

The official Sumo Logic documentation is an excellent reference for troubleshooting, for configuring the different collectors, and for querying the logs from the dashboards. Feel free to add more Cassandra nodes and observe how the dashboards capture and convey this information.
