April 20, 2017
In Part I, we gave a general overview of custom logs, and in Part II we discussed timestamps and log content. At this point, you have a log containing plenty of important data that you can analyze to gather useful information about your systems. In this final part of the series, you'll learn how to organize the data in your logs and how to make sure you document it properly.
You may have the most descriptive and helpful data in your logs, but analyzing them can be very difficult if you don't have a defined, structured syntax. There are generally two ways to structure your logs.
When it comes to log analysis, key-value pairs may be the simplest and most readable format to parse. Our previous example is not the most human-readable format, and it can be a little difficult to find anchors to parse against. You can change the message to be easier for humans to read and easier to parse in a tool like Sumo Logic:
timestamp: 2017-04-10 09:50:32 -0700, username: dan12345, source_ip: 10.0.24.123, method: GET, resource: /checkout/flights/, gateway: credit.payments.io, audit: Success, flights_purchased: 2, value: 241.98
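Outside of a log-analytics tool, an entry like this is also easy to handle in code. Here is a minimal Python sketch of splitting the key-value line above into fields; the entry text and field names come from the example, but the parse_kv helper is a hypothetical name, not part of any logging library.

```python
def parse_kv(line):
    """Split a 'key: value, key: value' log entry into a dict."""
    fields = {}
    for pair in line.split(", "):
        # Partition on colon-space so the colons inside the
        # timestamp (09:50:32) are not treated as separators.
        key, _, value = pair.partition(": ")
        fields[key] = value
    return fields

entry = ("timestamp: 2017-04-10 09:50:32 -0700, username: dan12345, "
         "source_ip: 10.0.24.123, method: GET, resource: /checkout/flights/, "
         "gateway: credit.payments.io, audit: Success, "
         "flights_purchased: 2, value: 241.98")

parsed = parse_kv(entry)
print(parsed["username"])  # dan12345
print(parsed["value"])     # 241.98
```

Because every field announces its own name, the parser needs no knowledge of field order, which is the main readability win of this format.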
You can take it a step further and structure your logs in a JSON format:
{
"timestamp": "2017-04-10 09:50:32 -0700",
"username": "dan12345",
"source_ip": "10.0.24.123",
"method": "GET",
"resource": "/checkout/flights/",
"gateway": "credit.payments.io",
"audit": "Success",
"flights_purchased": 2,
"value": 241.98
}
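A JSON-formatted entry like this can be consumed by almost any language's standard library. A minimal Python sketch, assuming the entry is valid JSON (quoted keys and string values):

```python
import json

entry = json.loads("""
{
  "timestamp": "2017-04-10 09:50:32 -0700",
  "username": "dan12345",
  "source_ip": "10.0.24.123",
  "method": "GET",
  "resource": "/checkout/flights/",
  "gateway": "credit.payments.io",
  "audit": "Success",
  "flights_purchased": 2,
  "value": 241.98
}
""")

# Numeric fields arrive as numbers, so no casting is needed for math.
print(entry["audit"])              # Success
print(entry["flights_purchased"])  # 2
```

Note that JSON also preserves types: flights_purchased and value come back as numbers rather than strings, which simplifies any downstream aggregation.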
In Sumo Logic, you have various ways to parse this type of structure, including the basic Parse operator on predictable patterns or even Parse JSON. While some sort of key-value pairing is ideal, it is not always the most efficient, as you're potentially doubling the size of every entry that gets sent and ingested. At low log volume this isn't an issue, but if you are generating logs at a high rate, entries of that size can become very costly. This brings us to the other format: delimited logs.
Delimited logs are essentially the type of log you built in the previous examples. Your log format has a set structure, and the different pieces of content are broken up by some sort of delimiter.
2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2 241.98
Because of how this example is structured, spaces are the delimiters. To an extent, this is perfectly reasonable, and it may be the most efficient and smallest size you can get for this log. The problem it creates when parsing is figuring out where fields start and end, as you can see with the timestamp. If you need to stick with this format, you'll probably rely on regular expressions to parse your logs. That isn't a problem for some, but for others regular expressions can understandably be a challenge. To reduce the need for regular expressions, you'll want to use a unique delimiter. A space can sometimes work, but it may require excessive parsing of the timestamp. Instead, consider a delimiter such as a dash, semicolon, comma, or another character (or character pattern) that you can guarantee will never appear in the data of your fields.
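The regular-expression route for the space-delimited line above can be sketched as follows in Python. The pattern anchors on the timestamp's fixed shape so the spaces inside it are not mistaken for field delimiters; the group names are assumptions based on the example's fields.

```python
import re

LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"(?P<username>\S+) (?P<source_ip>\S+) (?P<method>\S+) "
    r"(?P<resource>\S+) (?P<gateway>\S+) (?P<audit>\S+) "
    r"(?P<flights_purchased>\S+) (?P<value>\S+)$"
)

line = ("2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET "
        "/checkout/flights/ credit.payments.io Success 2 241.98")

match = LOG_PATTERN.match(line)
print(match.group("timestamp"))  # 2017-04-10 09:50:32 -0700
print(match.group("gateway"))    # credit.payments.io
```

The timestamp group has to spell out its entire internal layout, which illustrates the maintenance cost of space delimiters: any field whose data can contain a space needs its own special handling in the pattern.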
2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98
A syntax like this allows you to parse the entire message with space-dash-space ( - ) as the field delimiter, which ensures that the dashes in the timestamp are not counted as delimiters. Finally, to make sure an entry can't be improperly parsed, always put some sort of filler in place of any field that may not have data. For example:
2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Failure - x - x
From this example, you know the event was a failure, and because it failed, it has no flight totals or values. To avoid needing additional parsers for the missing fields, you can simply replace them with something like an 'x'. Note that if you run aggregates or math against a field that is typically a number, you may need to add some additional logic to your search queries.
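Both points above can be sketched in a few lines of Python: splitting on " - " keeps the timestamp intact, and a small check skips the 'x' filler before doing math on a numeric field. The field names are taken from the examples; everything else is illustrative.

```python
FIELDS = ["timestamp", "username", "user_ip", "method", "resource",
          "gateway", "audit", "flights_purchased", "value"]

lines = [
    "2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - "
    "/checkout/flights/ - credit.payments.io - Success - 2 - 241.98",
    "2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - "
    "/checkout/flights/ - credit.payments.io - Failure - x - x",
]

records = [dict(zip(FIELDS, line.split(" - "))) for line in lines]

# The dashes inside the timestamp survive because they are never
# surrounded by a space on both sides.
print(records[0]["timestamp"])  # 2017-04-10 09:50:32 -0700

# The extra logic mentioned above: skip the 'x' filler when summing.
total = sum(float(r["value"]) for r in records if r["value"] != "x")
print(total)  # 241.98
```

Without the filler check, float("x") would raise a ValueError on the failed entry, which is exactly the kind of surprise the extra query logic guards against.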
You may have the greatest log structure possible, but without proper documentation it's easy to forget why something was part of your logging structure or what certain fields represented. You should always document what your log syntax represents. Referring back to the previous log example:
2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98
You can document your log syntax as such:
timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value
This log syntax can be placed once at the very start of the log file for future reference if necessary.
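A syntax line at the top of the file also makes parsing self-describing: the first line names the fields, and every later line is parsed against it. A minimal sketch under that assumption (the file contents here are illustrative):

```python
lines = [
    # First line documents the syntax; the rest are entries.
    "timestamp - username - user_ip - method - resource - gateway"
    " - audit - flights_purchased - value",
    "2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - "
    "/checkout/flights/ - credit.payments.io - Success - 2 - 241.98",
]

field_names = lines[0].split(" - ")
records = [dict(zip(field_names, line.split(" - "))) for line in lines[1:]]
print(records[0]["audit"])  # Success
```

If the log format later gains a field, updating the header line keeps every consumer of the file in sync without code changes.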
At Sumo Logic, we regularly work with people who are new to logging and have many questions about how to get the most out of their logs. While you can start ingesting your logs and getting insights almost immediately, the information a tool provides is only as good as the data it receives. Though most vendors do a good job of sticking to standard log structures with great data, it's up to you to standardize a custom-created log.

In this series, I set out to help you create logs with relevant data so you can learn as much as possible about your custom applications. As long as you stick to the "5 W's," structure your logs in a standard syntax, and document it, you'll be on the right track to getting the most out of Sumo Logic. Be sure to sign up for a free trial of Sumo Logic to see what you can do with your logs!