You know how they get the bad guys on CSI. We have seen the “police procedural” drama played out so often, it’s almost second nature:
- Listen to what they say they did.
- Examine the physical evidence to see what they actually did.
- Compare and confront (the fun part). It’s the discrepancies that drive the drama.
This is a powerful metaphor for troubleshooting your application stack in the cloud (bear with me… it’s worth it).
Let’s say you are seeing an issue with your cloud-based app. You have events and metrics from CloudTrail and CloudWatch, or maybe from New Relic or another performance monitoring system. They are telling you a “crime” has occurred: an SLA has been violated or a KPI threshold has been exceeded. Or even worse, a customer has suffered and wants you to suffer too. What do you do to find the culprits and bring them to justice?
- Look at the logs (= what your app servers, web servers, load balancers, database servers, etc. say they did), using an AWS-based log analysis solution like Sumo Logic.
- Look at the wire data (= how your app servers, web servers, load balancers, database servers, etc. behaved, i.e. what they actually did), using a deep packet inspection solution like ExtraHop.
- Compare and correlate.
This model makes sense. But where is the “wire” (the data, not the TV series) when you are deployed in AWS?
It turns out that ExtraHop offers a “virtual tap” for AWS instances, i.e. a bit of software that emulates a network tap, allowing ExtraHop to collect packet data for real-time inspection and analysis.
It further turns out that ExtraHop provides a direct, out-of-the-box integration with Sumo Logic, so that data from ALL your AWS-based sources can be analyzed in real time. This means you get wire data (the behavior) alongside the log/audit data (the testimony) from AWS and your running instances.
You can search, aggregate and correlate AWS events from CloudTrail, CloudWatch, and ELB together with your own logs (app, web, database), and with the wire-derived events from ExtraHop.
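To make the “compare and confront” step concrete, here is a minimal Python sketch of the idea. It is not how Sumo Logic or ExtraHop implement correlation, and it calls neither product's API. The record shapes (`LogEvent`, `WireEvent`), the `find_discrepancies` helper, and the assumption that both sources share a `request_id` are all hypothetical, chosen only for illustration; in practice you might have to join on timestamps and network addresses instead.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    """The testimony: what a server says it did (hypothetical shape)."""
    host: str
    request_id: str
    status: int
    reported_ms: float


@dataclass
class WireEvent:
    """The physical evidence: what was seen on the network (hypothetical shape)."""
    host: str
    request_id: str
    status: int
    observed_ms: float


def find_discrepancies(logs, wire, tolerance_ms=50.0):
    """Join log and wire events on (host, request_id) and list the mismatches.

    Returns (request_id, kind) tuples, in log order. Assumes both sources
    carry the same request_id, which is an illustrative simplification.
    """
    wire_by_key = {(w.host, w.request_id): w for w in wire}
    findings = []
    for log in logs:
        seen = wire_by_key.get((log.host, log.request_id))
        if seen is None:
            # The server logged a request that never showed up on the wire.
            findings.append((log.request_id, "missing on wire"))
        elif seen.status != log.status:
            # The server and the network disagree about the outcome.
            findings.append((log.request_id, "status mismatch"))
        elif abs(seen.observed_ms - log.reported_ms) > tolerance_ms:
            # The server reports a latency the network did not observe.
            findings.append((log.request_id, "latency mismatch"))
    return findings
```

Each tuple returned is a discrepancy between testimony and evidence, which is where the investigation would start.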
The result? A dramatically shorter Mean Time To Identify (MTTI) for the root cause, which means faster remediation, higher availability, and better performance.