The humble log has largely been thought of as a data source that is there “just in case” there is a problem: a record of what has happened that provides the evidence you need when you have to investigate a system issue or security event, for example. Generally, logs have only ever been accessible to the chosen few who sit in the engine room of the system, looking at log events flying past all day long. Think of that scene in the Keanu Reeves movie “The Matrix,” where the character Cypher is translating the matrix in real time as events fly by.
Nowadays, looking at raw log events in real time may not be of much use even to the experts, as even small systems can generate thousands of events per second (or tens of millions of events per day). Thus, log management technologies have to date largely focused on helping these “engine room experts” dig into this data more efficiently via search and filtering capabilities.
However, there is a wealth of information contained in your log data that is useful for use cases beyond debugging or security. In fact, we recently carried out a survey across a sample of ~200 users and asked them what their top use cases for logs were. Interestingly, the traditional use cases such as debugging, security and compliance did feature (although security and compliance were pretty low on the list), but so did a whole range of other use cases, such as production monitoring, web analytics, support, real user monitoring and business analytics.
Survey Question: what are your top use cases for log management and analytics?
“How can logs be used for these things?” you ask? In short, log management technologies can now take your unstructured data and pick out the important pieces of information (e.g. response times, customer plan dollar values, performance resource usage info…), then roll these up into metrics dashboards that give you a view into a wide range of trends that are happening across your system and business.
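To make the idea concrete, here is a minimal Python sketch of picking fields out of log lines and rolling them up into a couple of metrics. The `key=value` line layout, the field names (`response_time`, `plan_value`) and the sample lines are all assumptions made up for illustration; this is not how Logentries itself parses data.

```python
import re
from statistics import mean

# Hypothetical log lines; the key=value layout is an assumption for illustration.
SAMPLE_LINES = [
    "2014-05-01T10:00:01Z GET /signup status=200 response_time=120 plan_value=0",
    "2014-05-01T10:00:02Z GET /upgrade status=200 response_time=340 plan_value=29",
    "2014-05-01T10:00:03Z GET /home status=500 response_time=910",
    "2014-05-01T10:00:04Z malformed line without fields",
]

# Matches every key=value pair, where the value runs up to the next whitespace.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(line):
    """Return a dict of every key=value pair found in one log line."""
    return dict(KV_PATTERN.findall(line))


def roll_up(lines):
    """Summarise response times and plan values across the given lines."""
    events = [extract_fields(line) for line in lines]
    # Lines that lack a field are simply skipped for that metric.
    times = [float(e["response_time"]) for e in events if "response_time" in e]
    revenue = sum(float(e["plan_value"]) for e in events if "plan_value" in e)
    return {
        "events_with_timing": len(times),
        "avg_response_time": mean(times) if times else None,
        "total_plan_value": revenue,
    }


summary = roll_up(SAMPLE_LINES)
```

Because the raw lines are kept alongside the summary, any number in `summary` can be traced back to the events that produced it.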
The beauty of log data, however, is that it always maintains the evidence. So, if you suddenly see a spike in the number of signups on a daily basis, for example, you can quickly validate the change by checking your logs to see exactly who signed up and when. This is not always possible with traditional metrics dashboards, as they do not maintain the source data used to calculate the metrics. In that case, validating a sudden change in a metric often requires a discussion with your engineers, who are usually already pretty busy delivering on your roadmap (read as: can take days to validate).
Here’s a quick example of how I can take a stream of CloudWatch data from my AWS account, create a view across CPU Usage, NetworkIn and NetworkOut data, and then dive back into the logs to investigate any events that led to spikes in the data. Note that Logentries can plug directly into CloudWatch and stream this data into your Logentries account, so that you can build performance dashboards and correlate them with the log data from your AWS instances.
Step 1: Viewing my CloudWatch logs
The screen below shows my AWS CloudWatch data streaming into my Logentries account. CloudWatch captures a long list of performance metrics from your different AWS services. The screen below shows CPU Utilization information for a particular server instance (i-589a7012) over the past 24 hours. CloudWatch provides the min, max and average values over defined periods (e.g. every 1 minute). To get a quick view of CPU load I can scroll down through individual events, or search for particular events where the load value was bigger than a particular threshold.
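The threshold search described above boils down to a simple filter. The Python sketch below shows the idea on a handful of made-up per-minute records; the field names (`time`, `min`, `max`, `avg`) and the values are assumptions for illustration, not the actual CloudWatch or Logentries event format.

```python
# Hypothetical CPU utilisation events: one record per period with
# minimum, maximum and average values (field names are assumptions).
CPU_EVENTS = [
    {"time": "03:45", "min": 2.0, "max": 11.0, "avg": 5.5},
    {"time": "03:46", "min": 3.0, "max": 14.0, "avg": 7.0},
    {"time": "03:47", "min": 40.0, "max": 97.0, "avg": 82.5},
    {"time": "03:48", "min": 4.0, "max": 18.0, "avg": 9.0},
]


def events_above(events, threshold, field="avg"):
    """Return the events whose chosen field is strictly above the threshold."""
    return [event for event in events if event[field] > threshold]


# Periods where the average CPU load exceeded 50%.
hot = events_above(CPU_EVENTS, 50.0)
```

Filtering on `field="max"` instead of the default catches short bursts that an average over the period would smooth out.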
Step 2: Rolling them up into a dashboard
Rather than look at individual events (like Cypher from “The Matrix”), I might be better served by a higher-level representation of the data. To get one, I can use one of the Logentries search functions (e.g. AVERAGE) to roll up the average CPU load into a dashboard. For example, the screenshot below shows the average CPU load for i-589a7012 over the past 24 hours.
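Conceptually, an average roll-up groups events into time buckets and averages each bucket. The Python sketch below does this per hour for one metric at a time; the `(timestamp, metric, value)` tuples and their values are invented for illustration, and the code is a stand-in for the idea rather than the Logentries AVERAGE function itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (timestamp, metric name, value) samples; an assumption for illustration.
SAMPLES = [
    ("2014-05-01T03:10", "CPUUtilization", 6.0),
    ("2014-05-01T03:47", "CPUUtilization", 82.0),
    ("2014-05-01T04:05", "CPUUtilization", 8.0),
    ("2014-05-01T04:35", "CPUUtilization", 12.0),
    ("2014-05-01T03:20", "NetworkIn", 1500.0),
    ("2014-05-01T03:50", "NetworkIn", 4500.0),
]


def hourly_average(samples, metric):
    """Average one metric per hour, keyed by the 'YYYY-MM-DDTHH' prefix."""
    buckets = defaultdict(list)
    for timestamp, name, value in samples:
        if name == metric:
            buckets[timestamp[:13]].append(value)
    # Sort by hour so the result reads in chronological order.
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


cpu_by_hour = hourly_average(SAMPLES, "CPUUtilization")
network_in_by_hour = hourly_average(SAMPLES, "NetworkIn")
```

Running the same function over a second metric, as with `network_in_by_hour` here, gives two series on the same hourly buckets that can be compared side by side.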
I can easily do a similar roll up for NetworkIn or NetworkOut for that matter and compare and contrast these. For example:
Step 3: Investigating the evidence
If I want to investigate the spike in CPU at 03:47, for example, I can simply dive back into the logs and navigate to that timeframe (by either scrolling or clicking on the log visualization graph – which will take me to that exact point in the logs) to see the actual log events which resulted in the CPU load spike.
At this point I may want to investigate my application or system logs to check whether there was activity in the system which led to the CPU spike.
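Correlating a spike with application activity amounts to pulling the events that fall inside a window around the spike time. Here is a small Python sketch of that step; the log entries, the date and the five-minute window are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical application log events as (timestamp, message) pairs.
APP_LOG = [
    (datetime(2014, 5, 1, 3, 30), "cron: nightly cleanup finished"),
    (datetime(2014, 5, 1, 3, 46), "worker: report generation started"),
    (datetime(2014, 5, 1, 3, 48), "worker: report generation finished"),
    (datetime(2014, 5, 1, 4, 15), "web: GET /home 200"),
]


def events_near(events, spike_time, window_minutes=5):
    """Return events within +/- window_minutes of the spike time."""
    window = timedelta(minutes=window_minutes)
    return [(ts, msg) for ts, msg in events if abs(ts - spike_time) <= window]


# The CPU spike under investigation, at 03:47.
spike = datetime(2014, 5, 1, 3, 47)
suspects = events_near(APP_LOG, spike)
```

Widening `window_minutes` trades precision for recall: a larger window catches slow-building causes but returns more unrelated events to sift through.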
Outside of performance monitoring you can also use your logs for a range of activities as outlined by the answers in our user survey above. Keep an eye on our blog for how-to’s on all these areas over the next while, but for now here’s a short list of examples:
- Product Usage Tracking: We believe in dogfooding our own solution here at Logentries, so we use our own JS plugin to track user actions within Logentries. This allows us to view the most popular features and which customers use them on a daily basis.
- Mobile Analytics: Use Android, iPhone, Windows Mobile or HTML5 libraries for exception tracking, real user monitoring or mobile app analytics.
- Business Analytics: Some of the things we track internally with Logentries include the number of signups per day, number of upgrades, number of new paid plans, any downgrades/account deletions (we’ll usually want to get a real-time alert on these), any exceptions that prevent signups/upgrades, etc.
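As a rough illustration of the business analytics case, the Python sketch below counts signups per day and flags the events that would warrant an alert. The event names (`signup`, `upgrade`, `downgrade`, `account_deleted`) and the sample data are hypothetical and do not reflect the events Logentries actually logs.

```python
from collections import Counter

# Hypothetical (day, event type) pairs already extracted from log lines.
BUSINESS_EVENTS = [
    ("2014-05-01", "signup"),
    ("2014-05-01", "signup"),
    ("2014-05-01", "upgrade"),
    ("2014-05-02", "signup"),
    ("2014-05-02", "account_deleted"),
]

# Event types assumed to deserve a real-time alert.
ALERT_EVENTS = {"downgrade", "account_deleted"}


def daily_counts(events, event_type):
    """Count how many events of one type occurred on each day."""
    return Counter(day for day, kind in events if kind == event_type)


def alerts(events):
    """Return the events whose type is in the alert set, in original order."""
    return [(day, kind) for day, kind in events if kind in ALERT_EVENTS]


signups = daily_counts(BUSINESS_EVENTS, "signup")
to_alert = alerts(BUSINESS_EVENTS)
```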
I think it’s fair to say that the humble log is no longer a resource for the chosen few. At Logentries our aim is to make log data accessible to all and for a wide range of use cases, and as you can see from our user survey, it seems to be working… 🙂 Let us know if you are doing anything exciting with yours!