An APM Solution is Not Enough

As a long-time APM guy who has spent quite a bit of time building monitoring tools (transaction tracers, memory analyzers, tools for analyzing garbage collection behavior, etc.), it pains me to say this, but sometimes an APM solution is not enough.

In fact, a recent blog post I read by Andreas Grabner of Compuware entitled “Don’t Trust Your Log Files: How and Why to Monitor ALL Exceptions” got me thinking on this topic. Grabner claims that you shouldn’t trust your log files because if you do not catch and log your exceptions, you are not going to see them in your logs. I think it’s fair to say, and pretty obvious, that if you do not log an exception you are not going to see it. However, on the flip side, we are seeing more and more users (and we have over 25,000 of them) who are getting real value from logging for exactly this use case. In fact, a recent survey we carried out points to production monitoring and production troubleshooting as the top two ways that our users are utilizing their logs! And if you are smart about how you log, you do not need to have verbose logging turned on to catch key issues in your system. Logging can be quite simple, and deliver value that is much more specific and granular than basic APM.

Use your logs to ‘log all the things’ especially important exceptions
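To make the "log your important exceptions" point concrete, here is a minimal Python sketch of one way to make sure exceptions that nobody catches still end up in your logs. The logger name `app` and the function names are my own illustrative choices, not part of any particular product, and the hook only covers exceptions that reach the top of the main thread.

```python
import logging
import sys

# "app" is a hypothetical logger name chosen for this sketch.
logger = logging.getLogger("app")


def log_uncaught(exc_type, exc_value, exc_traceback):
    """Log an exception that nothing else caught, including its traceback."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl-C behave as usual instead of logging it as an error.
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    logger.error(
        "uncaught exception",
        exc_info=(exc_type, exc_value, exc_traceback),
    )


def install_hook():
    """Route uncaught main-thread exceptions through log_uncaught."""
    sys.excepthook = log_uncaught
```

Calling `install_hook()` once at startup is enough for this sketch. It does not require verbose logging to be switched on, since only the error-level event is emitted.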

So my question is: why are people turning to logs more and more for production monitoring and troubleshooting in addition to their expensive, existing monitoring tools? In short, I believe that systems have fundamentally changed over the past few years. If you think back even five years, systems were mainly on-premise or in a data center where you had complete control of your environment. However, today it is becoming more and more common for systems to be deployed entirely in the cloud or to at least make use of numerous cloud components. For cloud-based systems, full instrumentation or the use of APM solutions is often not an option since many parts of the stack may no longer be under your full control; the access required to apply an APM solution may not be available.

For example, with IaaS you only have access from the operating system and up, i.e. the operating system, the middleware and the application tier. The provider will control everything below the operating system, such as the hypervisor layer, the hardware and the network. For those using PaaS, the situation is even more constrained since PaaS vendors tend to manage the OS and middleware components on behalf of their users. You, therefore, only have access to the application tier from an instrumentation perspective. Finally, with SaaS components, you generally do not have any ability to instrument and are required to rely on any instrumentation APIs or endpoints provided by the SaaS vendor.

As a result, it is common for traditional APM solutions, which have claimed in the past to give 100% end-to-end visibility for on-premise systems, to only give a fraction of that visibility for cloud-based systems. It is difficult to instrument the cloud and, thus, alternative approaches are required to give visibility into cloud-based components, which can otherwise become black boxes from a performance or system monitoring perspective.

While it can be difficult to instrument cloud-based components, in general they tend to produce log data streams or provide access to APIs that can be polled to generate data streams. These data streams can be analyzed to give visibility into your systems.

While instrumentation may not be possible, the existing log data and API data streams provided by cloud vendors can be analyzed by log management and analytics solutions to provide real-time KPI dashboards, giving deep visibility into what are often otherwise perceived as black box components.
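As a rough illustration of turning a log stream into KPIs, the Python sketch below reduces a batch of log lines to a request count, an error rate and a mean response time. The line format (`status=<code> response_ms=<n>`) and the function name `summarize` are assumptions made up for this example; a real cloud component will have its own format, and a log management service would do this continuously rather than in a single batch.

```python
import re
from statistics import mean

# Hypothetical line format: "<timestamp> status=<code> response_ms=<n>"
LINE_RE = re.compile(r"status=(?P<status>\d{3})\s+response_ms=(?P<ms>\d+)")


def summarize(lines):
    """Reduce an iterable of log lines to a few simple KPIs."""
    statuses, latencies = [], []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not carry these fields
        statuses.append(int(match.group("status")))
        latencies.append(int(match.group("ms")))
    if not statuses:
        return {"requests": 0, "error_rate": 0.0, "mean_ms": 0.0}
    errors = sum(1 for status in statuses if status >= 500)
    return {
        "requests": len(statuses),
        "error_rate": errors / len(statuses),
        "mean_ms": mean(latencies),
    }
```

Here a 5xx status is counted as an error; what counts as an error for your own KPIs is a choice you would make per component.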

While APM solutions are definitely widely used for a view into how your own application code is performing, in many cases APM alone is not enough to give an end-to-end perspective of your system. We are seeing a huge increase in users sending more and more performance metrics into their log data, giving them the ability to use “logs as data” alongside their APM solution and providing some of the deep insights that they really need.
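One simple way to send performance metrics into your log data is to time a block of code and write the result as a key-value log line. The sketch below is only an illustration: the logger name `metrics`, the helper `timed` and the field names are hypothetical, and the key=value layout is just one convention that is easy to parse downstream.

```python
import logging
import time
from contextlib import contextmanager

# "metrics" is a hypothetical logger name chosen for this sketch.
metrics_log = logging.getLogger("metrics")


@contextmanager
def timed(operation):
    """Time the wrapped block and log its duration as key=value pairs."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Log even if the block raised, so failed operations are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        metrics_log.info(
            "metric=duration operation=%s duration_ms=%.1f",
            operation,
            elapsed_ms,
        )
```

Wrapping a call as `with timed("checkout"): ...` then produces one log event per operation, which can be charted or correlated with other log events in the same stream.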

Would love to hear your thoughts. How can log management & analytics further complement a traditional APM solution?

 



    Posted in Application Performance Monitoring, Cloud, DevOps, Logentries
    4 comments on “An APM Solution is Not Enough”
    1. Hi Trevor

      Funny that you think I wrote that blog to say that we do not need log files or tools like yours that specialize in capturing and analyzing log files. That is not the case!
      The point that I wanted to make, and which I have seen with several companies I worked with, is that in many cases the information we need to identify and detect a problem doesn’t make it to a log file. Therefore I wanted to “educate” the world out there that you should look at, e.g., ALL exceptions thrown and not only those that make it to the log file. We figured out a way to do this and we have solved many problems with that approach.
      I do agree with you that logging EVERYTHING also doesn’t make sense: then you have 99% of the data to throw away, and you may even impact the performance of your app if you log too much. Here is a great example of how logging kills your app servers: http://apmblog.compuware.com/2013/09/24/100-performance-overhead-by-websphere-activity-log-when-dev-is-not-aware-of-settings-in-production/.

      With all that said, I really like the fact that there are tools out there like yours that do a great job in helping people to find problems that are documented in log files. I totally understand that they are necessary and I am happy that both our tools co-exist in these environments, as we both bring great value to those folks that use them.

      You also brought up some good points on environments where “traditional” APM tools don’t get to capture these things, as you can’t for instance install agents on SFDC.COM, Google’s AppEngine, … But the more modern APM tools have ways to capture information, extract it or integrate with tools like yours to combine forces and provide a combined toolset to solve the problems of the end users.

      In the end it’s not about “Log Files vs. APM Data”; it’s about helping our customers to build better software faster. I think both our companies (and others in our space) should have that as the primary objective.

      Cheers
      Andi

      • Trevor Parsons says:

        Hi Andi,

        Thanks for the detailed comments above. I totally agree it is not about ‘Logs vs APM’ and in most cases these are complementary tools. That said, we are seeing a shift where more and more people are starting to send application performance metrics into their log streams for real-time processing, so that they can easily cross-correlate their performance metrics with system log events, as per the comments below by GP.

        I think you are also right that if you log a very large amount of data it is going to impact performance, and I’d always recommend having a well-thought-out logging policy so that you are logging and capturing relevant data for your use case(s). However, the same can be said of an APM solution that is capturing data at too fine-grained a level, i.e. it is going to impact the application performance.

        In fact, this is often why people will dynamically turn on deep profiling only when there is a requirement to do so, rather than having deep profiling running in production. I have also done some analysis on profiling/APM tools in the past that shows a noticeable increase in the instrumentation overhead, especially when a system is under stress and heavily saturated (see here: http://performance.ucd.ie/tparsons/files/IEEE-TSE-Extracting-Component-Interactions-preprint.pdf).

        To reiterate, it’s certainly not Logs vs. APM. However, finding ways in which logging technologies and APM solutions can better complement each other, to help developers, devops, product guys etc. better understand what is happening in their systems, is key.

        Thanks again for the detailed and informative comments!

        Trevor

    2. GP says:

      Trevor, you make an interesting point, which I agree with based on operating large, complex and heterogeneous environments. Most organizations tend to use logs for trending or compliance reasons, but if the data or event that is captured within the log is streamed and analyzed in real time, it offers a powerful application monitoring solution that is non-intrusive.

      You might like the OpenOpsIQ blog on this topic as well…

      http://openopsiq.com/2014/04/03/apm-logging-monitoring-three-legs-of-a-stool-or-redundant-tools-waiting-to-be-consolidated/

    3. Trevor Parsons says:

      Thanks for the link, a nice collection of blog posts. I agree with the post you linked above: real-time event processing has the potential to change the landscape somewhat and provides an opportunity to consolidate solutions that take advantage of a single data stream containing data from all parts of the stack.

      On a side note I also stumbled across another nice post on your blog, on real time logging: http://openopsiq.com/2014/01/07/does-real-time-log-data-collection-and-analysis-monitoring/

      I also put something together on this topic recently that might be of interest: https://blog.logentries.com/2014/04/real-time-log-management/
