opinion: How Much Monitoring Do You Need?
Monitors are wonderful tools. They provide all sorts of functionality to trace, sample and report. The problem is that monitors don’t come for free. And I’m not talking about the cost of the software. The more you monitor, the more CPU you consume. They’ll also chew up memory, and add disk load.
Whenever I’m on a CPU reduction project, monitors are one of the first things I look at. And the pickings can be good. Some monitor features can be enormously expensive when switched on. And this cost may not be obvious. For example, the CPU overhead of some DB2 monitors is recorded against each address space attaching to DB2. Looking at the DB2 address space CPU usage won’t help.
I also look for duplicate monitoring. Do you really need to record TMON for CICS statistics to SMF when you’re recording normal CICS SMF 110 records? Probably not.
But there is of course a flip side. Systems programmers, DBAs and application programmers still need some monitoring. Some monitors can be switched on for a short period of time (like IBM APA). Others need to run continuously to provide any benefit.
So when I’m on a tuning project improving response times, I’m sometimes looking to get tracing turned on. It can be a tricky line.
Which raises the question: how much monitoring do you really need? The answer is of course a frustrating “it depends”. But here are some guidelines.
The first guideline is to estimate the CPU overhead. If it doesn’t cost much, then there’s no harm in turning it on. This sounds pretty basic, but the real CPU overhead can be difficult to estimate. Sometimes vendors will publish figures, but often they won’t. In this case, looking at the CPU usage of the monitor address space may help. Some monitors, such as CA SYSVIEW, also provide screens estimating their own overhead, which is nice.
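As a rough illustration of the arithmetic, the sketch below expresses a monitor address space's CPU time as a share of all CPU consumed in the same interval. The function name and the figures are hypothetical, not taken from any product. Where a monitor charges part of its cost to other address spaces (as with the DB2 example earlier), the result is only a lower bound.

```python
def monitor_overhead_percent(monitor_cpu_seconds: float,
                             total_cpu_seconds: float) -> float:
    """Return the monitor's CPU time as a percentage of total CPU time.

    Both inputs are assumed to cover the same measurement interval,
    for example one peak hour taken from address space CPU reports.
    """
    if total_cpu_seconds <= 0:
        raise ValueError("total_cpu_seconds must be positive")
    if monitor_cpu_seconds < 0:
        raise ValueError("monitor_cpu_seconds cannot be negative")
    return 100.0 * monitor_cpu_seconds / total_cpu_seconds


# Hypothetical figures: the monitor used 90 CPU seconds in an interval
# where the whole system consumed 3000 CPU seconds.
overhead = monitor_overhead_percent(90.0, 3000.0)
print(f"Monitor overhead: {overhead:.1f}%")
```

Comparing the same figure before and after switching a feature off gives a more direct estimate of what that feature costs.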
The second guideline is to determine what you really need. This sounds trivial, but it isn’t. If you will probably never need a feature, then switch it off. I’ve seen many sites with features enabled that they will never need, just in case someone else wants them in the future. Vendors will never need monitor features: they’ll use the basic operating system/CICS/IMS features to get their information. If they do need extra information, they’ll ask you to enable tracing or something else for short periods of time.
The third guideline is to avoid tracing. Tracing is the largest CPU consumer by far, and is almost never needed on an ongoing basis. If you need tracing, turn it on for a short period, then turn it off. DB2 and WebSphere MQ have some traces you will need for accounting or basic performance monitoring. In general, however, monitor tracing never needs to be permanently enabled. Perhaps you could switch it on for 10 minutes outside of the highest peak times to get a ‘feel’ for what is happening. Or perhaps automation can enable tracing when a problem occurs or a pre-defined set of conditions is met. But never keep it on all the time.
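The "turn it on, then turn it off" discipline can be sketched as a small wrapper that guarantees the trace is stopped, even if something fails while it is running. This is only an illustration: `start_trace` and `stop_trace` are hypothetical callables standing in for whatever command or automation interface your site uses, and are not part of any monitor's API.

```python
import time
from contextlib import contextmanager


@contextmanager
def trace_window(start_trace, stop_trace):
    """Enable tracing on entry and always disable it on exit."""
    start_trace()
    try:
        yield
    finally:
        # Runs even if an error occurs while the trace is active,
        # so the trace cannot be left on by accident.
        stop_trace()


def run_timed_trace(start_trace, stop_trace, seconds, sleep=time.sleep):
    """Keep tracing enabled for a fixed number of seconds, then stop it."""
    with trace_window(start_trace, stop_trace):
        sleep(seconds)


# Demonstration with stand-in callables that only record what happened.
# A no-op sleep is passed in so the example finishes immediately; in real
# use the default time.sleep would hold the trace open for the full window.
events = []
run_timed_trace(
    start_trace=lambda: events.append("trace started"),
    stop_trace=lambda: events.append("trace stopped"),
    seconds=600,  # the 10-minute window suggested above
    sleep=lambda s: None,
)
print(events)
```

The same wrapper could be driven by automation when a pre-defined condition is met, rather than run by hand.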
The final guideline is to minimise historical recording. As a rough rule, I never recommend historical recording for monitors – usually the basic SMF and similar statistics will give you what you need.
When all is said and done, monitors are wonderful tools that can make problem diagnosis easier. Sensible configuration and use will maximise their benefit while minimising their overhead.