LongEx Mainframe Quarterly - August 2014

Opinion: Stop Firefighting and Start Tuning

Recently I was at a site with an over-worked performance team. This isn't unusual, as performance is a large, time-consuming task. This small group was responsible for many z/OS systems in different Parallel Sysplexes. They were responsible not only for z/OS system performance, but also for any performance issue that came up: batch overruns, CICS response times, WebSphere MQ delivery times. However, the big problem was that they spent all their time dealing with problems: they were fire-fighting, not tuning.

Let's take an example. An application team rings up: "Our WebSphere MQ response time between 10:00 and 10:30 this morning was 1.5 seconds. This is more than the 1 second specified in the Service Level Agreement. Tell us why and fix it!" So let's see what our performance group needs to do now:

  1. They need to confirm what the application team has said. So they must go into the monitoring tools they have and take a look at the response times. Hopefully, they're familiar with the SLA, and can quickly confirm that the response time is too high.
  2. They must confirm that this is unusual, by looking at response times at similar times in the past. If the WebSphere MQ response time is always higher than the SLA, then there should already be a project working on this.
  3. If they confirm there's a problem, then they need to find out why. So they'll look at the z/OS and related performance during that period to find out what happened. They'll also look to see what, if anything, has changed.
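
Step 2 above, checking whether a response time is unusual compared with the past, is the kind of check that lends itself to automation. The following is a minimal sketch only: it assumes the average response times for the same half-hour window on previous days have already been extracted into a list, and it uses a simple "more than three standard deviations above the mean" rule. The function name, the threshold and the sample figures are all hypothetical, not taken from any particular monitoring product.

```python
from statistics import mean, stdev


def is_unusual(today_value, history, threshold_sigmas=3.0):
    """Return True if today_value is well above the historical norm.

    history holds average response times (seconds) for the same time
    window on previous days. The three-sigma default is an assumption.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Every past value was identical, so anything higher is unusual.
        return today_value > baseline
    return (today_value - baseline) / spread > threshold_sigmas


# Hypothetical 10:00-10:30 averages from previous days, in seconds.
history = [0.62, 0.58, 0.65, 0.60, 0.63, 0.59, 0.61]

print(is_unusual(1.5, history))   # the reported 1.5 second response time
print(is_unusual(0.64, history))  # a value within the normal range
```

A check like this only answers "is this different from normal?". It does not replace the SLA comparison in step 1: a response time can be perfectly normal for the system and still be above the SLA.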

All this is a lot of work. That 10-second phone call has taken a few hours out of a performance staff member's day. If this happens regularly, then that is all our performance team will be able to do: full-time fire-fighting.

A much better approach would be to monitor performance. So let's look at the perfect scenario:

Our performance team has set up automated monitoring systems. Performance tools have been configured with SLAs and expected performance, so screens quickly show when things are outside of normal. Automated notifications (like emails) are sent to performance staff when something doesn't perform as it should.
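
As a rough illustration of what "configured with SLAs" might mean in practice, the sketch below compares interval samples against a table of SLA limits and builds one notification message per breach. Everything here is an assumption made for the example: the `Sla` and `Sample` classes, the field names and the message format are hypothetical, and a real installation would take its samples from its own monitoring tools and send the messages by email or another alerting channel rather than printing them.

```python
from dataclasses import dataclass


@dataclass
class Sla:
    service: str
    max_response_seconds: float


@dataclass
class Sample:
    service: str
    interval: str
    avg_response_seconds: float


def find_breaches(samples, slas):
    """Return one message for each sample that exceeds its SLA limit."""
    limits = {sla.service: sla.max_response_seconds for sla in slas}
    messages = []
    for sample in samples:
        limit = limits.get(sample.service)
        # Services without a configured SLA are skipped, not flagged.
        if limit is not None and sample.avg_response_seconds > limit:
            messages.append(
                f"{sample.service} {sample.interval}: "
                f"{sample.avg_response_seconds:.2f}s exceeds SLA of {limit:.2f}s"
            )
    return messages


# Hypothetical configuration and measurements.
slas = [Sla("WebSphere MQ", 1.0)]
samples = [
    Sample("WebSphere MQ", "09:30-10:00", 0.6),
    Sample("WebSphere MQ", "10:00-10:30", 1.5),
]

for message in find_breaches(samples, slas):
    print(message)  # stand-in for sending an email notification
```

Keeping the SLA limits in data rather than in people's heads is the point: nobody has to remember that the limit is 1 second before they can tell that 1.5 seconds is a problem.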

Daily batch jobs analyse SMF records, and produce performance reports that are archived. Our performance team can quickly look at the past performance of critical systems. Trends can be seen, and potential issues addressed before they become problems.
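
To show how archived daily figures can expose a trend before it becomes a problem, here is a small sketch that fits a straight line to daily average response times and estimates how many days remain before the SLA limit is reached. It assumes the daily batch job has already reduced the SMF data to one average per day; parsing SMF records is not shown. The function names and sample figures are hypothetical, and a straight-line fit is only one simple way of looking at a trend.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values taken on days 0, 1, 2..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two daily values")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return slope, intercept


def days_until_breach(values, sla_limit):
    """Estimate days until the fitted line reaches sla_limit.

    Returns 0.0 if the fitted line is already at or above the limit,
    and None if the trend is flat or improving.
    """
    slope, intercept = linear_trend(values)
    latest_fit = intercept + slope * (len(values) - 1)
    if latest_fit >= sla_limit:
        return 0.0
    if slope <= 0:
        return None
    return (sla_limit - latest_fit) / slope


# Hypothetical daily average response times in seconds, oldest first.
daily_averages = [0.70, 0.72, 0.74, 0.76, 0.78]
print(days_until_breach(daily_averages, sla_limit=1.0))
```

With figures like these, no single day looks alarming, but the trend shows the SLA being reached in a matter of days. That is the kind of potential issue a performance team can address while it is still a tuning task rather than a fire.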

If all this were the case, then our scenario above would be a little different:

  1. The performance team is notified by automated systems that WebSphere MQ performance between 10:00 and 10:30 wasn't sufficient.
  2. Automated systems also notify the performance team that a CICS transaction was looping at the same time.
  3. By the time the application team rings, the performance team has already confirmed that the looping CICS transaction consumed excessive CPU, starving WebSphere MQ. The problem transaction was terminated, and the relevant application team notified.

Our perfect approach took less than 30 minutes, and our performance team were fixing the problem before it was reported.

The problem is that I rarely see our perfect scenario. Simply put: performance isn't seen as important by management until there's a problem.

Setting up the automated procedures and configuring monitoring tools takes time, and is an ongoing process as things change. Many performance groups are too busy fire-fighting to set up and maintain this monitoring.

Performance monitoring and management is long-term. An investment to create and maintain infrastructure for effective, ongoing, automated monitoring will pay off again, and again, and again.

David Stephens

LongEx Quarterly is a quarterly eZine produced by Longpela Expertise. It provides Mainframe articles for management and technical experts. It is published every November, February, May and August.

The opinions in this article are solely those of the author, and do not necessarily represent the opinions of any other person or organisation. All trademarks, trade names, service marks and logos referenced in these articles belong to their respective companies.

Although Longpela Expertise may be paid by organisations reprinting our articles, all articles are independent. Longpela Expertise has not been paid money by any vendor or company to write any articles appearing in our e-zine.

© Copyright 2014 Longpela Expertise  |  ABN 55 072 652 147