LongEx Mainframe Quarterly - August 2015

management: A Five-Minute Guide to Monitors

We all have them. A monitor like TMON for CICS to monitor our CICS. Or Omegamon XE for IMS. And they're not cheap.

Aside from the cost of the software itself, they use CPU, and consume man-hours to commission and maintain. But what are all these monitors, and why do we have them? In this article, we give the essential five-minute guide to mainframe monitors.

The Basic Categories

There are many monitors on the market, with different purposes and features. However, they can be divided into four basic categories:

Operational
Analysis
Centralized
Niche

Let's look at these categories.

Operational Monitors

Operational monitors are exactly that - software designed to 'keep the trains running'. They look at the system or subsystem they're monitoring, and report on its health as it stands now. Let's look at operational CICS monitors. These provide information on the CICS region they are monitoring, such as:

Current workloads - current transactions running. Information like how long they've been running, or how many file, DB2, temporary storage and transient data accesses they've made is shown. They will also show statistics such as CPU consumed, DSA (memory) used, and more.
Resource status - resource usage for each transaction. A summary for the entire region is also shown: total memory usage, total CPU usage, and other metrics such as file I/O rates, DB2 activity etc.
Problems - a screen that highlights problems. For example, a message may be displayed if the CICS region hits its maximum task limit, or if transactions regularly run for longer than a set limit. Other possible problems include files or connections that are closed, program abends/dumps, excessive storage use, or excessive file I/Os.
Performance - performance statistics for each transaction. For example, the service time of a transaction broken down into journal, DB2, WebSphere MQ, file access and more. Similar information broken down by resource is also shown. For example, I/O performance by file, or LSR performance by LSR pool.
General info - screens that allow administrators to view resource values, and sometimes even change them. So a CICS Systems Programmer can see a program's definition, or the system configuration values in use.
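
The 'Problems' checks described above amount to comparing a snapshot of the region against a set of limits. A minimal Python sketch of that idea follows. The class names, field names (max_tasks, elapsed_seconds) and the 30 second limit are hypothetical, chosen only for illustration; they are not taken from any monitor product.

```python
from dataclasses import dataclass

@dataclass
class Task:
    tran_id: str            # transaction identifier (hypothetical values below)
    elapsed_seconds: float  # how long the task has been running

@dataclass
class RegionSnapshot:
    name: str
    max_tasks: int          # configured maximum task limit
    tasks: list             # currently running tasks (list of Task)

def find_problems(snapshot, long_running_limit=30.0):
    """Return a list of problem messages for one region snapshot."""
    problems = []
    # Problem 1: the region has hit its maximum task limit.
    if len(snapshot.tasks) >= snapshot.max_tasks:
        problems.append(
            f"{snapshot.name}: at maximum task limit ({snapshot.max_tasks})")
    # Problem 2: a transaction has run for longer than the set limit.
    for task in snapshot.tasks:
        if task.elapsed_seconds > long_running_limit:
            problems.append(
                f"{snapshot.name}: {task.tran_id} running for "
                f"{task.elapsed_seconds:.0f}s")
    return problems

# Made-up snapshot: three tasks in a region limited to three.
snapshot = RegionSnapshot(
    "CICSPRD1", max_tasks=3,
    tasks=[Task("PAY1", 2.5), Task("INQ2", 45.0), Task("UPD3", 0.4)])

for message in find_problems(snapshot):
    print(message)
```

A real monitor would collect the snapshot from the region itself and would usually let each limit be tailored per region or per transaction.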

Most operational monitors throw in a few handy tools and gadgets. Examples include tracing features, screens to issue system commands, and tools to do things not possible with system commands. Some also include historical data recording and processing.

But perhaps the biggest strength of operational monitors is how they present this information. They all have screens that generally present information in a clear and easy-to-digest format. Screens are easy to navigate, normally with drill-down menus to allow problems to be identified and analysed fast.

Many monitors also include facilities to notify operators and administrators of any problems.

Operational monitors include IBM Omegamon XE, CA SYSVIEW, ASG TMON, and BMC MAINVIEW. They come in flavours for z/OS, CICS, IMS, TCP/IP, WebSphere MQ, DB2 and more.

Analysis Monitors

These monitors are used for in-depth analysis. There are many different types, so let's look at a few examples.

Sampling Monitors

Sampling monitors take a lot of snapshots of an address space as it runs - usually around 1000 per minute. From these snapshots, or samples, they display what's happening over the sampling period. For example, they will show programs that consumed CPU (usually from highest to lowest), with options to further break this down to the location within each program. From this we can identify the programs causing high CPU usage, and target them for tuning. Similar analysis is provided for wait times, so service/elapsed times can be analysed.

Because sampling monitors take many samples, they have a record of this information over time. So they can show how waits, I/O or CPU usage change as a program runs from minute to minute. These monitors also include additional options to show information on things like DB2 calls, CICS transactions, MQ performance, file performance and more.
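
To make the idea concrete, here is a minimal Python sketch of how samples might be summarised. The sample layout (seconds since start, program name, state) and the program names are assumptions made for illustration; real sampling monitors record far more detail, and this is not how any particular product works.

```python
from collections import Counter

# Each sample: (seconds since start, program name, state).
# State is "CPU" (program was executing) or "WAIT" (program was waiting).
samples = [
    (1, "PAYCALC", "CPU"),
    (2, "PAYCALC", "CPU"),
    (3, "DBACCESS", "WAIT"),
    (61, "PAYCALC", "CPU"),
    (62, "REPORT", "CPU"),
    (63, "DBACCESS", "WAIT"),
    (64, "DBACCESS", "WAIT"),
    (65, "DBACCESS", "CPU"),
]

def cpu_by_program(samples):
    """Share of CPU samples per program, as percentages, highest first."""
    counts = Counter(prog for _, prog, state in samples if state == "CPU")
    total = sum(counts.values())
    return [(prog, 100.0 * n / total) for prog, n in counts.most_common()]

def activity_by_minute(samples):
    """Count CPU and WAIT samples in each one-minute interval."""
    buckets = {}
    for seconds, _, state in samples:
        minute = seconds // 60
        buckets.setdefault(minute, Counter())[state] += 1
    return buckets

for program, percent in cpu_by_program(samples):
    print(f"{program:10} {percent:5.1f}% of CPU samples")

for minute, counts in sorted(activity_by_minute(samples).items()):
    print(f"minute {minute}: CPU={counts['CPU']} WAIT={counts['WAIT']}")
```

The first report identifies the programs to target for tuning; the second shows how CPU and wait activity change from minute to minute.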

Our article Monitoring by Sampling looks at these in more detail.

SQL Monitors

There are a few products that analyse DB2 SQL statement performance. These trace SQL calls as they are made, and measure their performance. So they provide information on poorly performing SQL statements, and the corresponding SQL source. These are great when working on DB2 application performance. BMC AppTune, CA Detector, and IBM Query Monitor are examples.

Application Performance Monitors

Some monitors look more closely at application performance with tracing or monitoring facilities. For example, ASG TriTune provides much of the SQL monitoring above, and adds VSAM, IMS and WebSphere MQ monitoring. Compuware PurePath solutions provide monitoring for CICS and Java applications.

Centralized Monitors

The third category of monitors provides a centralized information point. Larger sites will have several z/OS regions, DB2 subsystems, WebSphere MQ queue managers, and more. It's not uncommon to see a site with 100 separate CICS regions. Managing all of these can be very difficult. Enter the centralized monitor.

One example is IBM's Tivoli Enterprise Portal (TEP). This is a centralized GUI for managing different IBM monitors, both on and off the mainframe. So from one PC screen, an administrator can see information from different IBM Omegamon monitors on different systems - as well as other IBM monitors on non-mainframe systems.

ConicIT also provides a centralized view. However, it can take information from several monitoring products from different vendors, including Omegamon XE and BMC MAINVIEW. Like IBM TEP, its screens can be tailored as needed.

Some monitors are simply availability checkers. They regularly confirm that a system or subsystem is available, and can also provide response time information. These monitors usually include historical information. Most are not mainframe based, though Inside Products Availability Checker is an exception.
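
The simplest form of availability check is a timed connection attempt. The Python sketch below assumes that 'available' just means a TCP connection to the listener succeeds; real availability checkers may go further, for example by sending an application-level request. The function name check_available and the history layout are hypothetical. So that the example runs anywhere, it checks a local listener on a temporary port, standing in for a real subsystem.

```python
import socket
import time

def check_available(host, port, timeout=2.0):
    """Try a TCP connection; return (available, response_seconds)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.perf_counter() - start
    except OSError:
        # Refused, unreachable or timed out: all treated as unavailable.
        return False, time.perf_counter() - start

# Stand-in for a real subsystem: a local listener on a temporary port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
host, port = listener.getsockname()

history = []  # historical record: (timestamp, available, response_seconds)

# First check: the listener is up.
available, seconds = check_available(host, port)
history.append((time.time(), available, seconds))

# Second check: the listener has been shut down.
listener.close()
available, seconds = check_available(host, port)
history.append((time.time(), available, seconds))

for stamp, available, seconds in history:
    clock = time.strftime("%H:%M:%S", time.localtime(stamp))
    print(f"{clock} available={available} response={seconds * 1000:.1f}ms")
```

Run on a regular schedule, the history list becomes the availability and response time record that these monitors report on.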

Monitors such as Nastel AutoPilot M6 TransactionWorks, BMC Application Transaction Tracing and IBM ITCAM for Transaction Tracking tackle a different problem. These display routing and performance information for composite applications crossing system and subsystem boundaries: for example, a J2EE application on a UNIX system that accesses DB2 tables on z/OS. We talk more about composite applications in our articles The Challenges Monitoring Composite Applications and Mainframe Products for Monitoring Composite Applications.

Niche Monitors

There are many special-purpose monitors that provide performance and other information for very specific areas not covered by normal operational monitors. For example, ARCH Consulting markets BATSTAT for CA-IDMS batch performance. Similarly, CA-TSOMON provides detailed performance information for TSO, and IBM Financial Transaction Manager monitors financial transactions routed through WebSphere Message Broker.

Summary

No site will have every possible monitor. All will have a basic operational monitor for CICS, DB2 and IMS. Few will run without an operational z/OS monitor, and most will also have a sampling tool like Compuware Strobe. Other monitors are bought by companies as they are needed.

It's not unusual for a site to acquire a monitor to solve an immediate need, and then maintain that monitor even after the need has gone. For example, many sites are reviewing their SNA network monitors as SNA networks are replaced with TCP/IP. However, new problems are encouraging new monitors, such as middleware monitors for WebSphere MQ and SOA environments, or Java and WebSphere Application Server monitors.

Monitors can be an invaluable aid in improving performance and ensuring the continued health of systems and applications. However, poorly customized or poorly maintained monitors can be a drain on finances and system resources, without providing the value that they are capable of.

David Stephens