
LongEx Mainframe Quarterly - November 2022

technical: My RMF Monitor III Top Five

OK, let's get this out of the way. I love RMF Monitor III. Really do. It's just brilliant. It can do amazing things. And best of all, most of my clients have it, and let me use it.

You could spend a lot of time looking at all the SHARE presentations and other documentation on RMF and what it can do. But let me tell you the five features I use most.

1. CPC Capacity

I often log on to a new site and need to find out about its z/OS systems fast. RMF Monitor III regularly comes to the rescue. Let's take the CPC Capacity report:

                        RMF V2R4   CPC Capacity                   Line 1 of 18
Command ===>                                                  Scroll ===> CSR

Samples: 300     System: MVS1  Date: 08/27/22  Time: 00.34.00  Range: 300   Sec

Partition:   MVS1       3906 Model 605                         Boost: N
CPC Capacity:     591   Weight % of Max: ****   4h Avg:  352
Image Capacity:   591   WLM Capping %:    0.0   4h Max:  506   Group:   N/A
MT Mode IIP:      N/A   Prod % IIP:       N/A   AbsMSUCap: N   Limit:   N/A

Partition  --- MSU ---  Cap     Proc    Logical Util %   - Physical Util % -
             Def   Act  Def      Num    Effect   Total   LPAR  Effect  Total

*CP                             17.0                      0.8    67.3   68.1
MVS1           0   316  N N N    5.0      53.1    53.4    0.3    53.1   53.4
MVS2           0    12  N N N    5.0       2.1     2.1    0.0     2.1    2.1
MVS3           0    64  N N N    5.0      10.7    10.8    0.0    10.7   10.8
MVS4           0     9  N Y N    2.0       3.6     3.6    0.0     1.4    1.4
PHYSICAL                                                  0.4            0.4

*ICF                             3.0                      0.0     100    100
CF01                    N N N    2.0       100     100    0.0    66.7   66.7
CH02                    N N N    1.0       100     100    0.0    33.3   33.3
PHYSICAL                                                  0.0            0.0

*IIP                             7.0                      0.3    12.6   13.0
MVS1                    N N N    2.0      13.2    13.4    0.1     8.8    8.9
MVS2                    N N N    2.0       2.2     2.2    0.0     1.5    1.5
MVS3                    N N N    2.0       3.1     3.1    0.0     2.1    2.1
MVS4                    N N N    1.0       0.7     0.7    0.0     0.2    0.2
PHYSICAL                                                  0.2            0.2

It looks good, doesn't it? I find out so much about the processor from this one screen. Let's list some of the things I find useful:

This is a z14 (3906-605) mainframe with 5 GP CPUs. Its zIIPs will be faster than its GPs: it isn't a 3906-7xx, so the GP processors have been 'kneecapped', or slowed down to reduce capacity.
This system is running z/OS version 2.4 (RMF V2R4 at the top).
MVS1 is the busiest z/OS system, and has a maximum CPU capacity of 591 MSUs (the capacity of the entire CEC), but was using only 316 MSUs in this period.
They're not doing much hard capping. MVS4 is hard-capped and has only two logical processors defined, but all the others can access all 5 physical GP processors.
There are two coupling facilities on this CEC (CF01 and CH02), so this site is using 'inboard' coupling facilities. Each coupling facility has its own dedicated processors (effective logical utilisation is 100%).
All LPARs have zIIPs assigned to them. They're not using simultaneous multithreading (MT Mode IIP = N/A).
Not much zIIP processing is being done: total zIIP utilisation across the CEC averages 13%.
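No WLM (soft) capping of the LPAR group or individual LPARs: WLM Capping % is 0.0, Group is N/A, and no partition has a defined MSU capacity (MSU Def is 0).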
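Two of the observations above come from simple arithmetic on the screen: decoding the model number, and comparing each LPAR's actual MSUs with the CPC capacity. Here is a minimal Python sketch of both. It assumes the z14 capacity-setting scheme (4xx, 5xx and 6xx sub-capacity, 7xx full capacity, last two digits = number of GPs), and the helper names are hypothetical, not part of RMF.

```python
def decode_z14_capacity_setting(model: str) -> tuple[int, int, bool]:
    """Split a z14 (3906) capacity setting such as "605" into
    (capacity level digit, number of GP CPs, is_full_capacity).
    Assumes 4xx/5xx/6xx are sub-capacity and 7xx is full capacity."""
    if len(model) != 3 or not model.isdigit():
        raise ValueError(f"unexpected capacity setting: {model!r}")
    level = int(model[0])
    cps = int(model[1:])
    return level, cps, level == 7


def pct_of_cpc(msu_act: float, cpc_capacity_msu: float) -> float:
    """Actual MSU consumption as a percentage of the CPC capacity."""
    return 100.0 * msu_act / cpc_capacity_msu


# GP 'MSU Act' values copied from the CPC Capacity report above.
gp_msu_act = {"MVS1": 316, "MVS2": 12, "MVS3": 64, "MVS4": 9}
CPC_CAPACITY = 591

level, cps, full = decode_z14_capacity_setting("605")
print(f"Capacity level {level}, {cps} GPs, full capacity: {full}")
for name, msu in gp_msu_act.items():
    print(f"{name}: {pct_of_cpc(msu, CPC_CAPACITY):5.1f}% of CPC")
total = sum(gp_msu_act.values())
print(f"All LPARs: {pct_of_cpc(total, CPC_CAPACITY):5.1f}% of CPC")
```

The percentages land roughly where the Physical Util % Total column does (53.5% against 53.4% for MVS1, 67.9% against 68.1% for all GPs), which is a handy sanity check that you're reading the right columns.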

2. System Information

The System Information panel gives some of the same information:

                        RMF V2R4   System Information             Line 1 of 61
Command ===>                                                  Scroll ===> CSR

Samples: 300     System: MVS1  Date: 08/27/22  Time: 00.34.00  Range: 300   Sec

Partition:   MVS1     3906 Model 605          Appl%:      43  Policy: POLICYA
CPs Online:   5.0     Avg CPU Util%:   53     EAppl%:     47  Date:   04/26/19
AAPs Online:    -     Avg MVS Util%:   64     Appl% AAP:   -  Time:   09.42.12
IIPs Online:  2.0                             Appl% IIP:  12
Group     T WFL --Users--   RESP TRANS -AVG USG-  -Average Number Delayed For -
             %   TOT  ACT   Time  /SEC PROC  DEV  PROC  DEV STOR SUBS OPER  ENQ

*SYSTEM      71  661   13         0.90  3.5  7.6   2.3  1.3  0.0  0.9  0.0  0.1
*TSO         60   35    0         0.90  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0
*BATCH       69  221    7         0.00  2.3  3.5   1.9  0.5  0.0  0.2  0.0  0.0
*STC         72  367    6         0.00  0.9  4.1   0.3  0.8  0.0  0.7  0.0  0.1
*ASCH              0    0         0.00  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0
*OMVS        86   21    0         0.00  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0
*ENCLAVE     77   16  N/A          N/A  0.4  N/A   0.1  N/A  0.0  N/A  N/A  N/A
BATCH     W  67   15    6  48855  0.23  1.8  2.5   1.6  0.3  0.0  0.1  0.0  0.0
HOTBATCH  S  44    2    0  32983  0.07  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0
PRODBTCH  S  64    7    3  74999  0.10  0.8  1.6   1.1  0.2  0.0  0.1  0.0  0.0
ONLINE    W  77  227    4  282.5 87.48  0.6  2.6   0.3  0.6  0.0  0.1  0.0  0.0
CICS      S        0    0  0.773  0.41  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0
OTH_ONL   S        0    0  23.02  0.31  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0
ONLINE    S        0    0  41.21 85.55  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0
ONLTASKS  S  77  227    4  17407  1.22  0.6  2.6   0.3  0.6  0.0  0.1  0.0  0.0
OS390     W  47   75    1   7801  0.21  0.2  0.8   0.2  0.2  0.0  0.7  0.0  0.1
WASz      S  70   19    0   438K  0.00  0.1  0.0   0.0  0.0  0.0  0.0  0.0  0.0
RELAXED   S  46   51    1  0.000  0.00  0.1  0.8   0.1  0.2  0.0  0.7  0.0  0.1

But look a little deeper, and we can see that:

There are two zIIP processors online (IIPs Online: 2.0). We couldn't find this out from the previous display, which only showed the number of logical processors.
The WLM policy used is called POLICYA.
The capture ratio is 88.7% (EAppl% / Avg CPU Util% = 47/53): not too bad. If the capture ratio is low, there may be potential to save CPU by improving it.
Average GP usage is 53% and zIIP usage is 12%, so there's some unused capacity here.
Overall, the system is performing well: most workflow percentages (WFL %) are above 60%. A low figure here may indicate that something is stuck. If that were the case, I could quickly see the main cause from the delay columns: whether the workload was waiting for the processor (not enough CPU), DASD (disk contention), or an ENQ.
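The capture ratio and the workflow percentage are both simple ratios. The Python sketch below computes the capture ratio as described above (EAppl% divided by Avg CPU Util%), and workflow % as average 'using' divided by 'using' plus 'delayed', fed from the System Information columns. The function names are hypothetical, and because RMF rounds the averages it displays, recomputed WFL values are only approximate.

```python
def capture_ratio(eappl_pct: float, cpu_util_pct: float) -> float:
    """Capture ratio as used above: EAppl% / Avg CPU Util%, as a percentage."""
    return 100.0 * eappl_pct / cpu_util_pct


def workflow_pct(using: list[float], delayed: list[float]) -> float:
    """Workflow %: average number using resources as a share of
    (using + delayed). Inputs are the rounded screen averages."""
    u = sum(using)
    d = sum(delayed)
    if u + d == 0:
        return 0.0  # nothing using or delayed in the interval
    return 100.0 * u / (u + d)


print(f"Capture ratio: {capture_ratio(47, 53):.1f}%")

# (AVG USG PROC, DEV), (Delayed PROC, DEV, STOR, SUBS, OPER, ENQ)
rows = {
    "*SYSTEM": ([3.5, 7.6], [2.3, 1.3, 0.0, 0.9, 0.0, 0.1]),
    "*BATCH": ([2.3, 3.5], [1.9, 0.5, 0.0, 0.2, 0.0, 0.0]),
    "*STC": ([0.9, 4.1], [0.3, 0.8, 0.0, 0.7, 0.0, 0.1]),
}
for group, (using, delayed) in rows.items():
    print(f"{group:8} WFL ~{workflow_pct(using, delayed):.0f}%")
```

For these three groups the sketch gives 71%, 69% and 72%, the same as the WFL column; for other rows the rounding of the displayed averages can put it a point or so off.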

3. Sysplex Summary

The previous screen gave a quick view of the performance of the local z/OS system. Sometimes a sysplex-wide view of performance is handy. The Sysplex Summary helps out here:

                     RMF V2R4   Sysplex Summary - PLXTNKX        Line 1 of 432
Command ===>                                                  Scroll ===> CSR

WLM Samples: 480     Systems: 5  Date: 08/15/22 Time: 05.53.00 Range: 120   Sec

                      >>>>>>>>------------------<<<<<<<<

Service Definition: wlmdef01              Installed at: 04/24/22, 01.13.39
     Active Policy: POLICYA               Activated at: 04/24/22, 01.13.39

               ------- Goals versus Actuals --------  Trans --Avg. Resp. Time-
               Exec Vel  --- Response Time ---  Perf  Ended  WAIT EXECUT ACTUAL
Name     T  I  Goal Act  ---Goal--- --Actual--  Indx  Rate   Time   Time   Time

BATCH    W           50                               0.100 933.7  694.9   1454
PRDBATMD S  4    40  50                         0.80  0.067 742.4  546.8   1289
TBATDISC S          0.0                               0.033  1316  990.9   1783
         1  5    15 0.0                          N/A  0.033  1316  990.9   1783
CICS     W           47                               4.050 0.000   4208   4208
CICSDMED S  4        50  10000 90%         N/A   N/A  0.000 0.000  0.000  0.000
CICSMED  S  2        78   1000 80%         90%  0.50  1.850 0.000   1264   1264
CICSREGN S  2    35  75                         0.47  0.000
CICSUNC  S  3        48   1000 80%         94%  0.50  2.200 0.000   6684   6684
CICSVEL2 S  3    10  35                         0.29  0.000

A lot of similar information, but at the sysplex level. In particular, I like to look at the WLM performance index (Perf Indx): it's a really quick way of seeing whether there are potential performance problems. A performance index of 1 or less means a service class is meeting its goal; above 1 means it is missing it. In this case, all performance indexes are less than 1: everything is humming along nicely.
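For velocity goals the performance index is goal velocity divided by actual velocity; for average response time goals it is actual response time divided by the goal. A minimal Python sketch of both follows (the function names are hypothetical). Percentile response time goals, like CICSMED's, are calculated by WLM from its response time distribution and aren't covered here.

```python
import math


def pi_velocity(goal: float, actual: float) -> float:
    """Execution velocity goal: PI = goal velocity / actual velocity."""
    if actual == 0:
        return math.inf  # no measured velocity: treat as missing the goal
    return goal / actual


def pi_avg_response(goal_secs: float, actual_secs: float) -> float:
    """Average response time goal: PI = actual / goal."""
    return actual_secs / goal_secs


# (goal, actual) execution velocities from the Sysplex Summary above.
velocity_classes = {
    "PRDBATMD": (40, 50),
    "CICSREGN": (35, 75),
    "CICSVEL2": (10, 35),
}
for name, (goal, actual) in velocity_classes.items():
    pi = pi_velocity(goal, actual)
    status = "meeting goal" if pi <= 1.0 else "missing goal"
    print(f"{name:8} PI {pi:.2f} ({status})")
```

This reproduces the 0.80, 0.47 and 0.29 shown on the screen for the three velocity-goal service classes.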

4. Job Delay

In many ways, the Job Delay screen has been one of the best features of RMF Monitor III. When there is a performance issue with a job or started task, this will tell me why.

                        RMF V2R4   Job Delays
Command ===>                                                  Scroll ===> CSR

Samples:         System: MVS1  Date: 08/15/22  Time: 03.04.13  Range: 100   Sec

Job: PRDCICS      Primary delay: Job is waiting to use the processor.

Probable causes: 1) Higher priority work is using the system.
                 2) Improperly tuned dispatching priorities.


------------------------- Jobs Holding the Processor --------------------------
Job:             DBP1DDF     Job:          PRDIMS         Job:        PRDBAT01
Holding:         45%         Holding:         13%         Holding:          3%
PROC Using:      48%         PROC Using:      12%         PROC Using:       4%
DEV Using:        1%         DEV Using:        4%         DEV Using:        3%
--------------------------- Job Performance Summary ---------------------------
        Service       WFL -Using%- DLY IDL UKN ---- % Delayed for ---- Primary
CX ASID Class    P Cr  %   PRC DEV  %   %   %  PRC DEV STR SUB OPR ENQ Reason
SO 0182 CICSPRD  1     64   28   1  20   0  68  20   0   0   0   0   0 DBP1DDF

In this case, our CICS region is waiting for CPU. The biggest 'hogs' are DBP1DDF (a Db2 DIST address space, holding the processor 45% of the time) and PRDIMS (13%).

The Job Delay screen is the first screen I look at if there is a current performance problem.
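The way I read this screen can be sketched in a few lines: find the delay column with the highest percentage, then list the jobs holding that resource in order. This is a simplified Python illustration of reading the report, not RMF's own logic; the column descriptions and function names are hypothetical.

```python
from typing import Optional

# Hypothetical descriptions of the "% Delayed for" columns on the
# Job Delays screen (our wording, not RMF's).
DELAY_NAMES = {
    "PRC": "waiting to use the processor",
    "DEV": "waiting for a device",
    "STR": "waiting for storage",
    "SUB": "waiting for a subsystem",
    "OPR": "waiting for an operator",
    "ENQ": "waiting for an ENQ",
}


def primary_delay(delays: dict[str, int]) -> Optional[str]:
    """Return the delay column with the highest percentage, or None
    if the job isn't delayed at all."""
    if not delays:
        return None
    worst = max(delays, key=delays.get)
    return worst if delays[worst] > 0 else None


def top_holders(holding: dict[str, int], n: int = 2) -> list[tuple[str, int]]:
    """Jobs holding the resource, highest 'Holding' % first."""
    return sorted(holding.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Values copied from the PRDCICS example above.
delays = {"PRC": 20, "DEV": 0, "STR": 0, "SUB": 0, "OPR": 0, "ENQ": 0}
holding = {"DBP1DDF": 45, "PRDIMS": 13, "PRDBAT01": 3}

reason = primary_delay(delays)
if reason is None:
    print("PRDCICS is not delayed")
else:
    holders = ", ".join(f"{job} ({pct}%)" for job, pct in top_holders(holding))
    print(f"PRDCICS is {DELAY_NAMES[reason]}; top holders: {holders}")
```

As on the real screen, the holder list only makes sense for the primary delay: here the processor, held mostly by DBP1DDF and PRDIMS.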

5. Coupling Facility

Coupling facilities are important, but often forgotten. The RMF Monitor III CF Overview screen is great for getting quick information about the coupling facilities in use:

                    RMF V2R4   CF Overview      - PRDPLX1         Line 1 of 2
Command ===>                                                  Scroll ===> CSR

Samples: 120     Systems: 5    Date: 08/15/22  Time: 05.53.00  Range: 120   Sec

CF Policy: POLICYA     Activated at: 08/10/22 00.47.46

--- Coupling Facility --- ----- Processor -------  Req  - Storage - --- SCM ---
Name     Type Mod Lvl Dyn Util% Def Shr Wgt   Eff  Rate Size  Avail Size  Avail

CF001    8561 T01  24 OFF  34.7   1   0       1.0  7622  110G   55G  144G  144G
CF002    8561 T01  24 OFF  35.3   1   0       1.0  8498  110G   52G  144G  144G

In this example, our Parallel Sysplex has two coupling facilities, each with one dedicated ICF processor (the Def column is 1: one processor defined; the Shr column is 0: none shared). Their CPU usage is a little high, at around 35%. They are also around half full, which is about as full as you want to get with coupling facilities.
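'Half full' is just used storage over total storage from the Size and Avail columns. A common reason for that rule of thumb (our assumption; the article doesn't spell it out) is that each CF should keep enough free space to take over the other's structures if one fails. A minimal Python sketch, assuming sizes are shown with a G suffix; the M handling (treated as 1/1024 of a G) and the helper names are hypothetical.

```python
# Hypothetical helper: only the 'G' and 'M' suffixes are handled here.
UNIT_TO_GB = {"G": 1.0, "M": 1.0 / 1024}


def to_gigabytes(value: str) -> float:
    """Convert a size such as '110G' from the CF Overview screen to GB."""
    suffix = value[-1].upper()
    if suffix not in UNIT_TO_GB:
        raise ValueError(f"unsupported size suffix in {value!r}")
    return float(value[:-1]) * UNIT_TO_GB[suffix]


def storage_used_pct(size: str, avail: str) -> float:
    """Percentage of CF storage in use: (Size - Avail) / Size."""
    total = to_gigabytes(size)
    return 100.0 * (total - to_gigabytes(avail)) / total


# Size and Avail columns for the two CFs in the example above.
cfs = {"CF001": ("110G", "55G"), "CF002": ("110G", "52G")}
for name, (size, avail) in cfs.items():
    used = storage_used_pct(size, avail)
    note = "over half full" if used > 50.0 else "half full or less"
    print(f"{name}: {used:.1f}% of CF storage in use ({note})")
```

CF001 comes out at exactly 50% and CF002 at about 52.7%, which is the 'around half full' reading above.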

Sometimes I want to quickly see how a site is using the coupling facility: do they have VTAM generic resources, or not? Do they have an active Db2 data sharing group, or not? The RMF Monitor III CF Activity view is great for this: it shows each structure, what it is, and how much it is used.

                    RMF V2R4   CF Activity      - PRDPLX1     Line 238 of 252
Command ===>                                                  Scroll ===> CSR

Samples: 120     Systems: 5    Date: 08/15/22  Time: 05.53.00  Range: 120   Sec

CF: ALL          Type  ST E System   CF    --- Sync ---  -------- Async -------
                                     Util   Rate   Avg    Rate   Avg   Chg  Del
Structure Name                        %            Serv          Serv   %    %

DB2G_GBP1        CACHE AP   *ALL      1.3    645     13  213.2    222  0.0  0.0
DB2G_GBP1        CACHE AS   *ALL      1.3    645     13  213.2    222  0.0  0.0
ISGLOCK          LOCK  A  - *ALL      0.0    9.2    130    1.9    561  0.0  0.0
RLS_CACHE01      CACHE A  N *ALL      0.0    1.0     74    0.7    161  0.0  0.0
RLS_LOCK01       LOCK  A  - *ALL      0.0    0.4    100   <0.1     33  0.0  0.0
RRSD_MAIN        LIST  A  N *ALL      0.0    0.1    262    0.3    226  0.0  0.0
RRSD_RESTART     LIST  A  N *ALL      0.0    1.1     33    1.5    219  0.0  0.0
RRSD_RMDATA      LIST  A  N *ALL      0.0    0.2    246    0.3    229  0.0  0.0
RRSQ_ARCHIVE     LIST  A  N *ALL      0.0    0.2     37    0.1    656  0.0  0.0
RRSQ_DELAYED     LIST  A  N *ALL      0.0    0.4     29    0.1    738  0.0  0.0
RRSQ_MAIN        LIST  A  N *ALL      0.0    0.3    140    0.1    495  0.0  0.0
RRSQ_RESTART     LIST  A  N *ALL      0.0    0.2    172    0.1    381  0.0  0.0
RRSQ_RMDATA      LIST  A  N *ALL      0.0    0.2    173    0.1    166  0.0  0.0
SYSARC_PRCRQ_RCL LIST  A  N *ALL      0.0    0.0      0    0.0      0  0.0  0.0
SYSIGGCAS_ECS    CACHE A  N *ALL      0.0    0.0      0    0.0      0  0.0  0.0
SYSPLEX_OPERLOG  LIST  A  N *ALL      0.0    7.6     53    3.1    656  0.0  0.0

What can we see in this example?

The site has enabled VSAM RLS (RLS_CACHE01 and RLS_LOCK01 structures), and is using it a bit, but not a lot (the Rate columns are low).
They have defined a catalog ECS structure (SYSIGGCAS_ECS), but aren't using it: catalog ECS is disabled.
They have a common DFHSM recall queue (SYSARC_PRCRQ_RCL) defined, but not used.
They have a Db2 data sharing group called DB2G (Db2 structure names begin with the data sharing group name, and group buffer pool structure names end in GBPn). The group buffer pool structures are duplexed: there are two structures, one with a status of AP (the primary) and one with AS (the secondary).
They have a GRS star (ISGLOCK structure).
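Reading the CF Activity screen this way is mostly pattern matching on structure names plus a check for non-zero request rates. The Python sketch below is hypothetical and incomplete. ISGLOCK, SYSIGGCAS_ECS, SYSARC_basename_RCL and Db2's group_GBPn naming are fixed by the products. The RLS_, RRS and OPERLOG rules only match this site's own naming, and another site would need its own rules.

```python
import re
from typing import Optional

# Db2 group buffer pool structures are named <group>_GBPn; the group
# name is captured so the data sharing group can be reported.
DB2_GBP = re.compile(r"^([A-Z0-9$#@]{1,8})_GBP[0-9K]+$")

# Hypothetical, incomplete rules, checked in order.
RULES = [
    (re.compile(r"^ISGLOCK$"), "GRS star lock"),
    (re.compile(r"^SYSIGGCAS_ECS$"), "catalog ECS"),
    (re.compile(r"^SYSARC_\w+_RCL$"), "DFSMShsm common recall queue"),
    (DB2_GBP, "Db2 group buffer pool"),
    (re.compile(r"^RLS_"), "VSAM RLS (site naming)"),
    (re.compile(r"^RRS"), "RRS log stream (site naming)"),
    (re.compile(r"OPERLOG"), "OPERLOG log stream (site naming)"),
]


def classify(name: str) -> str:
    """Return a description of the structure, or 'unknown'."""
    for pattern, label in RULES:
        if pattern.search(name):
            return label
    return "unknown"


def db2_group(name: str) -> Optional[str]:
    """Data sharing group name for a GBP structure, else None."""
    m = DB2_GBP.match(name)
    return m.group(1) if m else None


# (structure, sync rate, async rate) from the CF Activity screen above;
# the "<0.1" async rate for RLS_LOCK01 is taken as 0.05 here.
sample = [
    ("DB2G_GBP1", 645, 213.2),
    ("ISGLOCK", 9.2, 1.9),
    ("RLS_LOCK01", 0.4, 0.05),
    ("SYSARC_PRCRQ_RCL", 0.0, 0.0),
    ("SYSIGGCAS_ECS", 0.0, 0.0),
]
for name, sync_rate, async_rate in sample:
    state = "in use" if sync_rate + async_rate > 0 else "defined, idle"
    print(f"{name:17} {classify(name):30} {state}")
print("Db2 data sharing group:", db2_group("DB2G_GBP1"))
```

The 'defined, idle' results for SYSARC_PRCRQ_RCL and SYSIGGCAS_ECS match the reading above: both structures exist, but nothing is driving requests to them.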

Using RMF Monitor III

So there you have it: five (well, six if you count both coupling facility screens) RMF Monitor III screens that quickly give me a lot of information about a system: the CEC model, the z/OS version, some of the features in use, and whether there are performance problems.

All the RMF Monitor III screens cover a short period of time: between 100 and 300 seconds (5 minutes at most) in the examples above. So, RMF Monitor III is only good for a quick 'feel' of what's happening. If I see something interesting (like a performance issue), I'll look at a longer period (possibly using the RMF Postprocessor) to see whether it's just a 'blip' or something real.

Yes, I love RMF Monitor III.

David Stephens