
LongEx Mainframe Quarterly - August 2019

technical: Monitoring and Tuning Load Modules and Libraries

Performance is important, so we regularly monitor our mainframe performance: CICS response times, WLM delays, CPU usage, batch run times and more. But I bet you've never thought of monitoring your load modules (or program objects). I do some checking every time I enter a new site. Here's what I do, and why.

No Loads

Loading load modules and program objects (I'll just call them all load modules from now on) is expensive: it takes time and uses up CPU. So, we want to avoid doing it if possible.

If you have CA PMO/Quickfetch or CA PDSMAN, then you have facilities to automatically cache load modules (though the functionality for program objects is limited). The rest of us will be relying on LLA (library lookaside) and VLF (virtual lookaside facility). With these, you need to specify the PDS/PDSE names (I'll just call these load libraries from now on) in the relevant parmlib members (or SETPROG commands).

So, when I first arrive at a site, it's a nice check to see if there are some heavily hit libraries that may belong in LLA/VLF. For this, the SMF type 14/15 records are brilliant. Here's a SAS/MXG job that gets information for any RECFM=U dataset (almost always a load library), and puts it into a CSV file:

/* Put SMF 14/15 Records into an MXG PDB  */
%INCLUDE SOURCLIB(TYPE1415);

/* Search PDB for the info we need */
DATA SMF1415;
  SET WORK.TYPE1415(KEEP=DSNAME SYSTEM JOB DDNAME SMFTIME EXCPCNT
     PDSE RECFM BLKSIZE);
  WHERE RECFM='U';                  /* Only RECFM=U datasets */
  FORMAT DATE 5. TIME TIME5. ;
  DATE = DATEPART(SMFTIME) + 21916; /* Create an Excel date */
  TIME = TIMEPART(SMFTIME);
  DROP SMFTIME RECFM;
RUN;

/* Put data into a CSV file */ 
ODS _ALL_ CLOSE;
ODS CSV FILE=CSV RS=NONE;
PROC PRINT    DATA=SMF1415    NOOBS LABEL;
RUN;
ODS CSV CLOSE ;

Here's an example of the output:

SYS   BLOCK SIZE  DD NAME  DATASET                EXCP   JOB     PDSE  DATE
MVSA  32760       STEPLIB  APP1.LOAD              12913  AP1J21  Y     43695
MVSA  32760       STEPLIB  UNKNOWN - CONCAT BPAM  0      AP1J21        43695
MVSA  32760       STEPLIB  SYS2.ESPLNK            1      AP1J01        43695
MVSA  32760       STEPLIB  APP1.LOAD              16983  AP1J21  Y     43695
MVSA  32760       STEPLIB  UNKNOWN - CONCAT BPAM  0      AP1J21        43695
MVSA  32760       STEPLIB  APP1.LOAD              16985  AP1J21  Y     43695

The date looks a bit weird. But we've created an Excel date number. When we use Excel to format it, we get a nice date in any format we want.
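If you'd rather check the date arithmetic outside Excel, here is a minimal Python sketch. The function names are my own; the only values taken from the job above are the 21916 offset and the sample DATE value 43695. It assumes the usual Excel 1900 date system and dates after February 1900.

```python
from datetime import date, timedelta

# Offset added in the SAS job: SAS counts days from 1 Jan 1960,
# Excel (1900 date system) counts from the end of 1899.
SAS_TO_EXCEL_OFFSET = 21916

SAS_EPOCH = date(1960, 1, 1)
# Working "day 0" for Excel serials after Feb 1900
# (allows for Excel's non-existent 29 Feb 1900).
EXCEL_EPOCH = date(1899, 12, 30)


def sas_date_to_excel_serial(sas_days: int) -> int:
    """Convert a SAS date value (days since 1960-01-01) to an Excel serial."""
    return sas_days + SAS_TO_EXCEL_OFFSET


def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel serial number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


# The DATE value from the sample output
print(excel_serial_to_date(43695))  # 2019-08-18
```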

We're really interested in STEPLIB and JOBLIB DD names (to exclude bind, IEBCOPY and similar jobs). I can then get the number of I/Os (EXCPs) for each load library: any dataset with EXCPs isn't getting benefit from VLF (and probably LLA).
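You don't have to do this step in Excel. As a rough sketch, the Python below totals the EXCPs for each load library, keeping only STEPLIB and JOBLIB DD names. It assumes the CSV has the column headings shown in the sample output (SYS, DD NAME, DATASET, EXCP): your headings depend on the MXG labels at your site, so adjust to suit. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def excps_by_library(csv_file, ddnames=("STEPLIB", "JOBLIB")):
    """Total EXCPs per (system, dataset), for the given DD names only."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        if row["DD NAME"].strip() not in ddnames:
            continue  # skip bind, IEBCOPY and similar access
        key = (row["SYS"].strip(), row["DATASET"].strip())
        totals[key] += int(row["EXCP"].strip() or 0)
    # Busiest libraries first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented rows in the assumed layout (not real SMF data)
SAMPLE = """SYS,BLOCK SIZE,DD NAME,DATASET,EXCP,JOB,PDSE,DATE
MVSA,32760,STEPLIB,APP1.LOAD,120,JOBA,Y,43695
MVSA,32760,SYSLMOD,APP1.LOAD,900,BINDJOB,Y,43695
MVSA,32760,STEPLIB,APP1.LOAD,30,JOBB,Y,43695
MVSA,32760,JOBLIB,SYS2.TOOLS,0,JOBA,,43695
"""

for (system, dataset), excps in excps_by_library(io.StringIO(SAMPLE)):
    print(f"{system} {dataset:<20} {excps:>8}")
```

Libraries at the top of this list with a high EXCP total are the first candidates to look at for LLA/VLF.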

Another option is to use sampling tools like Compuware Strobe, IBM APA or Macro 4 FreezeFrame. I usually use these on the address spaces with high CPU usage: good for reducing CPU, good for finding out what is going on.

As part of this, I'll see what modules are being loaded during the sampling session. For Strobe, I'll look at the Data Set Characteristics Section:

  #DSC                                     ** DATA SET CHARACTERISTICS **
DDNAME  ACCESS POOL REC  BLK/CI BUF  RPL  -SPLITS- EXCP  DATA SET NAME
        METHOD  NO  SIZE  SIZE   NO STRNO CI   CA COUNTS

STEPLIB                                            12120 SYS1.SIEALNKE
STEPLIB                                            40330 SYSA.MQ.SCSQAUTH
STEPLIB                                                0 SYSA.MQ.SCSQANLE
STEPLIB                                                0 SYSA.MQ.SCSQLOAD
STEPLIB                                                0 SYST.DB2.UDSNEXT

You can see that there are a lot of EXCPs to SYS1.SIEALNKE and SYSA.MQ.SCSQAUTH. This is expensive. In this case, removing the STEPLIB and ensuring these libraries were in the linklist (they usually are) fixed the problem.

For Macro 4 FreezeFrame and IBM APA, the S03 report is interesting:

S03: Load Module Summary (05390/PRDJOB01)              Row 00001 of 00065
Command ===>                                              Scroll ===> CSR
 
Module    Locn Address Count Size(bytes) Attr   DDName   Load Library
 
CEEBINIT  JPA  0006E010   1      45,040  RU RN  -VLF-    SYS1.LE.SCEERUN
CEEEV003  PLPA 05B19000   1   5,576,384                  SYS1.LE.SCEERUN
CEEPLPKA  PLPA 0606B000   1   2,135,944                  SYS1.LE.SCEERUN
CLB3D001  JPA  19866E90   1     381,296  RU RN  -VLF-    SYS1.CPP.SCLBDLL
AACPGM1   JPA  19200AA0 501       1,376         STEPLIB  SYSA.APP.LOADLIB
AACPGM2   JPA  12E93000   1      41,140         STEPLIB  SYSA.APP.LOADLIB
AACPGM3   JPA  19E88000   1       1,376         STEPLIB  SYSA.APP.LOADLIB

We can see some modules loaded from VLF: fast. A couple are in PLPA (even faster), and some are application modules loaded from STEPLIB. AACPGM1 looks to have been loaded a lot of times: worth looking at. This report is also great for confirming where a program is loaded from (VLF, LPA or a specific dataset).

VLF Performance

When defining VLF usage for load modules (LLA) or anything else, the MAXVIRT value in the COFVLFxx parmlib member sets how much virtual storage VLF can use for that class. But have we defined enough?

We could look at the SMF Type 41 (subtype 3) records, which will tell us this and more. But an easier way is to simply look at the IBM Health Checker VLF_MAXVIRT check. This is a nice early indicator of VLF storage issues. A quick look at other health checks beginning with CSV will find other issues like secondary extent definitions for linklist libraries.

Unused LLA

Ideally, the libraries in our Library Lookaside and VLF lists are libraries that are actually used. The (undocumented) D LLA,STATISTICS z/OS command shows us some information about which libraries are benefitting from LLA or VLF.

D LLA,STATISTICS
CSV620I 05.16.00 LLA STATS DISPLAY 864
TOTAL DASD FETCHES: 6638  TOTAL VLF RETRIEVES: 15078
119 LIBRARY ENTRIES FOLLOW
LIBRARY: SYS1.MIGLIB
  MEMBERS:               2090
  MEMBERS FETCHED:        109  MEMBERS IN VLF:         55
  DASD FETCHES:            12  VLF RETRIEVES:        1598
LIBRARY: SYSA.APP.LINKLIB
  MEMBERS:                 45
  MEMBERS FETCHED:          0  MEMBERS IN VLF:          0
  DASD FETCHES:             0  VLF RETRIEVES:           0
LIBRARY: SYS1.LINKLIB
  MEMBERS:               4264
  MEMBERS FETCHED:        510  MEMBERS IN VLF:        330
  DASD FETCHES:            49  VLF RETRIEVES:         993

SYSA.APP.LINKLIB doesn't seem to be getting any benefit from LLA: none of its 45 members has been fetched. Many z/OS monitors such as CA SYSVIEW can also show this detail.
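With over a hundred library entries, scanning this display by eye gets tedious. Here's a hedged Python sketch that picks out libraries with no members fetched. It assumes the output has been captured as text (from the syslog, for example) and looks exactly like the sample above. The command is undocumented, so the layout may differ on your release, and the function names are my own.

```python
import re


def parse_lla_stats(text):
    """Return {library: {counter name: value}} from D LLA,STATISTICS output."""
    libraries = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("LIBRARY:"):
            name = line.split(":", 1)[1].strip()
            current = libraries.setdefault(name, {})
        elif current is not None:
            # A line can hold two "NAME: value" pairs side by side
            for counter, value in re.findall(r"([A-Z][A-Z ]*?):\s*(\d+)", line):
                current[counter.strip()] = int(value)
    return libraries


def unused_libraries(stats):
    """Libraries where no member has been fetched since LLA started."""
    return [lib for lib, counters in stats.items()
            if counters.get("MEMBERS FETCHED", 0) == 0]


# Part of the display shown above
SAMPLE = """CSV620I 05.16.00 LLA STATS DISPLAY 864
TOTAL DASD FETCHES: 6638  TOTAL VLF RETRIEVES: 15078
119 LIBRARY ENTRIES FOLLOW
LIBRARY: SYS1.MIGLIB
  MEMBERS:               2090
  MEMBERS FETCHED:        109  MEMBERS IN VLF:         55
  DASD FETCHES:            12  VLF RETRIEVES:        1598
LIBRARY: SYSA.APP.LINKLIB
  MEMBERS:                 45
  MEMBERS FETCHED:          0  MEMBERS IN VLF:          0
  DASD FETCHES:             0  VLF RETRIEVES:           0
"""

print(unused_libraries(parse_lla_stats(SAMPLE)))
```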

CICS Residency

CICS programs can be cached by defining them as RESIDENT: once loaded, the module stays in storage and need never be loaded again (unless it's refreshed using CEMT SET PROGRAM NEWCOPY). The CICS end of day statistics give information about the number of times each CICS program has been loaded. Here's a job to run the DFHSTUP program to get program stats:

//STEP1     EXEC PGM=DFHSTUP
//STEPLIB   DD   DISP=SHR,DSN=CICS.SDFHLOAD
//          DD   DISP=SHR,DSN=CICS.SDFHAUTH
//SYSPRINT  DD   SYSOUT=*
//DFHSTATS  DD   DISP=SHR,DSN=SMF.DATASET
//DFHPRINT  DD   SYSOUT=*
//SYSOUT    DD   SYSOUT=*
//DFHSTWRK  DD   UNIT=SYSDA,SPACE=(CYL,(50,50))
//SYSIN     DD   *
  SELECT APPLID=(CICSAPPL)
  SELECT TYPE=PROGRAM
  COLLECTION TYPE=EOD

This produces output similar to this:

PUBLIC PROGRAMS
_______________
Program   Times Fetch Avg Fetch Lbry Newcop Prog Curr     LIBRARY
Name       Used Count Time      Ofst  Count Size Loc  Name   DataSet Name
_________________________________________________________________________
CEEMENU0     0     0 00.00000    0      0     0 None
CEEMENU2     0     0 00.00000    0      0     0 None
CEEMENU3    57     0 00.00000    8      0 54488 ERDSA DFHRPL SYS1.SCEERUN
APP530I  26316   150 00.00465   26      0  7408 ERDSA DFHRPL SYSA.CICLOAD
APP540O 195205     0 00.00000   26      0 11280 ERDSA DFHRPL SYSA.CICLOAD

Ideally, you want your fetch count to be 0 (or 1 if you started CICS that day). In the above output, we can see that program APP530I has been fetched 150 times: it may not be resident.
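A CICS region can have thousands of programs, so it helps to filter the report. The Python sketch below flags programs with a fetch count above a threshold. It assumes the DFHPRINT output has been saved as text and that the columns are whitespace separated as in the sample above, with the program name, times used and fetch count in the first three columns. Real reports may format numbers differently, and the function name is my own.

```python
def heavily_fetched(report_text, max_fetches=1):
    """Return (program, times used, fetch count) for programs fetched
    more than max_fetches times, highest fetch count first."""
    flagged = []
    for line in report_text.splitlines():
        fields = line.split()
        # Skip headings, rules and anything without numeric counts
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        name, used, fetches = fields[0], int(fields[1]), int(fields[2])
        if fetches > max_fetches:
            flagged.append((name, used, fetches))
    return sorted(flagged, key=lambda prog: prog[2], reverse=True)


# The sample report rows shown above
SAMPLE = """Program   Times Fetch Avg Fetch Lbry Newcop Prog Curr     LIBRARY
Name       Used Count Time      Ofst  Count Size Loc  Name   DataSet Name
CEEMENU0     0     0 00.00000    0      0     0 None
CEEMENU3    57     0 00.00000    8      0 54488 ERDSA DFHRPL SYS1.SCEERUN
APP530I  26316   150 00.00465   26      0  7408 ERDSA DFHRPL SYSA.CICLOAD
APP540O 195205     0 00.00000   26      0 11280 ERDSA DFHRPL SYSA.CICLOAD
"""

for name, used, fetches in heavily_fetched(SAMPLE):
    print(f"{name:<8} used {used:>7}  fetched {fetches:>5}")
```

Any program this flags that is also heavily used is a candidate for RESIDENT.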

Other Checks

In our partner article Load Library Performance, we showed that PDSEs are faster than PDS libraries. If you need to use PDS libraries, a blocksize of 32760 is best. The SAS/MXG code for SMF type 14/15 records shown above also shows if load libraries are PDSEs (i.e. program objects), and their blocksize: a nice check.

I've seen a few sites that have run out of directory space for their load libraries (yet another reason to use PDSEs). So for some of the larger PDS load libraries, I may use the ISPF DSLIST Information (I) line operator to check directory space information:

                             Data Set Information
Command ===>
 
Data Set Name . . . . : APPA.PROD.LOADLIB
 
General Data                          Current Allocation
Management class . . : NORMAL         Allocated cylinders : 2
Storage class  . . . : NORMAL         Allocated extents . : 1
  Volume serial . . . : VOL001        Maximum dir. blocks : 20
  Device type . . . . : 3390
Data class . . . . . : DEFAULT
  Organization  . . . : PO            Current Utilization
  Record format . . . : U              Used cylinders  . . : 2
  Record length . . . : 80             Used extents  . . . : 1
  Block size  . . . . : 32760          Used dir. blocks  . : 10
  1st extent cylinders: 2              Number of members . : 63
  Secondary cylinders : 1
  Data set name type  : PDS           Dates
                                       Creation date . . . : 2019/08/15
                                       Referenced date . . : 2019/08/18
                                       Expiration date . . : ***None***

This dataset is fine: 20 directory blocks allocated, only 10 used so far.
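The arithmetic behind that judgement is simple enough to sketch in Python. The function and field names are hypothetical, and the estimate of remaining members assumes that new members will need about the same directory space as the existing ones, which won't always be true for load modules.

```python
def directory_usage(max_blocks, used_blocks, members):
    """Summarise PDS directory usage from the ISPF information panel values."""
    free_blocks = max_blocks - used_blocks
    # Rough estimate only: scale by the members held per used block so far
    est_members_left = free_blocks * members // used_blocks if used_blocks else 0
    return {
        "pct_used": 100.0 * used_blocks / max_blocks,
        "free_blocks": free_blocks,
        "est_members_left": est_members_left,
    }


# Values from the panel above: 20 blocks allocated, 10 used, 63 members
print(directory_usage(max_blocks=20, used_blocks=10, members=63))
```

For the dataset above this gives 50% used, with room for roughly another 63 members.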

Conclusion

It's unlikely that load module and library overheads are making a huge difference to your performance (though you never know). However, there are a few issues with load modules (and program objects) that are easy to monitor every few years, and easy to resolve. When I first enter a new site, these are some of the things that I'll check as I look for ways that the z/OS system can run faster, more efficiently, and with fewer errors.


David Stephens