Monitoring and Tuning Load Modules and Libraries
Performance is important, so we regularly monitor our mainframe performance: CICS response times, WLM delays, CPU usage, batch run times and more. But I bet you've never thought of monitoring your load modules (or program objects). Yet every time I arrive at a new site, I do some checking. Here's what I check, and why.
No Loads
Loading load modules and program objects (I'll just call them all load modules from now on) is expensive: it takes time and uses up CPU. So, we want to avoid doing it if possible.
If you have CA PMO/Quickfetch or CA PDSMAN, you have facilities to automatically cache load modules (though their functionality for program objects is limited). The rest of us rely on LLA (Library Lookaside) and VLF (Virtual Lookaside Facility). With these, you specify the PDS/PDSE names (I'll just call these load libraries from now on) in the relevant parmlib members (or SETPROG commands).
So, when I first arrive at a site, a useful check is whether any heavily used libraries belong in LLA/VLF. For this, SMF type 14/15 records are brilliant. Here's a SAS/MXG job that extracts information for every RECFM=U dataset (almost always a load library) and writes it to a CSV file:
/* Read the SMF 14/15 records into WORK.TYPE1415 using MXG */
%INCLUDE SOURCLIB(TYPE1415);
/* Search PDB for the info we need */
DATA SMF1415;
SET WORK.TYPE1415(KEEP=DSNAME SYSTEM JOB DDNAME SMFTIME EXCPCNT
PDSE RECFM BLKSIZE);
WHERE RECFM='U'; /* Only RECFM=U datasets */
FORMAT DATE 5. TIME TIME5. ;
DATE = DATEPART(SMFTIME) + 21916; /* Create an Excel date */
TIME = TIMEPART(SMFTIME);
DROP SMFTIME RECFM;
RUN;
/* Put data into a CSV file */
ODS _ALL_ CLOSE;
ODS CSV FILE=CSV RS=NONE; /* CSV is the DD name (fileref) of the output file */
PROC PRINT DATA=SMF1415 NOOBS LABEL;
RUN;
ODS CSV CLOSE ;
Here's an example of the output:
SYS | BLOCK SIZE | DD NAME | DATASET | EXCP | JOB | PDSE | DATE |
MVSA | 32760 | STEPLIB | APP1.LOAD1 | 2913 | AP1J21 | Y | 43695 |
MVSA | 32760 | STEPLIB | UNKNOWN - CONCAT BPAM | 0 | AP1J21 | | 43695 |
MVSA | 32760 | STEPLIB | SYS2.ESPLNK | 1 | AP1J01 | | 43695 |
MVSA | 32760 | STEPLIB | APP1.LOAD1 | 6983 | AP1J21 | Y | 43695 |
MVSA | 32760 | STEPLIB | UNKNOWN - CONCAT BPAM | 0 | AP1J21 | | 43695 |
MVSA | 32760 | STEPLIB | APP1.LOAD1 | 6985 | AP1J21 | Y | 43695 |
The date looks a bit odd because it's an Excel date serial number. Format the column in Excel and you get a proper date in any format you want.
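The 21916 offset in the job works because SAS counts dates from 1 January 1960, while Excel's default (1900) date system effectively counts from 30 December 1899. As a quick sanity check outside Excel, here's a minimal Python sketch (the function names are hypothetical) that reproduces the offset and decodes the serial number in the sample output:

```python
from datetime import date, timedelta

# Excel (1900 date system) serials effectively count days from 1899-12-30.
# This absorbs Excel's fictitious 1900-02-29, so it is valid for serials
# from 61 (1 March 1900) onwards.
EXCEL_EPOCH = date(1899, 12, 30)
# SAS dates count days from 1960-01-01.
SAS_EPOCH = date(1960, 1, 1)


def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel serial number (like the DATE column) to a date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def sas_to_excel_offset() -> int:
    """Days between the SAS and Excel epochs: the 21916 used in the job."""
    return (SAS_EPOCH - EXCEL_EPOCH).days


if __name__ == "__main__":
    print(sas_to_excel_offset())        # 21916
    print(excel_serial_to_date(43695))  # 2019-08-18
```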
We're really only interested in the STEPLIB and JOBLIB DD names (this excludes binds, IEBCOPY and similar jobs). I can then total the I/Os (EXCPs) for each load library: any library with EXCPs isn't getting the benefit of VLF (and probably not LLA either).
Another option is to use sampling tools like Compuware Strobe, IBM APA or Macro 4 FreezeFrame. I usually run these against address spaces with high CPU usage: they're good for reducing CPU, and good for finding out what is going on.
As part of this, I'll see which modules are loaded during the sampling session. For Strobe, I look at the Data Set Characteristics section:
#DSC ** DATA SET CHARACTERISTICS **
DDNAME ACCESS POOL REC BLK/CI BUF RPL -SPLITS- EXCP DATA SET NAME
METHOD NO SIZE SIZE NO STRNO CI CA COUNTS
STEPLIB 12120 SYS1.SIEALNKE
STEPLIB 40330 SYSA.MQ.SCSQAUTH
STEPLIB 0 SYSA.MQ.SCSQANLE
STEPLIB 0 SYSA.MQ.SCSQLOAD
STEPLIB 0 SYST.DB2.UDSNEXT
You can see a lot of EXCPs to SYS1.SIEALNKE and SYSA.MQ.SCSQAUTH, which is expensive. In this case, removing the STEPLIB and ensuring these libraries were in the linklist (they usually are) fixed the problem.
For Macro 4 FreezeFrame and IBM APA, the S03 (Load Module Summary) report is interesting:
S03: Load Module Summary (05390/PRDJOB01) Row 00001 of 00065
Command ===> Scroll ===> CSR
Module Locn Address Count Size(bytes) Attr DDName Load Library
CEEBINIT JPA 0006E010 1 45,040 RU RN -VLF- SYS1.LE.SCEERUN
CEEEV003 PLPA 05B19000 1 5,576,384 SYS1.LE.SCEERUN
CEEPLPKA PLPA 0606B000 1 2,135,944 SYS1.LE.SCEERUN
CLB3D001 JPA 19866E90 1 381,296 RU RN -VLF- SYS1.CPP.SCLBDLL
AACPGM1 JPA 19200AA0 501 1,376 STEPLIB SYSA.APP.LOADLIB
AACPGM2 JPA 12E93000 1 41,140 STEPLIB SYSA.APP.LOADLIB
AACPGM3 JPA 19E88000 1 1,376 STEPLIB SYSA.APP.LOADLIB
We can see some modules loaded from VLF: fast. A couple are in PLPA (even faster), and some are application modules loaded from STEPLIB. AACPGM1 has been loaded 501 times: worth investigating. This report is also great for confirming where a program is loaded from (VLF, LPA or a specific dataset).
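On a report with dozens of rows, it helps to pick out the repeat loaders automatically. A minimal sketch, assuming the whitespace-separated S03 layout shown above (Module, Locn, Address, Count, Size, optional Attr, DDName, Load Library) and that the DDName column is populated for every non-LPA row; `repeated_loads` is a hypothetical name, not part of APA or FreezeFrame:

```python
def repeated_loads(report_text, min_count=2):
    """Flag modules loaded at least `min_count` times from a DD
    (i.e. not served from LPA or VLF), most-loaded first."""
    flagged = []
    for line in report_text.splitlines():
        tokens = line.split()
        # Data rows have a numeric Count in the fourth column; this skips
        # the title, command and heading lines.
        if len(tokens) < 6 or not tokens[3].isdigit():
            continue
        module, locn, count, library = tokens[0], tokens[1], int(tokens[3]), tokens[-1]
        # Rows without a DDName (e.g. PLPA) have only six tokens.
        ddname = tokens[-2] if len(tokens) > 6 else None
        if "LPA" in locn or ddname in (None, "-VLF-"):
            continue  # already cheap: LPA or VLF
        if count >= min_count:
            flagged.append((module, count, ddname, library))
    return sorted(flagged, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    report = """Module Locn Address Count Size(bytes) Attr DDName Load Library
CEEBINIT JPA 0006E010 1 45,040 RU RN -VLF- SYS1.LE.SCEERUN
CEEPLPKA PLPA 0606B000 1 2,135,944 SYS1.LE.SCEERUN
AACPGM1 JPA 19200AA0 501 1,376 STEPLIB SYSA.APP.LOADLIB
AACPGM2 JPA 12E93000 1 41,140 STEPLIB SYSA.APP.LOADLIB"""
    print(repeated_loads(report))
    # [('AACPGM1', 501, 'STEPLIB', 'SYSA.APP.LOADLIB')]
```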
VLF Performance
When defining a VLF class for load modules (the CSVLLA class used by LLA) or anything else, we use the MAXVIRT parameter in the COFVLFxx parmlib member to set how much storage VLF can use for that class. But have we defined enough?
We could look at SMF type 41 (subtype 3) records, which will tell us this and more. But an easier way is to look at the IBM Health Checker VLF_MAXVIRT check: a nice early indicator of VLF storage issues. A quick look at the other health checks beginning with CSV will find other issues, such as secondary extents defined for linklist libraries.
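MAXVIRT is specified in 4K blocks (the default of 4096 gives 16MB), so it's easy to misjudge how much storage a value really represents. Here's a minimal Python sketch of the conversion in both directions; the helper names are hypothetical, the 64MB target is only an example rather than a recommendation, and you should check the COFVLFxx documentation for the allowed range.

```python
import math

BLOCK_BYTES = 4096  # MAXVIRT is expressed in 4K blocks
MB = 1024 * 1024


def maxvirt_to_mb(maxvirt):
    """VLF storage, in MB, allowed by a MAXVIRT value."""
    return maxvirt * BLOCK_BYTES / MB


def maxvirt_for_mb(megabytes):
    """Smallest MAXVIRT value giving at least `megabytes` of storage."""
    return math.ceil(megabytes * MB / BLOCK_BYTES)


if __name__ == "__main__":
    print(maxvirt_to_mb(4096))  # 16.0 (the default)
    # 16384, e.g. CLASS NAME(CSVLLA) EMAJ(LLA) MAXVIRT(16384)
    print(maxvirt_for_mb(64))
```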
Unused LLA
Ideally, every library in our LLA and VLF lists should actually be used. The (undocumented) D LLA,STATISTICS z/OS command shows which libraries are benefitting from LLA or VLF.
D LLA,STATISTICS
CSV620I 05.16.00 LLA STATS DISPLAY 864
TOTAL DASD FETCHES: 6638 TOTAL VLF RETRIEVES: 15078
119 LIBRARY ENTRIES FOLLOW
LIBRARY: SYS1.MIGLIB
MEMBERS: 2090
MEMBERS FETCHED: 109 MEMBERS IN VLF: 55
DASD FETCHES: 12 VLF RETRIEVES: 1598
LIBRARY: SYSA.APP.LINKLIB
MEMBERS: 45
MEMBERS FETCHED: 0 MEMBERS IN VLF: 0
DASD FETCHES: 0 VLF RETRIEVES: 0
LIBRARY: SYS1.LINKLIB
MEMBERS: 4264
MEMBERS FETCHED: 510 MEMBERS IN VLF: 330
DASD FETCHES: 49 VLF RETRIEVES: 993
SYSA.APP.LINKLIB isn't getting any benefit from LLA: none of its 45 members have been fetched. Many z/OS monitors, such as CA SYSVIEW, can also show this detail.
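With over a hundred library entries, scanning the display by eye gets tedious. A minimal sketch, assuming the CSV620I layout shown above (captured, for example, from SYSLOG); `parse_lla_stats`, `unused_libraries` and `vlf_share` are hypothetical names, and the "VLF share" (VLF retrieves as a fraction of all fetches plus retrieves) is just a rough indicator of our own, not an IBM metric:

```python
import re

LIB_RE = re.compile(r"LIBRARY:\s+(\S+)")
# Longer labels first so "MEMBERS FETCHED" isn't read as "MEMBERS".
NUM_RE = re.compile(
    r"(MEMBERS FETCHED|MEMBERS IN VLF|DASD FETCHES|VLF RETRIEVES|MEMBERS):\s+(\d+)")


def parse_lla_stats(text):
    """Return {library: {label: count}} from D LLA,STATISTICS output."""
    libs, current = {}, None
    for line in text.splitlines():
        m = LIB_RE.search(line)
        if m:
            current = libs.setdefault(m.group(1), {})
            continue
        if current is None:
            continue  # message header and TOTAL line
        for label, value in NUM_RE.findall(line):
            current[label] = int(value)
    return libs


def unused_libraries(libs):
    """Libraries with no DASD fetches and no VLF retrieves."""
    return [name for name, s in libs.items()
            if s.get("DASD FETCHES", 0) == 0 and s.get("VLF RETRIEVES", 0) == 0]


def vlf_share(stats):
    """Rough indicator: VLF retrieves / (DASD fetches + VLF retrieves)."""
    total = stats.get("DASD FETCHES", 0) + stats.get("VLF RETRIEVES", 0)
    return stats.get("VLF RETRIEVES", 0) / total if total else 0.0
```

For the display above, `unused_libraries` would return only SYSA.APP.LINKLIB, while SYS1.MIGLIB gets about 99% of its requests from VLF.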
CICS Residency
CICS programs can be cached by defining them as RESIDENT: once loaded, they stay in storage and need never be loaded again (unless they're refreshed using CEMT SET PROGRAM NEWCOPY). The CICS end-of-day statistics show the number of times each CICS program has been loaded. Here's a job that runs the DFHSTUP program to get program statistics:
//STEP1 EXEC PGM=DFHSTUP
//STEPLIB DD DISP=SHR,DSN=CICS.SDFHLOAD
// DD DISP=SHR,DSN=CICS.SDFHAUTH
//SYSPRINT DD SYSOUT=*
//DFHSTATS DD DISP=SHR,DSN=SMF.DATASET
//DFHPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//DFHSTWRK DD UNIT=SYSDA,SPACE=(CYL,(50,50))
//SYSIN DD *
SELECT APPLID=(CICSAPPL)
SELECT TYPE=PROGRAM
COLLECTION TYPE=EOD
This produces output similar to this:
PUBLIC PROGRAMS
_______________
Program Times Fetch Avg Fetch Lbry Newcop Prog Curr LIBRARY
Name Used Count Time Ofst Count Size Loc Name DataSet Name
_________________________________________________________________________
CEEMENU0 0 0 00.00000 0 0 0 None
CEEMENU2 0 0 00.00000 0 0 0 None
CEEMENU3 57 0 00.00000 8 0 54488 ERDSA DFHRPL SYS1.SCEERUN
APP530I 26316 150 00.00465 26 0 7408 ERDSA DFHRPL SYSA.CICLOAD
APP540O 195205 0 00.00000 26 0 11280 ERDSA DFHRPL SYSA.CICLOAD
Ideally, the fetch count should be 0 (or 1 if CICS was started that day). In the output above, program APP530I has been fetched 150 times: it may not be defined as RESIDENT.
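In a real region the program list runs to thousands of lines, so a filter helps. A minimal sketch, assuming the DFHSTUP column order shown above (Name, Times Used, Fetch Count, ...); `frequently_fetched` is a hypothetical helper, not part of CICS:

```python
def frequently_fetched(report_text, max_fetches=1):
    """Return (program, times used, fetch count) for programs fetched more
    than `max_fetches` times, most-fetched first."""
    flagged = []
    for line in report_text.splitlines():
        tokens = line.split()
        # Data rows start: Name, Times Used, Fetch Count, Avg Fetch Time, ...
        # Headings and underscore rules fail the numeric checks.
        if len(tokens) < 7 or not (tokens[1].isdigit() and tokens[2].isdigit()):
            continue
        name, used, fetches = tokens[0], int(tokens[1]), int(tokens[2])
        if fetches > max_fetches:
            flagged.append((name, used, fetches))
    return sorted(flagged, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    stats = """Program Times Fetch Avg Fetch Lbry Newcop Prog Curr LIBRARY
CEEMENU0 0 0 00.00000 0 0 0 None
APP530I 26316 150 00.00465 26 0 7408 ERDSA DFHRPL SYSA.CICLOAD
APP540O 195205 0 00.00000 26 0 11280 ERDSA DFHRPL SYSA.CICLOAD"""
    print(frequently_fetched(stats))  # [('APP530I', 26316, 150)]
```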
Other Checks
In our partner article Load Library Performance, we showed that PDSEs are faster than PDS libraries. If you need to use PDS libraries, a blocksize of 32760 is best. The SAS/MXG code for SMF type 14/15 records shown above also shows whether each load library is a PDSE (i.e. holds program objects) and its blocksize: a nice check.
I've seen a few sites that have run out of directory space in their load libraries (yet another reason to use PDSEs). So I may use the ISPF DSLIST Information (I) line command on the larger PDS load libraries to check their directory space:
Data Set Information
Command ===>
Data Set Name . . . . : APPA.PROD.LOADLIB
General Data Current Allocation
Management class . . : NORMAL Allocated cylinders : 2
Storage class . . . : NORMAL Allocated extents . : 1
Volume serial . . . : VOL001 Maximum dir. blocks : 20
Device type . . . . : 3390
Data class . . . . . : DEFAULT
Organization . . . : PO Current Utilization
Record format . . . : U Used cylinders . . : 2
Record length . . . : 80 Used extents . . . : 1
Block size . . . . : 32760 Used dir. blocks . : 10
1st extent cylinders: 2 Number of members . : 63
Secondary cylinders : 1
Data set name type : PDS Dates
Creation date . . . : 2019/08/15
Referenced date . . : 2019/08/18
Expiration date . . : ***None***
This dataset is fine: 20 directory blocks are allocated, and only 10 are used so far.
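The same panel values also give a rough idea of how much room is left. A minimal sketch, assuming new members will need about as much directory space on average as the existing ones (load module directory entries vary in size, so treat the result as an estimate); `directory_headroom` is a hypothetical name:

```python
def directory_headroom(alloc_blocks, used_blocks, members):
    """Return (percent of directory blocks used, estimated extra members
    that will fit, or None if there's nothing to extrapolate from)."""
    pct_used = 100.0 * used_blocks / alloc_blocks
    if used_blocks == 0:
        return pct_used, None
    # Integer arithmetic: members per used block, times free blocks.
    more_members = (alloc_blocks - used_blocks) * members // used_blocks
    return pct_used, more_members


if __name__ == "__main__":
    # Values from the ISPF panel above.
    print(directory_headroom(20, 10, 63))  # (50.0, 63)
```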
Conclusion
It's unlikely that load module and library overheads are making a huge difference to your performance (though you never know). However, there are a few load module (and program object) issues that are easy to monitor every few years, and easy to resolve. When I first arrive at a new site, these are some of the things I check as I look for ways to make the z/OS system run faster, more efficiently, and with fewer errors.
David Stephens