technical: Load Library Performance
Load libraries and program objects are always defined with an 'undefined' record format (RECFM=U). In our partner article RECFM=U: What Exactly Is It, we talk about how the LRECL definition for RECFM=U datasets is ignored.
But are there any other allocation issues for load and program libraries? What effect does blocksize have on their performance? And does the size of the library matter? The answer is "maybe." Let's see what that means.
Blocksize
Best practice for non-VSAM disk datasets is to use half-track blocking: each block is as close as possible to half a track. Everyone uses DASD that pretends to be 3390s, so this is around 27,000 bytes and change. Everyone agrees that this gives good performance while minimizing wasted DASD space. But in 2014, IBM's John Eells wrote a post on the IBM-MAIN list saying that 32,760 is the best blocksize for load modules. Who is right?
Let's do some benchmarking.
Blocksize Performance
We used IEFBR14 as our first load module: very small (only 8 bytes). We wrote an assembler program to do a LOAD followed by a DELETE of this program, 5000 times. We used three load libraries (PDS) with blocksizes of 6144, 23440 (half-track blocking) and 32760. We specified these libraries in the STEPLIB of our JCL, so there was no LLA (library lookaside) or VLF involved (and no CA PDSMAN, Quickfetch or PMO either). Here's what we found:
Blocksize | CPU Seconds | EXCPs | Elapsed Time (sec) |
6144 | 0.13 | 10006 | 3 |
23440 | 0.13 | 10006 | 3 |
32760 | 0.13 | 10006 | 3 |
No change: same CPU usage (0.13 seconds), number of EXCPs (or I/Os: 10006), and same elapsed time (3 seconds). This is no surprise: IEFBR14 fits comfortably into one block in all three PDS datasets. We repeated the test, but using the ARCCTL module (size 4MBytes), and only 1000 iterations:
Blocksize | CPU Seconds | EXCPs | Elapsed Time (sec) |
6144 | 4.65 | 1646000 | 243 |
23440 | 2.69 | 504000 | 145 |
32760 | 2.61 | 426000 | 133 |
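Now we see some differences. Compared with a blocksize of 6144, using 32760 almost halved both the elapsed time and the CPU time, and cut the EXCPs by about three quarters. It also beat half-track blocking on every measure, though by a smaller margin. So it's true: 32760 is the best blocksize for load modules.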
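To make the comparison easier to read, the ARCCTL figures can be reduced to per-load costs and percentage reductions. The short Python sketch below only reworks the numbers from the table above; the function names (`per_load`, `reduction_pct`) are our own, made up for illustration.

```python
# Figures copied from the ARCCTL table (1000 LOAD/DELETE iterations per run).
ITERATIONS = 1000
results = {
    6144:  {"cpu": 4.65, "excps": 1646000, "elapsed": 243},
    23440: {"cpu": 2.69, "excps": 504000,  "elapsed": 145},
    32760: {"cpu": 2.61, "excps": 426000,  "elapsed": 133},
}


def per_load(blocksize):
    """Average cost of a single LOAD/DELETE pair for one blocksize."""
    r = results[blocksize]
    return {
        "excps": r["excps"] / ITERATIONS,
        "cpu_ms": r["cpu"] * 1000 / ITERATIONS,
        "elapsed_ms": r["elapsed"] * 1000 / ITERATIONS,
    }


def reduction_pct(metric, baseline=6144, candidate=32760):
    """Percentage drop in a metric when moving from baseline to candidate."""
    base = results[baseline][metric]
    return 100 * (base - results[candidate][metric]) / base


for blocksize in results:
    p = per_load(blocksize)
    print(f"{blocksize:>6}: {p['excps']:.0f} EXCPs, "
          f"{p['cpu_ms']:.2f} ms CPU, {p['elapsed_ms']:.0f} ms elapsed per load")

for metric in ("cpu", "excps", "elapsed"):
    print(f"{metric}: {reduction_pct(metric):.1f}% lower at 32760 than at 6144")
```

Calling `reduction_pct("elapsed", baseline=23440)` gives the smaller gain of 32760 over half-track blocking.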
Blocksize Space
But doesn't a 32760 blocksize increase the DASD space we need? We took SYS1.LINKLIB from a z/OS 2.3 system, and copied it into three PDS datasets with 6144, 23440, and 32760 blocksizes. The result:
Blocksize | Tracks Used |
6144 | 2895 |
23440 | 2640 |
32760 | 2620 |
Interestingly, 32760 used the least space of all the blocksize options. This is because the z/OS binder is smart. It will write shorter blocks to fill up the remaining space on a track: not every block will be 32760 bytes long.
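How much space is that in practice? The sketch below converts the track counts above into 3390 cylinders (15 tracks per cylinder) and works out the saving between any two blocksizes. It uses only the figures from the table; `savings_vs` is a name invented for this example.

```python
TRACKS_PER_CYLINDER = 15  # 3390 geometry

# Tracks used by the SYS1.LINKLIB copies, from the table above.
tracks_used = {6144: 2895, 23440: 2640, 32760: 2620}


def savings_vs(baseline, candidate):
    """Return (tracks saved, percent saved) when moving baseline -> candidate."""
    saved = tracks_used[baseline] - tracks_used[candidate]
    return saved, 100 * saved / tracks_used[baseline]


for blocksize, tracks in tracks_used.items():
    print(f"{blocksize:>6}: {tracks} tracks = {tracks / TRACKS_PER_CYLINDER:.1f} cylinders")

for baseline in (6144, 23440):
    saved, pct = savings_vs(baseline, 32760)
    print(f"32760 vs {baseline}: {saved} tracks saved ({pct:.1f}%)")
```

The difference between 23440 and 32760 is under one percent, so space is not a reason to avoid 32760.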
Blocksize and Program Objects
So how do PDSEs and program objects compare? We first ran a test with our large ARCCTL module, comparing load times for a PDSE (blocksize=32760) with our three load libraries. Here's what we found:
Blocksize | CPU Seconds | EXCPs | Elapsed Time (sec) |
6144 | 4.65 | 1646000 | 243 |
23440 | 2.69 | 504000 | 145 |
32760 | 2.61 | 426000 | 133 |
(PDSE) | 0.32 | 100000 | 4 |
The PDSE is much faster. It also uses far fewer EXCPs and CPU seconds. But there's a catch. By default, a program does not load the entire program object from a PDSE. Program objects have different classes of text (the actual program). Only some classes are brought into memory when the module is loaded. Other classes are loaded when needed. So, for our PDSE, we're not loading the entire ARCCTL.
We can disable this by using the FETCHOPT=(PACK,PRIME) parameter of the binder. Doing this for ARCCTL, our test results become:
Blocksize | CPU Seconds | EXCPs | Elapsed Time (sec) |
6144 | 4.65 | 1646000 | 243 |
23440 | 2.69 | 504000 | 145 |
32760 | 2.61 | 426000 | 133 |
(PDSE) | 3.26 | 1285000 | 31 |
The PDSE is still a lot faster in elapsed time, but uses a little more CPU, and about three times the EXCPs, compared with our optimal 32760 PDS.
So how do blocksizes affect program objects in PDSEs? We ran a similar test to our first two tests above, with FETCHOPT=(PACK,PRIME). We chose the modules CSQLINK (length 424 bytes) and CSQUDMSG (length 264kbytes), and performed 5000 LOAD/DELETEs in our program. The results:
Module | Blocksize | CPU Seconds | EXCPs | Elapsed Time (sec) |
CSQLINK | 6144 | 0.4 | 25006 | 2 |
CSQLINK | 23440 | 0.4 | 25006 | 2 |
CSQLINK | 32760 | 0.4 | 25006 | 1 |
|
CSQUDMSG | 6144 | 1.3 | 230000 | 7 |
CSQUDMSG | 23440 | 1.3 | 230000 | 6 |
CSQUDMSG | 32760 | 1.3 | 230000 | 7 |
No difference. This is because program objects internally use 4kByte blocks. Changing the dataset blocksize has no effect.
Directory Search
At the beginning of this article, we talked about how the size of a load library may also affect performance. To load a module, the directory of the PDS/PDSE must be searched to find the module location, and then the module is loaded. So, does the number of members in a PDS or PDSE make any difference?
Let's first look at load modules in PDS datasets. We compared our two modules (IEFBR14 and ARCCTL) in two PDS datasets, each with a blocksize of 32760. One PDS had 2 members, one had 4200 members. Here are the results:
Module | Blocksize | Members | CPU Seconds | EXCPs | Elapsed Time (sec) |
IEFBR14 | 32760 | 2 | 0.13 | 10006 | 3 |
IEFBR14 | 32760 | 4200 | 0.14 | 10006 | 9 |
|
ARCCTL | 32760 | 2 | 2.61 | 426000 | 133 |
ARCCTL | 32760 | 4200 | 3.07 | 502000 | 209 |
A small CPU increase, and larger elapsed time increase when there were more members. This is because with PDS datasets, the directory is searched sequentially.
PDSE datasets advertise a faster search time because of indexed directories. The results to load IEFBR14 5000 times from two PDSEs (one with 2 members, one with 4200 members) are:
Members | CPU Seconds | EXCPs | Elapsed Time (sec) |
2 | 0.32 | 25006 | 1 |
4200 | 0.32 | 25012 | 2 |
Almost identical.
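The difference between the two directory types can be pictured with a toy model. The Python sketch below is not how z/OS implements either search: it simply counts directory blocks examined in a sequential scan of a sorted, blocked directory (PDS-like), against the probes needed by a binary search, which stands in here for an indexed directory (PDSE-like). The value of `ENTRIES_PER_BLOCK`, the member names and the function names are all assumptions made for illustration.

```python
ENTRIES_PER_BLOCK = 6  # assumption: illustrative figure, not a measured value


def build_directory(names):
    """Split the sorted member names into fixed-size directory blocks."""
    names = sorted(names)
    return [names[i:i + ENTRIES_PER_BLOCK]
            for i in range(0, len(names), ENTRIES_PER_BLOCK)]


def sequential_search(blocks, target):
    """PDS-like scan: read blocks in order until one could hold the target.

    Returns (blocks examined, found).
    """
    for examined, block in enumerate(blocks, start=1):
        if block[-1] >= target:  # highest name in this block
            return examined, target in block
    return len(blocks), False


def indexed_search(sorted_names, target):
    """Stand-in for an indexed directory: binary search, counting probes.

    Returns (probes, found).
    """
    lo, hi, probes = 0, len(sorted_names), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return probes, lo < len(sorted_names) and sorted_names[lo] == target


# Hypothetical member names: 4200 members, looking for the last one.
members = sorted(f"MOD{i:05d}" for i in range(4200))
target = members[-1]

blocks_read, _ = sequential_search(build_directory(members), target)
probes, _ = indexed_search(members, target)
print(f"sequential: {blocks_read} blocks examined, indexed: {probes} probes")
```

In this model the sequential cost grows in step with the number of members, while the indexed cost grows only logarithmically, which matches the direction of the measurements above.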
Conclusion
The results are pretty clear. For load libraries, use a blocksize of 32760, and don't have too many load modules in one PDS. However, program objects in PDSEs will be faster, with no need to worry about blocksize (keep it above 4k) or the number of program objects in the PDSE.
In many cases these changes won't be significant. Most sites use LLA (library lookaside), or products like CA PDSMAN, PMO and Quickfetch, to avoid the overheads of loads and directory searches. Program prefetch in IMS and residency in CICS do something similar. However, it's an easy thing to have a standard blocksize (32,760) for load libraries. Most software vendors already do this, as do many sites.
A quick search at one site showed that almost 80% of all load PDS libraries had a blocksize of 32000 or higher. 7% of load libraries had a blocksize of 8k or lower.
David Stephens