Longpela Expertise

LongEx Mainframe Quarterly - August 2019

technical: Load Library Performance

Load libraries (PDS) and program object libraries (PDSE) are always defined with an 'undefined' record format (RECFM=U). In our partner article RECFM=U: What Exactly Is It, we talk about how the LRECL definition for RECFM=U datasets is ignored.

But are there any other allocation issues for load and program libraries? What effect does blocksize have on their performance? And does the size of the library matter? The answer is "maybe." Let's see what that means.

Blocksize

Best practice for non-VSAM disk datasets is to use half-track blocksizes: the largest blocksize that still fits two blocks on a track. Almost all DASD today emulates 3390s, where half a track is 27,998 bytes. This is generally accepted to give good performance while minimizing wasted DASD space. But in 2014, IBM's John Eells wrote a post on the IBM-MAIN list saying that 32,760 is the best blocksize for load modules. Who is right?

Let's do some benchmarking.

Blocksize Performance

We used IEFBR14 as our first load module: very small (only 8 bytes). We wrote an assembler program to do a LOAD followed by a DELETE of this program 5000 times. We used three load libraries (PDS) with blocksizes of 6144, 23440 (half-track blocking for 3380 geometry; a 3390 track also holds two blocks of this size) and 32760. We specified these libraries in the STEPLIB of our JCL, so there was no LLA (library lookaside) or VLF involvement (no CA PDSMAN, Quickfetch or PMO either). Here's what we found:

Blocksize	CPU Seconds	EXCPs	Elapsed Time (sec)
6144	0.13	10006	3
23440	0.13	10006	3
32760	0.13	10006	3

No change: the same CPU usage (0.13 seconds), the same number of EXCPs (or I/Os: 10006), and the same elapsed time (3 seconds). This is no surprise: IEFBR14 fits comfortably into one block in all three PDS datasets. We repeated the test using the ARCCTL module (about 4 MB in size), with only 1000 iterations:

Blocksize	CPU Seconds	EXCPs	Elapsed Time (sec)
6144	4.65	1646000	243
23440	2.69	504000	145
32760	2.61	426000	133

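Now we see some differences. Compared with a blocksize of 6144, using 32760 almost halved the elapsed time (243 down to 133 seconds) and cut CPU time by about 44%. The gain over 23440 is smaller (about 8% in elapsed time), but it is still a gain. So it's true: 32760 is the best blocksize for load modules.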
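To make the comparison concrete, here is a small Python sketch that turns the ARCCTL figures above into per-load numbers and percentage reductions. The numbers are copied from the table; the function names (`per_load`, `reduction_pct`) are just illustrative helpers, not part of any tool we used.

```python
# Figures copied from the ARCCTL table above (1000 LOAD/DELETE iterations).
ITERATIONS = 1000
results = {
    6144:  {"cpu": 4.65, "excps": 1_646_000, "elapsed": 243},
    23440: {"cpu": 2.69, "excps": 504_000, "elapsed": 145},
    32760: {"cpu": 2.61, "excps": 426_000, "elapsed": 133},
}

def per_load(blksize):
    """Return (EXCPs per load, elapsed milliseconds per load)."""
    r = results[blksize]
    return r["excps"] / ITERATIONS, r["elapsed"] * 1000 / ITERATIONS

def reduction_pct(metric, new, base=6144):
    """Percentage reduction in a metric when moving from base to new blocksize."""
    old_value = results[base][metric]
    return round(100 * (old_value - results[new][metric]) / old_value, 1)

for blksize in results:
    excps, ms = per_load(blksize)
    print(f"{blksize:>6}: {excps:.0f} EXCPs/load, {ms:.0f} ms/load")

print("Elapsed reduction, 6144 -> 32760 :", reduction_pct("elapsed", 32760), "%")
print("CPU reduction, 6144 -> 32760     :", reduction_pct("cpu", 32760), "%")
print("Elapsed reduction, 23440 -> 32760:", reduction_pct("elapsed", 32760, base=23440), "%")
```

Looking at the figures per load rather than per run makes it easier to judge whether the difference matters for a program that is only loaded a handful of times.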

Blocksize Space

But doesn't a 32760 blocksize increase the DASD space we need? We took SYS1.LINKLIB from a z/OS 2.3 system, and copied it into three PDS datasets with 6144, 23440, and 32760 blocksizes. The result:

Blocksize	Tracks Used
6144	2895
23440	2640
32760	2620

Interestingly, 32760 used the least space of all the blocksize options. This is because the z/OS binder is smart. It will write shorter blocks to fill up the remaining space in a track: not every block in the library will be 32760 bytes long.
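As a quick check of how much space is involved, the following Python sketch works out the saving relative to the 6144 library. The track counts come from the table above; `saved_pct` is a hypothetical helper name used only for this illustration.

```python
# Tracks used by each copy of SYS1.LINKLIB, from the table above.
tracks = {6144: 2895, 23440: 2640, 32760: 2620}
BASELINE = 6144

def saved_pct(blksize):
    """Percentage of tracks saved compared with the 6144 blocksize library."""
    base = tracks[BASELINE]
    return round(100 * (base - tracks[blksize]) / base, 1)

for blksize, used in tracks.items():
    fewer = tracks[BASELINE] - used
    print(f"{blksize:>6}: {used} tracks, {fewer} fewer than 6144 ({saved_pct(blksize)}%)")
```

The difference between 23440 and 32760 is small (20 tracks), so the space argument is mostly about avoiding small blocksizes rather than choosing between the two larger ones.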

Blocksize and Program Objects

So how do PDSEs and program objects compare? We first ran a test with our large ARCCTL module, comparing load times for a PDSE (blocksize=32760) with our three load libraries. Here's what we found:

Blocksize	CPU Seconds	EXCPs	Elapsed Time (sec)
6144	4.65	1646000	243
23440	2.69	504000	145
32760	2.61	426000	133
(PDSE)	0.32	100000	4

The PDSE is much faster. It also uses far fewer EXCPs and CPU seconds. But there's a catch. By default, the entire program object is not loaded from a PDSE. Program objects have different classes of text (the actual program). Only some classes are brought into memory when the module is loaded. Other classes are loaded when needed. So, for our PDSE, we're not loading the entire ARCCTL.

We can disable this by using the FETCHOPT=(PACK,PRIME) parameter of the binder. Doing this for ARCCTL, our test results become:

Blocksize	CPU Seconds	EXCPs	Elapsed Time (sec)
6144	4.65	1646000	243
23440	2.69	504000	145
32760	2.61	426000	133
(PDSE)	3.26	1285000	31

The PDSE is still a lot faster in elapsed time (31 seconds against 133), but uses about 25% more CPU and around three times the EXCPs of our optimal 32760 load library.
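The trade-off is easier to see as ratios. This Python sketch divides the PDSE figures (with FETCHOPT=(PACK,PRIME)) by the figures for the 32760 load library; the values are taken from the table above, and `ratio` is just an illustrative helper.

```python
# ARCCTL, 1000 LOAD/DELETE iterations, figures from the table above.
pds_32760 = {"cpu": 2.61, "excps": 426_000, "elapsed": 133}
pdse_prime = {"cpu": 3.26, "excps": 1_285_000, "elapsed": 31}

def ratio(metric):
    """PDSE value divided by the PDS (blocksize 32760) value for one metric."""
    return round(pdse_prime[metric] / pds_32760[metric], 2)

for metric in ("cpu", "excps", "elapsed"):
    print(f"{metric:>7}: PDSE is {ratio(metric)}x the PDS figure")
```

A ratio above 1 means the PDSE used more of that resource, and a ratio below 1 means it used less. On these numbers the PDSE trades more CPU and I/O requests for a much shorter elapsed time.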

So how do blocksizes affect program objects in PDSEs? We ran a similar test to our first two tests above, with FETCHOPT=(PACK,PRIME). We chose the modules CSQLINK (length 424 bytes) and CSQUDMSG (length 264 KB), and performed 5000 LOAD/DELETEs in our program. The results:

Module	Blocksize	CPU Seconds	EXCPs	Elapsed Time (sec)
CSQLINK	6144	0.4	25006	2
CSQLINK	23440	0.4	25006	2
CSQLINK	32760	0.4	25006	1

CSQUDMSG	6144	1.3	230000	7
CSQUDMSG	23440	1.3	230000	6
CSQUDMSG	32760	1.3	230000	7

No difference. This is because program objects internally use 4 KB blocks. Changing the dataset blocksize has no effect.

Directory Search

At the beginning of this article, we talked about how the size of a load library may also affect performance. To load a module, the directory of the PDS/PDSE must be searched to find the module location, and then the module is loaded. So, does the number of members in a PDS or PDSE make any difference?

Let's first look at load modules in PDS datasets. We compared our two modules (IEFBR14 and ARCCTL) in two PDS datasets, each with a blocksize of 32760. One PDS had 2 members, one had 4200 members. Here are the results:

Module	Blocksize	Members	CPU Seconds	EXCPs	Elapsed Time (sec)
IEFBR14	32760	2	0.13	10006	3
IEFBR14	32760	4200	0.14	10006	9

ARCCTL	32760	2	2.61	426000	133
ARCCTL	32760	4200	3.07	502000	209

With more members, CPU rose slightly and elapsed time rose substantially: IEFBR14 went from 3 to 9 seconds, and ARCCTL from 133 to 209 seconds. This is because with PDS datasets, the directory is searched sequentially. PDSE datasets advertise a faster search time because of indexed directories. The results of loading IEFBR14 5000 times from two PDSEs (one with 2 members, one with 4200 members) are:

Members	CPU Seconds	EXCPs	Elapsed Time (sec)
2	0.32	25006	1
4200	0.32	25012	2

Almost identical.
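To put the directory effect on a per-load basis, this Python sketch computes the extra elapsed time per LOAD when moving from the 2-member library to the 4200-member library. The figures are from the tables above; the labels and the `extra_ms_per_load` helper are illustrative only. Elapsed times were reported in whole seconds, so the smaller results are only rough indications.

```python
# (iterations, elapsed seconds with 2 members, elapsed seconds with 4200 members)
tests = {
    "IEFBR14 in PDS": (5000, 3, 9),
    "ARCCTL in PDS": (1000, 133, 209),
    "IEFBR14 in PDSE": (5000, 1, 2),
}

def extra_ms_per_load(name):
    """Additional elapsed milliseconds per LOAD with the larger directory."""
    iterations, small, large = tests[name]
    return (large - small) * 1000 / iterations

for name in tests:
    print(f"{name:>16}: +{extra_ms_per_load(name):.1f} ms per load")
```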

Conclusion

The results are pretty clear. For load libraries, use a blocksize of 32760, and don't have too many load modules in one PDS. However, program objects in PDSEs will be faster, and there is no need to worry about blocksize (keep it above 4 KB) or the number of program objects in the PDSE.

In many cases these changes won't be significant. Most sites use LLA (library lookaside) and VLF (or products like CA PDSMAN, PMO and Quickfetch) to avoid the overheads of loads and directory searches. Program prefetch in IMS and residency in CICS do something similar. However, it's easy to set a standard blocksize (32,760) for load libraries. Most software vendors already do this, as do many sites.

A quick search at one site showed that almost 80% of all load PDS libraries had a blocksize of 32000 or higher. 7% of load libraries had a blocksize of 8k or lower.

David Stephens

LongEx Quarterly is a quarterly eZine produced by Longpela Expertise. It provides Mainframe articles for management and technical experts. It is published every November, February, May and August.

The opinions in this article are solely those of the author, and do not necessarily represent the opinions of any other person or organisation. All trademarks, trade names, service marks and logos referenced in these articles belong to their respective companies.

Although Longpela Expertise may be paid by organisations reprinting our articles, all articles are independent. Longpela Expertise has not been paid money by any vendor or company to write any articles appearing in our e-zine.


© Copyright 2019 Longpela Expertise | ABN 55 072 652 147