Longpela Expertise


LongEx Mainframe Quarterly - November 2019

technical: VSAM: Why Don't We Use Half-Track CI Sizes?

So, here's a question. If we use half-track blocking for sequential datasets, should we also use half-track Control Interval (CI) sizes for VSAM datasets? Those familiar with VSAM will be chuckling to themselves, and saying "no." And they're right. Almost all of the time.

Sequential Blocking

Let's give a quick refresher about sequential dataset blocking. Every I/O (EXCP) costs. So, we want to minimize the number of I/Os. We can do this by maximizing the amount of data in each I/O.

For QSAM and BSAM, data is written in blocks. Ignoring chaining for a second, there is one I/O per block. The maximum blocksize is 32,760 bytes (just under 32k). So, it seems to make sense to use the full 32k as our blocksize.

However, most disks still pretend to be 3390 devices. Every 3390 track holds 56,664 bytes. Blocks cannot span tracks: one block must be on one, and only one track (though we can have multiple blocks on one track). If we use a 32k blocksize, we'd use 32k of our 56,664 bytes, and the rest would be wasted (another block won't fit in the remaining space).

An interesting exception to this is RECFM=U datasets. Find out more in our article about RECFM=U.

So, the best compromise is to have a block that is around one-half of a 3390 track: up to 27,998 bytes. Two blocks fit nicely on a track with little space wasted, and we get pretty good performance.
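To put rough numbers on that trade-off, here is a small Python sketch. It uses only the 56,664-byte track figure quoted above and ignores the gaps and control fields a real 3390 track needs between blocks, so treat the results as an upper bound. The function name is ours, and the two blocksizes (32,760 as the usual QSAM/BSAM maximum, 27,998 as the usual half-track value) are picked for illustration.

```python
TRACK_BYTES = 56_664  # 3390 track capacity quoted in this article


def track_usage(blksize: int) -> tuple[int, float]:
    """Return (blocks per track, fraction of the track holding data).

    Simplification: ignores the per-block overhead on a real track.
    """
    blocks = TRACK_BYTES // blksize
    return blocks, blocks * blksize / TRACK_BYTES


for blksize in (32_760, 27_998):
    blocks, used = track_usage(blksize)
    print(f"BLKSIZE={blksize}: {blocks} block(s) per track, {used:.1%} of the track used")
```

With these inputs, the 32,760-byte block leaves a little over 40% of the track empty, while two 27,998-byte blocks fill almost all of it.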

The good news is that we can ask z/OS to give us the perfect blocksize by specifying BLKSIZE=0 (well, most of the time).

Random Access

Sequential datasets are, well, sequential. If you want a specific record, you start from the beginning, and check every record until you get the one you want. If you want to add a record, it's inserted at the end of the dataset. Sequential.

One of the very cool things about VSAM is that it isn't sequential unless you want it to be. Want to get a record? Specify a key (KSDS), relative record number (RRDS) or relative byte address (ESDS), and you've got it. No need to scan every record from the beginning to find it.

VSAM I/O is based on the control interval (CI): each I/O transfers one CI. So, if we want to read a single 250-byte record, we're going to read the entire CI. But if we're only getting (or writing) one record at a time from different areas, we're probably not going to need any of the other records in the CI. So, a large 24kByte CI doesn't buy us anything. In fact, it probably costs, as transferring a 24k CI will likely take a little longer than a 1k CI.
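A back-of-the-envelope sketch of that cost, in Python. It assumes fixed-length, unspanned records and 10 bytes of control information per CI (a 4-byte CIDF plus two 3-byte RDFs), and it ignores free space. Those are our assumptions for illustration, as are the function names.

```python
CI_CONTROL_BYTES = 10  # assumed: 4-byte CIDF + two 3-byte RDFs


def records_per_ci(ci_size: int, reclen: int) -> int:
    """Fixed-length records that fit in one CI, ignoring free space."""
    return (ci_size - CI_CONTROL_BYTES) // reclen


def useful_fraction(ci_size: int, reclen: int) -> float:
    """Share of one CI transfer that is the single record we wanted."""
    return reclen / ci_size


RECLEN = 250  # the record length used in the example above
for ci_size in (1024, 4096, 24576):
    print(
        f"CI {ci_size}: {records_per_ci(ci_size, RECLEN)} records per CI, "
        f"{useful_fraction(ci_size, RECLEN):.1%} of each random read is the record we asked for"
    )
```

The larger the CI, the smaller the share of each random I/O that is actually the record requested.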

Suppose we have more than one task updating a VSAM dataset in an address space. Unless using VSAM RLS, VSAM will perform locking at the CI level. So, if one task updates one record in a CI, no other task can update any other record in the CI until the first task finishes the unit of work. In this case, a large CI size may be a problem for random access.

Sequential Access for VSAM

But not all VSAM I/O will be random. We may need to process it sequentially. SMF SYS1.MANx datasets are a great example. So, shouldn't we have a larger CI size for these? Yes.

Let's put this into action. We've created a sequential dataset with around 90,000 records that we will be loading into a VSAM KSDS (fixed 512-byte record length) using IDCAMS REPRO. Here's the performance with a small and big data CI size:

Data CI Size (bytes)	CPU Seconds	Elapsed Time (secs)	EXCPs
1024	0.07	7	1628
4096	0.05	4	1205
24000	0.04	4	1194
32000	0.04	4	1195

This is interesting. A nice performance gain between a CI size of 1k and 4k, and then few benefits after that. But you could argue that half-track CI sizes are the most efficient.
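For readers who prefer percentages, this short Python sketch works out the step-by-step change in EXCPs from the table above. The numbers are copied from the table; the function name is ours.

```python
# (data CI size, CPU seconds, elapsed seconds, EXCPs) from the table above
RUNS = [
    (1024, 0.07, 7, 1628),
    (4096, 0.05, 4, 1205),
    (24000, 0.04, 4, 1194),
    (32000, 0.04, 4, 1195),
]


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


for prev, cur in zip(RUNS, RUNS[1:]):
    change = percent_change(prev[3], cur[3])
    print(f"CI {prev[0]} -> {cur[0]}: EXCPs {change:+.1f}%")
```

Moving from 1k to 4k cuts EXCPs by roughly a quarter; every step after that changes them by less than one percent.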

We've ignored the index CI size: normally we let the system decide this for us.

Do We Need Half-Track Blocking?

One of the reasons we use half-track blocking in sequential datasets is to minimize the disk space wasted, while maximizing performance. However, with VSAM it's a little different.

With VSAM, an I/O is done by CI. If you read a record, the entire CI is read from disk. Similar to a block for sequential datasets. However, with sequential datasets, one block is the same as one physical block on disk. So, if you specify a 24000 byte blocksize, 24000 byte blocks will be written to disk.

With VSAM, a CI can be one or more physical blocks. For example, if you specify a CI size of 32000 bytes (which VSAM rounds up to 32,768), each CI is actually written to disk as two 16,384-byte physical blocks, and three of those blocks fit on each track. We can see this in the IDCAMS LISTCAT output:

VOLUME                                                  
  VOLSER------------VPMVSH     PHYREC-SIZE--------16384
  DEVTYPE------X'3010200F'     PHYRECS/TRK------------3
  VOLFLAG------------PRIME     TRACKS/CA-------------15
  EXTENTS:
  LOW-CCHH-----X'02BB0000'     LOW-RBA----------------0
  HIGH-CCHH----X'02CE000E'     HIGH-RBA--------14417919
  LOW-CCHH-----X'02CF0000'     LOW-RBA---------14417920

Look at the PHYRECS/TRK field. This shows that our VSAM dataset with a CI size of 32000 uses three physical records (or blocks) per track. Smart, huh? VSAM minimizes disk wastage.
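The numbers in that listing can be cross-checked with a little arithmetic. The Python sketch below does so for the first extent, under assumptions that are ours rather than anything the listing states: that CCHH is a two-byte cylinder number followed by a two-byte head number, that the disk has the standard 3390 geometry of 15 tracks per cylinder, and that the CI size in effect is 32,768 bytes.

```python
# Values copied from the LISTCAT output above (first extent)
PHYREC_SIZE = 16_384
PHYRECS_PER_TRK = 3
TRACKS_PER_CA = 15
LOW_CCHH, HIGH_CCHH = 0x02BB0000, 0x02CE000E
LOW_RBA, HIGH_RBA = 0, 14_417_919

CI_SIZE = 32_768     # assumption: the CI size actually in effect
TRACKS_PER_CYL = 15  # assumption: standard 3390 geometry


def track_number(cchh: int) -> int:
    """Absolute track for a CCHH (assumed 2-byte cylinder, 2-byte head)."""
    return (cchh >> 16) * TRACKS_PER_CYL + (cchh & 0xFFFF)


tracks = track_number(HIGH_CCHH) - track_number(LOW_CCHH) + 1
control_areas = tracks // TRACKS_PER_CA
bytes_per_ca = (HIGH_RBA - LOW_RBA + 1) // control_areas
cis_per_ca = bytes_per_ca // CI_SIZE
blocks_per_ci = CI_SIZE // PHYREC_SIZE
spare_blocks_per_ca = TRACKS_PER_CA * PHYRECS_PER_TRK - cis_per_ca * blocks_per_ci

print(f"{tracks} tracks, {control_areas} control areas")
print(f"{cis_per_ca} CIs per control area, {blocks_per_ci} physical blocks per CI")
print(f"{spare_blocks_per_ca} physical block(s) left over per control area")
```

Under these assumptions the extent covers 300 tracks (20 control areas). Each control area holds 22 CIs of two physical blocks each, leaving one of its 45 physical blocks unused.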

IBM publish a table showing the physical blocksize for some of the more common VSAM CI sizes. Here are a couple for 3390 disks (which is what most sites use):

CI Size (bytes)	Data Block Size (bytes)	Index Block Size (bytes)	Data Blocks / Track	Index Blocks / Track
1024	1024	1024	33	33
4096	4096	4096	12	12
8192	8192	8192	6	6
16384	16384	16384	3	3
24576	24576	24576	2	2
26624	26624	26624	2	2
28672	7168	28672	7	1
32768	16384	32768	3	1

Here's an interesting observation: the largest physical blocksizes are used for CI sizes around one-half of a track. A dataset with a data CI size of 24,576 will have a larger physical blocksize (24,576 bytes) than one with a 32k CI size (16,384 bytes).
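Why those particular blocksizes? The Python sketch below is our guess at the reasoning, not IBM's documented algorithm. It assumes (1) the commonly published 3390 track capacity formula for records without keys, (2) that a physical block must divide the CI size evenly and be a multiple of 512 bytes up to 8,192, or a multiple of 2,048 above that, and (3) that the block putting the most bytes on a track wins, with ties going to the larger block. All function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def blocks_per_track(size: int) -> int:
    """Unkeyed blocks of `size` bytes per 3390 track (assumed capacity formula)."""
    overhead = ceil_div(size + 6, 232)
    data_cells = 9 + ceil_div(size + 6 * overhead + 6, 34)
    return 1729 // (10 + data_cells)


def valid_block(size: int) -> bool:
    """Assumed rule for sizes VSAM may use as a physical block."""
    step = 512 if size <= 8192 else 2048
    return size % step == 0


def pick_block(ci_size: int) -> tuple[int, int]:
    """Return (physical block size, blocks per track) under our assumed rule."""
    candidates = [
        ci_size // n
        for n in range(1, ci_size // 512 + 1)
        if ci_size % n == 0 and valid_block(ci_size // n)
    ]
    # Most bytes per track wins; ties go to the larger block.
    best = max(candidates, key=lambda b: (b * blocks_per_track(b), b))
    return best, blocks_per_track(best)


for ci in (1024, 4096, 8192, 16384, 24576, 26624, 28672, 32768):
    block, per_track = pick_block(ci)
    print(f"CI {ci}: block {block}, {per_track} per track")
```

Under these assumptions the sketch reproduces the data columns of the table, including the 7,168-byte block chosen for a 28,672-byte CI. That agreement is suggestive, not proof that VSAM works this way.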

So, doesn't this mean that we should be using half-track CI sizes for sequentially accessed datasets? Maybe, though our test above showed little difference between a 24000 and 32000 data CI size.

Those SMF SYS1.MANx datasets are an interesting example. SMF datasets must have a physical blocksize the same as the CI size. So a CI size of 26,624 bytes is the maximum, and probably provides some performance benefits.

The flip side is that SMF uses only one CI per record. So large CI sizes may waste some disk space.

Conclusion

The rule of thumb is to use a data CI size of 4kBytes, and let z/OS determine the index CI size. And in most situations, this will work well. However, with SMF datasets, using half-track CI sizes may be a good idea.

David Stephens

LongEx Quarterly is a quarterly eZine produced by Longpela Expertise. It provides Mainframe articles for management and technical experts. It is published every November, February, May and August.

The opinions in this article are solely those of the author, and do not necessarily represent the opinions of any other person or organisation. All trademarks, trade names, service marks and logos referenced in these articles belong to their respective companies.

Although Longpela Expertise may be paid by organisations reprinting our articles, all articles are independent. Longpela Expertise has not been paid money by any vendor or company to write any articles appearing in our e-zine.



© Copyright 2019 Longpela Expertise | ABN 55 072 652 147
