VSAM: Why Don't We Use Half-Track CI Sizes?
So, here's a question. If we use half-track blocking for sequential datasets, should we also use half-track Control Interval (CI) sizes for VSAM datasets? Those familiar with VSAM will be chuckling to themselves and saying "no." And they're right. Almost all of the time.
First, a quick refresher on sequential dataset blocking. Every I/O (EXCP) costs CPU and elapsed time. So, we want to minimize the number of I/Os. We can do this by maximizing the amount of data in each I/O.
For QSAM and BSAM, data is written in blocks. Ignoring chaining for a second, there is one I/O per block. The maximum blocksize is 32,760 bytes (just under 32k). So, it would seem to make sense to use 32k as our blocksize.
However, most disks still pretend to be 3390 devices. Every 3390 track holds 56,664 bytes. Blocks cannot span tracks: each block must sit on one, and only one, track (though we can have multiple blocks on one track). If we use a 32,760-byte blocksize, each track holds just one block: we use 32,760 of our 56,664 bytes, and the remaining 23,904 bytes are wasted because another block won't fit.
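As a quick illustration of why bigger blocks mean fewer I/Os, here's a Python sketch for a fixed-length (RECFM=FB) dataset, ignoring chaining so that one block is one I/O. The helper name `blocks_needed` and the 90,000-record, LRECL=512 dataset are hypothetical, chosen only for the example.

```python
import math


def blocks_needed(record_count: int, lrecl: int, blksize: int) -> int:
    """Blocks (and, ignoring chaining, I/Os) needed to write fixed-length records."""
    records_per_block = blksize // lrecl  # RECFM=FB: whole records per block
    if records_per_block == 0:
        raise ValueError("blksize must be at least lrecl")
    return math.ceil(record_count / records_per_block)


# Hypothetical dataset: 90,000 records with LRECL=512
for blksize in (512, 4096, 27648, 32256):
    print(blksize, blocks_needed(90_000, 512, blksize))
```

With 512-byte blocks that's 90,000 I/Os; with 27,648-byte blocks (54 records each) it's 1,667, and with 32,256-byte blocks (63 records each) it's 1,429.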
An interesting exception to this is RECFM=U datasets. Find out more in our article about RECFM=U.
So, the best compromise is to have a block that is around one-half of a 3390 track: at most 27,998 bytes. Two blocks fit nicely on a track with little space wasted, and we get pretty good performance.
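Here's a Python sketch of that arithmetic. It uses our transcription of IBM's published 3390 track-capacity formula for keyless blocks (a track capacity of 1,729 cells); treat it as an approximation and check IBM's 3390 documentation for authoritative values. The function names are hypothetical.

```python
TRACK_CELLS = 1729  # capacity of one 3390 track, in cells


def ceil_div(a: int, b: int) -> int:
    """Integer division, rounding up."""
    return -(-a // b)


def blocks_per_track(blksize: int) -> int:
    """How many equal-length, keyless blocks fit on one 3390 track."""
    # Data area cells for one block, including the 3390's per-block overhead
    dn = ceil_div(blksize + 6, 232)
    data_cells = 9 + ceil_div(blksize + 6 * dn + 6, 34)
    count_cells = 10  # count area for each block
    return TRACK_CELLS // (count_cells + data_cells)


def data_bytes_per_track(blksize: int) -> int:
    """Bytes of user data stored on one track with this blocksize."""
    return blocks_per_track(blksize) * blksize


for blksize in (32760, 27998, 27999):
    print(blksize, blocks_per_track(blksize), data_bytes_per_track(blksize))

# Largest blocksize that still fits two blocks on a 3390 track
half_track = max(b for b in range(1, 32761) if blocks_per_track(b) >= 2)
print("half-track blocksize:", half_track)
```

It reports one 32,760-byte block per track, against two 27,998-byte blocks (55,996 bytes of data), and confirms 27,998 as the largest blocksize that still fits twice on a track: at 27,999 bytes, we're back to one block per track.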
The good news is that we can ask z/OS to give us the perfect blocksize by specifying BLKSIZE=0 (well, most of the time).
Sequential datasets are, well, sequential. If you want a specific record, you start from the beginning, and check every record until you get the one you want. If you want to add a record, it's inserted at the end of the dataset. Sequential.
One of the very cool things about VSAM is that it isn't sequential unless you want it to be. Want to get a record? Specify a key (KSDS), relative record number (RRDS) or relative byte address (ESDS), and you've got it. No need to scan every record from the beginning to find it.
VSAM I/O is based on the control interval (CI): each I/O transfers one CI. So, if we want to read a single 250-byte record, we're going to read the entire CI. But if we're only getting (or writing) one record at a time from different areas, we're probably not going to need any of the other records in the CI. So, a large 24kByte CI doesn't buy us anything. In fact, it probably costs, as transferring a 24k CI will likely take a little longer than a 1k CI.
Suppose we have more than one task updating a VSAM dataset in an address space. Unless you're using VSAM Record Level Sharing (RLS), VSAM performs locking at the CI level. So, if one task updates one record in a CI, no other task can update any other record in that CI until the first task finishes its unit of work. So for random access, a large CI size can mean more tasks waiting on each other.
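To put a number on that, here's a Python sketch of how many fixed-length records share one data CI, and so how many records a single CI-level lock ties up. It assumes fixed-length records, no free space, and the standard CI layout of a 4-byte CIDF plus one pair of 3-byte RDFs; `records_per_ci` is a hypothetical helper.

```python
CIDF_BYTES = 4  # control interval definition field at the end of every CI
RDF_BYTES = 3   # each record definition field


def records_per_ci(ci_size: int, record_length: int) -> int:
    """Fixed-length records that fit in one VSAM data CI, with no free space."""
    # Fixed-length records are described by a single pair of RDFs
    control_bytes = CIDF_BYTES + 2 * RDF_BYTES
    return (ci_size - control_bytes) // record_length


# Records tied up by one CI-level lock, for 250-byte records
for ci_size in (1024, 4096, 24576):
    print(ci_size, records_per_ci(ci_size, 250))
```

With 250-byte records, a 1k CI holds 4 records, a 4k CI holds 16, and a 24,576-byte CI holds 98. So one update in a 24k CI can hold up updates to 97 other records.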
Sequential Access for VSAM
But not all VSAM I/O will be random. We may need to process it sequentially. SMF SYS1.MANx datasets are a great example. So, shouldn't we have a larger CI size for these? Yes.
Let's put this into action. We've created a sequential dataset with around 90,000 records that we will be loading into a VSAM KSDS (fixed 512-byte record length) using IDCAMS REPRO. Here's the performance with a small and big data CI size:
| Data CI Size (bytes) | CPU Seconds | Elapsed Time (secs) | EXCPs |
This is interesting: there's a nice performance gain between CI sizes of 1k and 4k, and little additional benefit after that. You could argue, though, that half-track CI sizes are the most efficient.
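One possible contributor to the jump between 1k and 4k is how many 512-byte records fit in a CI. This Python sketch counts the data CIs needed for the load, assuming exactly 90,000 fixed-length records, no free space, and the usual 10 bytes of CI control information (a 4-byte CIDF plus two 3-byte RDFs). It ignores the index and buffering (with several data buffers, VSAM may transfer more than one CI per I/O), so it isn't meant to predict the EXCP column. `data_cis_needed` is a hypothetical helper.

```python
import math

CI_CONTROL_BYTES = 4 + 2 * 3  # 4-byte CIDF plus two 3-byte RDFs


def data_cis_needed(record_count: int, record_length: int, ci_size: int) -> int:
    """Data CIs needed for fixed-length records, assuming no free space."""
    per_ci = (ci_size - CI_CONTROL_BYTES) // record_length
    return math.ceil(record_count / per_ci)


# Hypothetical load: exactly 90,000 fixed 512-byte records
for ci_size in (1024, 4096, 24576, 32768):
    print(ci_size, data_cis_needed(90_000, 512, ci_size))
```

A 1k CI holds only one 512-byte record (the 10 bytes of control information leave 1,014 bytes, just short of two records), so the load needs 90,000 CIs. A 4k CI holds 7 records (12,858 CIs), while 24,576- and 32,768-byte CIs hold 47 and 63 records (1,915 and 1,429 CIs).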
We've ignored the index CI size: normally we let the system decide this for us.
Do We Need Half-Track Blocking?
One of the reasons we use half-track blocking in sequential datasets is to minimize wasted disk space while maximizing performance. However, with VSAM it's a little different.
With VSAM, an I/O is done by CI: if you read a record, the entire CI is read from disk, much like a block for sequential datasets. However, with sequential datasets, one logical block is also one physical block on disk. So, if you specify a 24,000-byte blocksize, 24,000-byte blocks will be written to disk.
With VSAM, a CI can be made up of one or more physical blocks. For example, if you specify a CI size of 32000 bytes (which VSAM rounds up to 32,768), each CI is actually written to disk as two 16,384-byte physical blocks. We can see this in the IDCAMS LISTCAT output:
Look at the PHYRECS/TRK field. This shows that our VSAM dataset with a CI size of 32000 uses three physical records (or blocks) per track. Smart, huh? VSAM minimizes disk wastage.
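VSAM doesn't use a requested CI size as-is: data CI sizes are rounded up to a multiple of 512 bytes (up to 8,192) or of 2,048 bytes (above 8,192, to a maximum of 32,768). This Python sketch applies that rounding, then divides by the physical block size; the 16,384-byte value is the one IBM's table gives for a 32k CI on a 3390, and `round_ci_size` is a hypothetical helper.

```python
def round_ci_size(requested: int) -> int:
    """Round a requested data CI size up to a valid VSAM CI size."""
    # Valid sizes: multiples of 512 up to 8,192, then multiples of 2,048 up to 32,768
    if not 1 <= requested <= 32768:
        raise ValueError("data CI size must be between 1 and 32,768 bytes")
    step = 512 if requested <= 8192 else 2048
    return -(-requested // step) * step  # round up to the next multiple of step


PHYSICAL_BLOCK = 16384  # physical block size for a 32k CI on a 3390 (IBM's table)

ci_size = round_ci_size(32000)
print(ci_size, ci_size // PHYSICAL_BLOCK)  # actual CI size, physical blocks per CI
```

A requested 32,000-byte CI becomes 32,768 bytes, or two 16,384-byte physical blocks. Likewise, a requested 24,000-byte CI becomes 24,576 bytes.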
IBM publishes a table showing the physical blocksize VSAM uses for some of the more common CI sizes. Here are a few for 3390 disks (which is what most sites use):
| CI Size (bytes) | Block Size (bytes) | Physical Blocks / Track |
Here's an interesting observation: the largest physical blocksizes are used for CI sizes of around one-half of a track. A dataset with a data CI size of 24,576 bytes will have a larger physical blocksize (24,576 bytes) than one with a 32k CI (16,384 bytes).
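IBM's table is the authority here, but as a cross-check, this Python sketch tries to reproduce it. It combines our transcription of IBM's 3390 track-capacity formula with an assumption that is ours, not a documented IBM rule: VSAM considers physical block sizes that are multiples of 512 and divide the CI evenly, and picks the one that stores the most data per track, preferring the larger block on a tie. All names are hypothetical.

```python
TRACK_CELLS = 1729  # capacity of one 3390 track, in cells


def ceil_div(a: int, b: int) -> int:
    """Integer division, rounding up."""
    return -(-a // b)


def blocks_per_track(blksize: int) -> int:
    """Keyless blocks of blksize bytes that fit on one 3390 track."""
    dn = ceil_div(blksize + 6, 232)
    data_cells = 9 + ceil_div(blksize + 6 * dn + 6, 34)
    return TRACK_CELLS // (10 + data_cells)


def choose_physical_block(ci_size: int) -> tuple[int, int]:
    """Guess the physical block size VSAM uses for a data CI on a 3390.

    ASSUMPTION (ours, not IBM's documented rule): candidates are multiples
    of 512 that divide the CI evenly; the winner stores the most data per
    track, with the larger block preferred on a tie.
    Returns (physical block size, blocks per track).
    """
    candidates = [b for b in range(512, ci_size + 1, 512) if ci_size % b == 0]
    best = max(candidates, key=lambda b: (blocks_per_track(b) * b, b))
    return best, blocks_per_track(best)


for ci_size in (4096, 24576, 26624, 32768):
    print(ci_size, choose_physical_block(ci_size))
```

Under those assumptions, a 24,576-byte CI gets 24,576-byte blocks (two per track) and a 32,768-byte CI gets 16,384-byte blocks (three per track), matching the values above.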
So, doesn't this mean that we should be using half-track CI sizes for sequentially accessed datasets? Maybe, though our test above showed little difference between a 24000 and 32000 data CI size.
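Those SMF SYS1.MANx datasets are an interesting example. SMF datasets must have a physical blocksize the same as the CI size. So a CI size of 26,624 bytes is the maximum: it's the largest valid CI size that fits twice on a 3390 track as a single physical block (the next valid size, 28,672 bytes, is over the 27,998-byte half-track limit). It probably provides some performance benefits, too.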
The flip side is that SMF uses only one CI per record. So large CI sizes may waste some disk space.
The rule of thumb is to use a data CI size of 4kBytes, and let z/OS determine the index CI size. And in most situations, this will work well. However, with SMF datasets, using half-track CI sizes may be a good idea.