technical: Let's Just Agree to Stop All Space Errors
At a client site recently, we had a production outage caused by a VSAM space issue. And it wasn't the first time.
When I first started as a systems programmer, space-related abends were common. However, IBM and other vendors have spent a lot of time creating ways to reduce these abends. In fact, I believe that space-related errors should never, ever occur. But they still do.
So, how can we 100% eliminate B37, D37, E37 abends and other space-related failures?
Before we get started, let's review some of the more common abends and space-related errors. These can be summarized as:
- End of volume: the disk volume doesn't have an extent big enough for what is requested. The dataset may not be multi-volume, or all candidate volumes may be full.
- Out of Space: the dataset is full, and cannot obtain another extent. Either there is no secondary allocation, or it has reached the maximum number of extents possible.
- Out of Resources: for example, PDS datasets are allocated with a specified number of blocks for the directory. If the number of members exceeds this space, we get a storage problem. Another example: the maximum number of volumes a dataset can use is 59.
- Too Big: Datasets have a limit on their size. Non-extended VSAM datasets can be up to 4GBytes. Non-VSAM and VSAM that are not SMS managed can allocate a maximum of 65,535 tracks.
We know the problems, so how can we avoid them?
1. Non-VSAM Extended Format
Extended Format datasets were introduced with DFSMS 1.1 in 1993. Changing the way that sequential datasets are stored, they offer benefits including:
- More Extents. Normal sequential datasets have a maximum of 16 extents on each volume. Extended datasets increase this to 123 extents.
- Normal sequential datasets can allocate up to 65,535 tracks on each volume. Extended datasets increase this to 16,777,215 tracks.
- Control unit padding is a situation where an error results in invalid data being returned from a sequential dataset read. Extended format datasets detect this error, and raise an I/O error rather than returning incorrect data.
- Can use system managed buffering (SMB)
- Can be compressed to reduce disk space
Bottom line: I believe that all sequential datasets should be extended by default. Non-extended should only be used for datasets that are not eligible (such as VIO datasets, datasets accessed with EXCP, system datasets and GTF trace datasets).
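To put the two track limits above into more familiar units, here is a minimal Python sketch. It assumes 3390 geometry (56,664 bytes per track and 15 tracks per cylinder), which the limits themselves don't depend on. The byte figures are raw track capacity: real usable space depends on block size, so treat them as upper bounds.

```python
# Assumed 3390 geometry; not part of the limits quoted in the article.
BYTES_PER_TRACK = 56_664
TRACKS_PER_CYLINDER = 15


def track_limit_summary(max_tracks: int) -> dict:
    """Convert a per-volume track limit into cylinders and raw bytes."""
    return {
        "tracks": max_tracks,
        "cylinders": max_tracks // TRACKS_PER_CYLINDER,
        "raw_bytes": max_tracks * BYTES_PER_TRACK,
    }


basic = track_limit_summary(65_535)          # non-extended sequential
extended = track_limit_summary(16_777_215)   # extended format sequential

for name, s in (("basic", basic), ("extended", extended)):
    gib = s["raw_bytes"] / 2**30
    print(f"{name:9} {s['tracks']:>12,} tracks {s['cylinders']:>10,} cyl {gib:>10.1f} GiB raw")
```

On these assumptions, the non-extended limit works out to 4,369 cylinders per volume, which is not much by today's standards.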
2. VSAM Extended Format
VSAM also has an extended format. It offers fewer immediate benefits than the non-VSAM extended format, but IBM says that it “increases the performance and reliability of an I/O.” It also opens the door to some optional features that may improve resilience:
- CA Reclaim: If a Control Area that was used has no more records, normally VSAM cannot reuse it. Enabling CA Reclaim allows these to be reused. This can stop a VSAM dataset from always increasing in size and requiring frequent reorganisations.
- Striping: striping was originally designed to improve performance. However, striping can increase the number of extents from 255 to the number of stripes times 255.
- Compression: compressing a dataset reduces the space it uses on disk. This can help reduce the chance of space errors, particularly for very large datasets.
Bottom line: I believe that all VSAM datasets should be extended unless there is a specific reason (for example, page datasets and catalogs cannot be extended format). CA Reclaim should be enabled by default for all VSAM datasets. Any VSAM dataset that is very large (say, greater than 3000 cylinders) should be compressed.
3. PDSE
PDSE datasets have been around for longer than extended format datasets. The latest PDSE (version 2) provides the following advantages:
- No directory Problems: no need to worry about running out of directory space.
- No compression: no need to compress datasets.
- Supports member generations.
- PDSE members can be shared, PDS members cannot.
- Only one member can be created at a time in normal PDS. This is not a limitation with PDSE.
- Normal PDS can have up to 16 extents. PDSEs can have up to 123.
- Faster directory access.
- Device independent: no need to reblock for different device types.
Bottom line: I believe that PDSE should be used for all partitioned datasets by default, unless there is a reason.
4. Multi-Volume
There's an excellent chance that a single volume will run out of space. If a dataset has only one candidate volume, this will cause problems.
You may not have heard of it, but the DFSMS Data Class parameter Dynamic Volume Count (DVC) can be used to automatically increase the number of candidate volumes if needed. Suppose a user specifies two candidate volumes, but the allocation needs four. Normally this allocation would fail. However, a DVC of 4 (or higher) would automatically and dynamically add another two candidate volumes: the allocation would succeed.
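The DVC example above can be written as a toy model in Python. This is only a sketch of the behaviour as described here, not of how DFSMS implements it: the function name and the rule "usable volumes = the larger of the specified count and the DVC, capped at 59" are my simplifying assumptions.

```python
MAX_VOLUMES = 59  # maximum volumes per dataset, as quoted earlier


def allocation_fits(volumes_needed: int, volumes_specified: int, dvc: int = 0) -> bool:
    """Return True if the dataset can spread over enough candidate volumes.

    Simplified model: the usable volume count is the larger of what the
    user specified and the data class Dynamic Volume Count.
    """
    usable = min(MAX_VOLUMES, max(volumes_specified, dvc))
    return volumes_needed <= usable


print(allocation_fits(4, 2))         # False: only two candidate volumes
print(allocation_fits(4, 2, dvc=4))  # True: DVC supplies the other two
```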
Bottom line: I believe that all VSAM datasets should be multi-volume. Critical non-VSAM datasets should also be multivolume. Every data class should have a DVC parameter > 5 unless there is a good reason.
5. Large Datasets
Very large datasets are hard to manage. At one site, they had a dataset with a primary space requirement of 6000 cylinders (3390 device). They regularly had problems as few eligible volumes had a free extent of 6000 cylinders. DFSMS offers some features to help here:
- Space Constraint Relief: Many don't realize that DFSMS can help here. The DFSMS Data Classes parameters Space Constraint Relief and Reduce Space Up To can be specified to automatically and dynamically reduce the size of an extent requested if there is not sufficient space to satisfy it.
- Compression: Compressing very large datasets is also an excellent idea, reducing the chance of a volume space error. Unfortunately, this incurs a CPU overhead unless you have a zEDC card.
- Extended Addressability: VSAM and non-VSAM datasets (including extended format) have a maximum size of 4GBytes. Enabling extended addressability increases this limit to the control interval size multiplied by 4GBytes, or the volume size multiplied by 59 (maximum volumes) – whichever is smaller. Note that any program accessing a dataset by the Relative Byte Address (RBA) may need to be modified to allow for the longer possible RBA address.
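The extended addressability limit in the last point is easy to work out for a given dataset. The sketch below simply applies the "whichever is smaller" rule. The 3390 geometry and the two volume sizes (10,017 and 65,520 cylinders) are assumptions chosen for illustration.

```python
GIB = 2**30
MAX_VOLUMES = 59                    # maximum volumes per dataset
BYTES_PER_CYLINDER = 15 * 56_664    # assumed 3390 geometry


def ea_size_limit(ci_size: int, volume_cylinders: int) -> int:
    """Maximum dataset size in bytes with extended addressability.

    The smaller of (CI size x 4 GBytes) and (volume size x 59 volumes).
    """
    ci_limit = ci_size * 4 * GIB
    volume_limit = volume_cylinders * BYTES_PER_CYLINDER * MAX_VOLUMES
    return min(ci_limit, volume_limit)


# Hypothetical cases: a 4K CI on smaller volumes, a 512-byte CI on larger ones.
for ci, cyls in ((4096, 10_017), (512, 65_520)):
    print(f"CI {ci:>5}, {cyls:>6,} cyl volumes: {ea_size_limit(ci, cyls) / GIB:,.0f} GiB")
```

With a 4K control interval on the smaller volumes, the volume count is the binding limit. With a 512-byte control interval on the larger volumes, the control interval size is.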
Bottom line: I believe that Space Constraint Relief should be enabled for all data classes unless there is a good reason. Any dataset that is greater than 3000 cylinders should be compressed. Any dataset with space greater than 3GBytes should have extended addressability.
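To show how Space Constraint Relief helps in the 6000 cylinder case above, here is a simplified Python model. It looks only at the "Reduce Space Up To" percentage and the largest free extent on a volume. The function names are hypothetical, and anything else DFSMS may do to satisfy the request is ignored.

```python
def smallest_acceptable_cyl(requested_cyl: int, reduce_up_to_pct: int) -> int:
    """Smallest extent (in cylinders) still acceptable after reduction."""
    if not 0 <= reduce_up_to_pct < 100:
        raise ValueError("reduce_up_to_pct must be between 0 and 99")
    # Ceiling division using integers only.
    return -(-requested_cyl * (100 - reduce_up_to_pct) // 100)


def can_allocate(requested_cyl: int, largest_free_cyl: int,
                 scr_enabled: bool, reduce_up_to_pct: int = 0) -> bool:
    """Does the volume's largest free extent satisfy the request?"""
    needed = requested_cyl
    if scr_enabled:
        needed = smallest_acceptable_cyl(requested_cyl, reduce_up_to_pct)
    return largest_free_cyl >= needed


# A 6000 cylinder request against a volume whose largest free extent is 3500.
print(can_allocate(6000, 3500, scr_enabled=False))                      # False
print(can_allocate(6000, 3500, scr_enabled=True, reduce_up_to_pct=50))  # True
```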
6. Too Many Extents
We've already talked about how PDSE and other extended format datasets can increase the maximum number of extents possible. However, there are some other options to help.
- Secondary Extent Sizes: I often see space allocations like CYL(500,1). Such a small secondary extent value is just asking for an explosion in the number of extents. I'd have a rule that secondary extent sizes must be at least 10% of the primary extent size. This could be enforced in ACS routines.
- Additional Volume Amount: The DFSMS Data Class parameter ADDITIONAL VOLUME AMT can be used to specify if the primary or secondary allocation will be used when a dataset must use a new volume. Specifying this to PRIMARY ensures that the primary extent size will be used, reducing the number of possible extents.
- Extent Constraint Removal: You may not know it, but the DFSMS Data Class parameter EXTENT CONSTRAINT REMOVAL can be set to YES: this removes the 255-extent limit for VSAM datasets. Striping datasets will also increase the possible number of extents.
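To see why a tiny secondary allocation is a problem, the following Python sketch counts the extents a dataset needs to reach a given size, and checks the 10% rule suggested above. It assumes every extent is obtained at exactly the requested size, with no consolidation of adjacent extents. The function names are mine.

```python
def extents_needed(primary_cyl: int, secondary_cyl: int, total_cyl: int) -> int:
    """Extents required for a dataset to grow to total_cyl cylinders."""
    if total_cyl <= primary_cyl:
        return 1
    if secondary_cyl <= 0:
        raise ValueError("no secondary allocation: the dataset cannot grow")
    growth = total_cyl - primary_cyl
    return 1 + -(-growth // secondary_cyl)  # ceiling division


def secondary_ok(primary_cyl: int, secondary_cyl: int, min_pct: int = 10) -> bool:
    """Is the secondary allocation at least min_pct percent of the primary?"""
    return secondary_cyl * 100 >= primary_cyl * min_pct


# Growing a dataset from 500 to 600 cylinders.
print(extents_needed(500, 1, 600))    # 101
print(extents_needed(500, 50, 600))   # 3
print(secondary_ok(500, 1), secondary_ok(500, 50))  # False True
```

With CYL(500,1), just 100 cylinders of growth takes 101 extents. With CYL(500,50), it takes three.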
7. Enforce Rules
We've specified some recommended rules above. But how can we ensure that people will follow them?
For a start, most of the above features can only be used for SMS managed datasets. It still surprises me to see some sites with non-SMS managed volumes. Once datasets are SMS managed, ACS routines can be coded to enforce many of these rules.
JCL check software such as ASG-Job/Scan, Broadcom CA JCLCheck or SEA JCLPlus+ can be configured with JCL rules that could be used to enforce some of these rules. Compuware ThruPut Manager may also be able to perform some enforcement.
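Whichever tool does the enforcing, the checks themselves are simple. The Python sketch below expresses some of the rules from this article as a small checker, purely to illustrate the logic. It is not ACS routine code or the rule syntax of any product named above, and the AllocationRequest fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AllocationRequest:
    """Hypothetical summary of a dataset allocation request."""
    dsname: str
    organization: str            # "PS", "PO" or "VSAM"
    primary_cyl: int
    secondary_cyl: int
    extended_format: bool = False
    pdse: bool = False
    compressed: bool = False
    dynamic_volume_count: int = 0


def check_rules(req: AllocationRequest) -> List[str]:
    """Return the recommended rules that the request breaks."""
    problems = []
    if req.secondary_cyl * 10 < req.primary_cyl:
        problems.append("secondary extent is below 10% of primary")
    if req.organization == "PO" and not req.pdse:
        problems.append("partitioned dataset is not a PDSE")
    if req.organization in ("PS", "VSAM") and not req.extended_format:
        problems.append("dataset is not extended format")
    if req.primary_cyl > 3000 and not req.compressed:
        problems.append("dataset over 3000 cylinders is not compressed")
    if req.dynamic_volume_count <= 5:
        problems.append("Dynamic Volume Count is not greater than 5")
    return problems


bad = AllocationRequest("PROD.SALES.KSDS", "VSAM", primary_cyl=6000, secondary_cyl=1)
for problem in check_rules(bad):
    print(f"{bad.dsname}: {problem}")
```

A real implementation would also need exceptions for ineligible datasets, such as the system datasets and catalogs mentioned earlier.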
You may have noticed that I haven't talked about disk volume space. This is also important, but most sites monitor disk space carefully. I also haven't talked about other products like DTS SRS, BMC STOP-X37, and CA Allocate. These can certainly be used to reduce space-related abends, and can provide some sophisticated features to modify allocation parameters, and create alerts when problems are detected. However, basic z/OS features can do most of what these products offer.
I believe that there are sufficient tools and features to eliminate space-related abends. So let's agree to implement these, and say goodbye to our space errors.