technical: Accessing Open VSAM: Is It Ever a Good Idea?
I regularly see sites that access VSAM files already open for update by another process: usually CICS.
A batch job may read an open CICS/VSAM file for reporting, or support staff may use an online utility like IBM FileManager to view one. Some sites even run batch jobs that update a VSAM file while it is open elsewhere.
Most believe that this is a bad idea. But, is there any time when it is OK?
SHAREOPTIONS - A Two-Edged Sword
Updating a file from two processes at the same time is generally a bad idea on any platform. But VSAM gives you SHAREOPTIONS so you can decide yourself. Here's what's on the menu:
- SHAREOPTIONS(1,*) - either one process can open the VSAM file for update, or many can open it for read, but never both at once. Integrity ensured.
- SHAREOPTIONS(2,*) - only one process can open the VSAM file for update, but many can open it for read at the same time. This is the most common option I see.
- SHAREOPTIONS(3,*) and SHAREOPTIONS(4,*) - VSAM is saying "OK, if you really think you know what you're doing. But don't say I didn't warn you."
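The menu above can be written down as a small decision table. The following is only a toy model in Python, not a VSAM interface: the `Dataset` class and its fields are hypothetical names, and it assumes the cross-region rules as IBM documents them (option 1 allows many readers or one updater, option 2 allows one updater plus many readers, options 3 and 4 let everyone in).

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Toy model of a VSAM cluster's cross-region share option (hypothetical class)."""
    shareoption: int   # first SHAREOPTIONS value: 1, 2, 3 or 4
    readers: int = 0   # address spaces currently open for read
    updaters: int = 0  # address spaces currently open for update

    def can_open(self, for_update: bool) -> bool:
        if self.shareoption == 1:
            # Many readers OR one updater, never both.
            if for_update:
                return self.readers == 0 and self.updaters == 0
            return self.updaters == 0
        if self.shareoption == 2:
            # One updater plus any number of readers.
            return (not for_update) or self.updaters == 0
        # Options 3 and 4: everyone gets in; integrity is the user's job.
        return True

    def open(self, for_update: bool) -> bool:
        ok = self.can_open(for_update)
        if ok:
            if for_update:
                self.updaters += 1
            else:
                self.readers += 1
        return ok


cics_file = Dataset(shareoption=2)
cics_file.open(for_update=True)            # CICS opens the file for update
print(cics_file.open(for_update=False))    # batch report opens for read: True
print(cics_file.open(for_update=True))     # second updater is refused: False
```

Note that with option 2 the batch report gets in, which is exactly the situation where it may read out-of-date data.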
SHAREOPTIONS(2,*) provides reasonable data integrity. But there's a chance that processes reading this dataset may not get the latest information. We talk more about this in our partner article.
SHAREOPTIONS(3,*) scares me. VSAM allows multiple processes to update the same dataset, no questions asked. In their manuals IBM say:
"User programs that ignore the write integrity guidelines can cause VSAM program checks, lost or inaccessible records, uncorrectable data set failures, and other unpredictable results. This option places responsibility on each user sharing the data set."
SHAREOPTIONS(4,*) is the same as SHAREOPTIONS(3,*), but gets around most of the 'data missing' issues, which occur when a process gets records from buffers that are out of date. But this comes at a high price: it eliminates buffer performance benefits. Expect all processes accessing these datasets to perform terribly.
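The trade-off between stale buffers and extra I/O can be seen in a toy model. This Python sketch is not how VSAM is implemented, and real buffer refresh rules are more nuanced; `Disk`, `Reader` and `refresh_every_read` are hypothetical names. It only shows why a private buffer can return old data, and why refreshing on every read costs an I/O each time.

```python
class Disk:
    """Stands in for the dataset on disk: control interval number -> contents."""
    def __init__(self):
        self.cis = {0: "balance=100"}


class Reader:
    """A reading address space with its own private buffers (toy model)."""
    def __init__(self, disk, refresh_every_read):
        self.disk = disk
        self.refresh_every_read = refresh_every_read  # True mimics option 4 style refresh
        self.buffers = {}
        self.io_count = 0

    def get(self, ci):
        if self.refresh_every_read or ci not in self.buffers:
            self.buffers[ci] = self.disk.cis[ci]  # physical I/O
            self.io_count += 1
        return self.buffers[ci]


disk = Disk()
buffered = Reader(disk, refresh_every_read=False)
refreshed = Reader(disk, refresh_every_read=True)
buffered.get(0)
refreshed.get(0)

disk.cis[0] = "balance=250"   # another address space updates the record

print(buffered.get(0), buffered.io_count)    # balance=100 1  (stale, but cheap)
print(refreshed.get(0), refreshed.io_count)  # balance=250 2  (current, but one I/O per read)
```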
It sounds like accessing VSAM datasets from two processes (such as batch jobs, started tasks, CICS regions etc.) is a bad idea. And in many cases, it is. But sometimes, it may be OK. Let's look at some examples.
I often see sites performing a backup of an open VSAM dataset. My first reaction is "Stop!" But why?
Reading a VSAM dataset open to another process like CICS may 'miss' some data or updates. ESDS inserts or updates made after the backup opens the dataset (remember: backups aren't instantaneous) may be missed. Any backup of an open VSAM dataset is, at best, 'fuzzy'.
There are some rare times when a fuzzy backup is enough. For example, you may have a VSAM file that is a log. So, a couple of missing records may not be a big deal. Or there may be a VSAM file that is updated (no new records) once a day at 8pm. So, a backup at midnight may be fine.
However, if you have financial data that must be absolutely up to date, fuzzy doesn't cut it. What's more, you may have transactions that update multiple VSAM datasets in the same unit of work.
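To see why fuzzy is a problem for financial data, consider a backup that copies records in key order while a transfer between two accounts happens part-way through. This is a minimal Python simulation under invented assumptions: the account keys, balances and the `fuzzy_backup` function are all hypothetical.

```python
def fuzzy_backup(dataset, updates_during_copy):
    """Copy records in key order while another process keeps updating.

    updates_during_copy maps 'records copied so far' to a list of
    (key, new_value) updates applied by the other process at that point.
    """
    backup = {}
    for copied, key in enumerate(sorted(dataset)):
        for k, v in updates_during_copy.get(copied, []):
            dataset[k] = v              # the online system updates the live file
        backup[key] = dataset[key]      # the backup copies whatever is there now
    return backup


live = {"A": 100, "B": 100, "C": 100}
# After two records are copied, one unit of work moves 50 from A to C.
transfer = {2: [("A", 50), ("C", 150)]}

backup = fuzzy_backup(live, transfer)
print(backup)                                     # {'A': 100, 'B': 100, 'C': 150}
print(sum(backup.values()), sum(live.values()))   # 350 300
```

The backup holds the old value of A and the new value of C, so it contains 50 that never existed. The same effect appears when the two halves of the transfer live in different datasets.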
I promise I won't hop on my soap-box a second time in this article about SHAREOPTIONS(3,*) and SHAREOPTIONS(4,*): the issues are already covered. However, there may be times when these options are appropriate.
For example, you may have a process that actually serializes the access (though I haven't seen an application that does this). You could use CICS global ENQs to make the CICS enqueue visible outside the region, and then issue a matching z/OS enqueue from the batch job.
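The idea of serializing on an agreed resource name can be sketched in Python, with a lock standing in for the enqueue. This is an analogy only: `enq`, the lock table and the resource name `PROD.CUSTOMER.KSDS` are hypothetical, and a real implementation would use CICS and z/OS enqueue services rather than threads.

```python
import threading

# One lock per agreed resource name; stands in for an ENQ on that name.
_enq_table = {}
_enq_table_guard = threading.Lock()


def enq(resource_name):
    """Return the lock for a resource name, creating it on first use."""
    with _enq_table_guard:
        return _enq_table.setdefault(resource_name, threading.Lock())


records = {"counter": 0}   # stands in for a record in the shared dataset


def updater(times):
    for _ in range(times):
        with enq("PROD.CUSTOMER.KSDS"):        # hypothetical resource name
            current = records["counter"]       # read for update
            records["counter"] = current + 1   # rewrite


threads = [threading.Thread(target=updater, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(records["counter"])  # 2000
```

The scheme only works if every updater uses the same resource name and never touches the record outside the lock, which is why it is the application's responsibility and not VSAM's.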
Some program products require VSAM datasets with SHAREOPTIONS(3,*) or SHAREOPTIONS(4,*). For example, BMC RUV REGISET files, MicroFocus Changeman SSM master files, and CA TPX Admin files. These will have some method of serialization.
Some sites may rely on timing. For example, CICS only updates the dataset between 0900 and 1700, so a batch job running at 1900 can update it. This could work, though I'd be worried about this 'land mine.' If that CICS application expanded its hours to handle different regions, then there's an issue.
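The land mine is easy to show as a window check. A minimal sketch in Python, assuming same-day times and using a hypothetical function name; the 0900-1700 window comes from the example above, and the 2100 closing time is invented to show what extended hours would do.

```python
from datetime import time


def overlaps_update_window(job_start, job_end,
                           window_start=time(9, 0), window_end=time(17, 0)):
    """True if a batch job (same-day start and end) overlaps the online update window."""
    return job_start < window_end and job_end > window_start


# Today: CICS updates 0900-1700, batch runs 1900-1930.
print(overlaps_update_window(time(19, 0), time(19, 30)))   # False

# Later: online hours are extended to 2100 and nobody tells the batch schedule.
print(overlaps_update_window(time(19, 0), time(19, 30),
                             window_end=time(21, 0)))      # True
```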
I've seen a few sites that use VSAM datasets as logs. Errors and diagnostic information are written here, and can be read online with tools like IBM FileManager and friends. If the VSAM files are KSDS, this will probably be OK.
Some sites may simply perform reporting from VSAM data. If the results of this reporting do not need to be 100% perfect, then accessing VSAM datasets may be OK. For example, an end of day report that only needs updates performed up to 5pm can probably access a VSAM KSDS shortly after this.
The issue is that this introduces complexity that makes supporting these reports harder. What happens if the trading day changes or increases? Or the application moves to 24x7 operations? What if a change in compliance regulations means that these reports must be more accurate? Or what if some reports can tolerate a small error, while others cannot?
This isn't a new problem, and there are a lot of solutions available. By far the best would be VSAM Record Level Sharing (RLS). It also offers read integrity options (covered in another partner article) to avoid reading an uncommitted update (with a performance penalty). If you're updating VSAM datasets in a unit of work, you'll want to use Transactional VSAM (DFSMStvs).
For backups, DFDSS (DFSMSdss) can use concurrent copy and CICS backup while open (BWO) to get a good backup of VSAM datasets. If your VSAM datasets are RLS, DFDSS automatically supports RLS, and ensures data integrity.
Some program products like IBM CICS VR perform backup and forward recovery services. So they can recover from a fuzzy backup, and 'roll forward' updates from logs.
Specifying SHAREOPTIONS(4,3) will resolve many of the integrity problems. However, this comes at a very large performance cost, and is rarely used. This also doesn't guarantee integrity for write operations.
Many sites avoid this problem by using a single process to perform all dataset accesses: often CICS. So batch jobs needing to access VSAM datasets may execute a CICS program using something like EXCI or the MQ CICS 3270 bridge. I've also seen sites use MQ as a go-between: records to be added to a VSAM file are PUT onto an MQ queue, which a CICS transaction then processes.
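The go-between pattern can be sketched with an in-process queue. This Python example is an analogy, not MQ or CICS code: `queue.Queue` stands in for the MQ queue, a dictionary stands in for the VSAM file, and the job names are hypothetical. The point is that only one owner ever touches the file, so the producers need no locking of their own.

```python
import queue
import threading

requests = queue.Queue()   # stands in for the MQ queue
vsam_file = {}             # stands in for the VSAM dataset; only the owner updates it
STOP = object()            # sentinel telling the owner to shut down


def owner():
    """The single process that performs all updates (CICS in the article)."""
    while True:
        msg = requests.get()
        if msg is STOP:
            break
        key, record = msg
        vsam_file[key] = record


def batch_job(job_id, count):
    """A batch job PUTs its records on the queue instead of opening the file."""
    for i in range(count):
        requests.put((f"{job_id}-{i:03d}", f"record {i} from {job_id}"))


owner_thread = threading.Thread(target=owner)
owner_thread.start()

jobs = [threading.Thread(target=batch_job, args=(name, 100)) for name in ("JOBA", "JOBB")]
for j in jobs:
    j.start()
for j in jobs:
    j.join()

requests.put(STOP)
owner_thread.join()
print(len(vsam_file))  # 200
```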
Accessing a file that is open by another address space is risky. Whatever the requirements or needs, there are usually many different issues and problems to consider and resolve.
IBM have provided features such as VSAM RLS and DFSMStvs to address these problems. Alternatively, simply using CICS or a similar address space to perform all VSAM operations is the safest option.