technical: Why Your Batch is Running Slow
Batch continues to be a critical part of most z/OS workloads. However, most of our performance tuning focuses on CPU consumption and online response times. It's only when a batch stream runs outside its window that batch gets its share of performance work. This month, I'm giving my top seven reasons why batch jobs run slowly, and what you can do about them.
7. Waiting for Dataset Recalls
Often jobs, particularly those that run weekly, monthly or quarterly, will hang around waiting for dataset recalls from DFSMShsm or similar products. To my mind, this is classic wasted time that can, and should, be avoided. The obvious answer is to manually recall datasets an hour or two before the batch run begins. If this is too hard, there are a few products that can help by automatically recalling migrated datasets before a schedule starts. These include OpenTech Systems' HSM/Advanced Recall and DTS Software's MON-PRECALL.
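As a rough sketch of the manual approach, a small step scheduled ahead of the batch window could issue DFSMShsm HRECALL commands under batch TSO. The step name and dataset names below are hypothetical, the JOB card is omitted, and NOWAIT is used on the assumption that you don't want the step to wait for each recall to finish.

```jcl
//*  Pre-recall migrated datasets ahead of the batch window.
//*  Dataset names are placeholders; substitute your own.
//PRECALL  EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  HRECALL 'HLQ.MONTHLY.EXTRACT' NOWAIT
  HRECALL 'HLQ.MONTHLY.HISTORY' NOWAIT
/*
```

Because the recalls are queued rather than waited for, the step needs to run early enough for DFSMShsm to finish before the first job that uses the datasets.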
One pet hate of mine is batch jobs that recall datasets only to delete them. Usually they use IEFBR14 to allocate the dataset with a disposition of (MOD,DELETE). The z/OS ALLOCxx parmlib option IEFBR14_DELMIGDS(NORECALL) avoids this pain by converting the HSM recall to an HDELETE. If this can't be done, using the IDCAMS DELETE command will do the same job.
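To make the difference concrete, here is a hedged sketch of the two patterns side by side. Step names and the dataset name are hypothetical, and the IF LASTCC test is just one common way of tolerating a dataset that doesn't exist; your standards may differ.

```jcl
//*  Pattern to avoid: IEFBR14 with (MOD,DELETE) can force a recall
//*  of a migrated dataset just so that it can be deleted.
//DELOLD   EXEC PGM=IEFBR14
//DD1      DD DSN=HLQ.MONTHLY.EXTRACT,DISP=(MOD,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,0)
//*
//*  Alternative: IDCAMS DELETE removes the dataset without the
//*  recall. The IF statement resets the return code when the
//*  dataset is not found (LASTCC of 8).
//DELNEW   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE HLQ.MONTHLY.EXTRACT
  IF LASTCC = 8 THEN SET MAXCC = 0
/*
```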
6. Bad Choice of Utility
Far too many times I see sites copying between datasets using IDCAMS REPRO or IBM's standard IEBGENER. These utilities use default dataset buffering, which is rarely good. A far better option that guarantees optimal buffering is probably already installed at your site: the IEBGENER replacements DFSORT ICEGENER and SyncSort BETRGENER. Everyone knows that these can copy sequential (QSAM) files. Fewer know that they can also handle VSAM.
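Since ICEGENER uses the same DD names as IEBGENER, the change to an existing copy step can be as small as the program name. The following is a minimal sketch; the step name, dataset names and space values are hypothetical.

```jcl
//*  Straight copy with DFSORT's ICEGENER in place of IEBGENER.
//*  SYSUT1 is the input, SYSUT2 the output, as with IEBGENER.
//COPY     EXEC PGM=ICEGENER
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DSN=HLQ.DAILY.INPUT,DISP=SHR
//SYSUT2   DD DSN=HLQ.DAILY.COPY,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD DUMMY
```

Some sites go further and install ICEGENER under the IEBGENER name so that existing JCL picks it up unchanged; whether that is appropriate depends on your installation.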
If installed on your disk subsystems, Snapshot or similar features are another great option: instantly copying datasets with minimal CPU overhead. Similarly, I think you would be crazy to use IDCAMS EXPORT/IMPORT when backing up or defragmenting VSAM datasets. DFSMSdss, CA Faver and Innovation FDR/ABR can do much better.
5. Waiting for CICS Files
A very common scenario is an overnight batch run that updates CICS VSAM files after the day's trading. Unfortunately, many sites still manually close these files in CICS, and then start their batch run. If you absolutely need a clean point to do this, then OK. But the chances are that you don't. You have a couple of choices to improve here. You could use CICS EXCI to call a CICS program to access the CICS/VSAM file from batch, or you could use features like DFSMSdss backup-while-open (BWO) to back up CICS/VSAM files with data integrity. VSAM RLS and DFSMStvs are other options. There are also products such as DSI Document Systems' BatchCICS-Connect and MacKinney Batch to CICS that can help.
4. Waiting for Datasets
Of course, there will often be several jobs that need access to the same datasets. So accessing them with DISP=OLD, and waiting for previous jobs to finish, seems obvious, and in the past was mandatory. But there are smarter options.
If you need concurrent access to VSAM datasets, features like VSAM RLS can allow multiple jobs to access datasets at the same time. Hiperbatch may also help when multiple jobs need to read a dataset at the same time, and BatchPipes can have a writer and reader running concurrently. Software products such as BMC Mainview Batch Optimizer can also help here.
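As an illustration of the VSAM RLS option, a batch step can ask for RLS access with the RLS parameter on the DD statement. This sketch assumes the dataset is SMS-managed and that RLS has already been set up at your site; the program name, DD name and dataset name are hypothetical.

```jcl
//*  Read a VSAM dataset in RLS mode with DISP=SHR rather than
//*  serialising on DISP=OLD. RLS=CR asks for consistent read
//*  integrity; RLS=NRI (no read integrity) is the lighter option.
//READER   EXEC PGM=MYREPORT
//MASTER   DD DSN=HLQ.VSAM.MASTER,DISP=SHR,RLS=CR
//SYSOUT   DD SYSOUT=*
```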
3. Waiting for Initiators
In the old days, we used initiators to limit the number of batch jobs executing, as a way of managing CPU usage. In many shops, this is still the case. However, today we have a far better tool: WLM. In fact, everyone should now be using WLM-managed initiators, which adjust the number of initiators based on workload rather than relying on hard limits.
2. Waiting for Tape Drives
I remember a site where we only had eight 3480 tape drives shared between two z/OS systems. If a tape drive was online on System A and needed on System B, it would first need to be taken offline from System A, and then brought online to System B before it could be used.
Those days are history. Dynamic tape sharing became the norm, and is now itself obsolete. Today a Virtual Tape Subsystem (VTS) can define hundreds of logical tape drives to a z/OS system. Even if you can't afford a VTS, you can achieve similar things with DFSMShsm. Today there is no reason for a job to wait for a tape drive.
1. Dataset Buffering
Whenever I'm tuning a batch job, the first thing I look for is dataset buffering. And the pickings are usually rich. It's as if IBM deliberately set default dataset buffering values for the worst performance. So if you're not setting BUFNO for QSAM, or BUFND/BUFNI for VSAM, in your batch, you probably should be. Of course, modifying DD statements for every batch job is a huge task. Using VSAM System Managed Buffering (SMB) is one excellent option. There are also many products out there that do it for you: from CA Hyper-Buf to Rocket's Performance Essential.
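For anyone who hasn't coded these before, here is a hedged sketch of where the buffering parameters go. The program name, DD names, dataset names and buffer counts are all hypothetical; suitable values depend on the access pattern and the storage available to the job. The SMB example assumes an SMS-managed, extended-format dataset.

```jcl
//TUNED    EXEC PGM=MYBATCH
//*  QSAM: ask for more buffers than the small default.
//SEQIN    DD DSN=HLQ.SEQ.INPUT,DISP=SHR,DCB=BUFNO=30
//*  VSAM: data (BUFND) and index (BUFNI) buffers via AMP.
//VSAMIN   DD DSN=HLQ.VSAM.MASTER,DISP=SHR,
//            AMP=('BUFND=30,BUFNI=10')
//*  VSAM with System Managed Buffering: let the system choose.
//VSAMSMB  DD DSN=HLQ.VSAM.HISTORY,DISP=SHR,
//            AMP=('ACCBIAS=SYSTEM')
```

As a rule of thumb, extra data buffers tend to help sequential processing while extra index buffers tend to help random access, so it is worth measuring before and after rather than applying one set of values everywhere.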
What Isn't In My List
In many ways, what isn't in my list is as interesting as what is. Dataset compression doesn't make it: it will increase your CPU usage, but probably won't do much for your batch elapsed times. Similarly, features like dataset striping are great for online performance, but I haven't yet seen them make a big difference to batch times. VSAM tuning and QSAM blocksize selection can make a big change to your run times, but most sites are on top of this. Batch scheduling is often a cause of delays, but the automated scheduling packages and tools that most sites use will help overcome this. And there is a range of other things such as database and SQL tuning. However, in my experience, my top seven above will give you the biggest batch elapsed time reductions for the least effort.