Opinion: Do We Really Need to Compress Datasets?
At many of the sites I work with, I see clients compressing datasets, and they do this for two reasons. Some compress datasets before transmitting them
somewhere else, which makes sense. Others automatically compress datasets on disk, which doesn't make as much sense.
Many years ago, DASD space was expensive, and CPU seconds were cheap. So many sites decided to save money by compressing some of their disk datasets
and databases, using products such as Shrink/MVS (now CA Compress), Infotel InfoPak, or BMC Data Accelerator. These products were great: they
transparently compressed datasets as they were written to disk, and decompressed them as they were read, with no changes needed to any application or JCL.
More recently, IBM has added this functionality to z/OS without the need for ISV products. Systems programmers can set up data classes to automatically
compress extended-format VSAM and sequential datasets.
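To make this concrete, here is a rough sketch of what requesting compression might look like from the JCL side. The data class name (COMPSEQ), dataset name, and space values are all hypothetical; in practice your ACS routines will usually assign the data class for you, and the data class itself (with its COMPACTION setting) is defined by the storage administrator in ISMF.

//COMPTEST JOB (ACCT),'ALLOCATE TEST',CLASS=A,MSGCLASS=X
//* Allocate a new extended-format sequential dataset, asking for the
//* hypothetical data class COMPSEQ, assumed to be defined with
//* compression (COMPACTION) turned on. Names and space values are
//* illustrative only.
//ALLOC    EXEC PGM=IEFBR14
//NEWDS    DD  DSN=PROD.SALES.EXTRACT,DISP=(NEW,CATLG,DELETE),
//         DATACLAS=COMPSEQ,DSNTYPE=EXTREQ,
//         SPACE=(CYL,(100,50),RLSE),RECFM=FB,LRECL=80

The JCL only asks for the data class; whether the dataset is actually allocated compressed depends on how COMPSEQ is defined and whether the ACS routines honour or override the request.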
IBM and other vendors also released products to automatically compress inactive datasets. For example, DFHSM (now DFSMShsm) will automatically move inactive
datasets to disk volumes (called Migration Level 1, or ML1) as compressed files, or even as records in a VSAM dataset (small dataset packing, or SDSP) for
very small datasets. Other products such as Innovation's FDRARC can do the same.
However, all this compression comes at a cost. The first cost is increased dataset access times. In the old days, compression actually reduced
I/O times (fewer bytes meant fewer blocks to transfer), compensating for the processing time required for compression and decompression. With today's
cached, sophisticated disk subsystems, I/O times are already low, so this benefit has largely disappeared. The good news is that processors are also
faster, so the overhead is less than it used to be. Unless the compressed dataset is heavily used, chances are you won't notice too much performance
degradation.
However, the big cost is in CPU seconds. The CPU overhead of compression and decompression can be high; I've seen it double the CPU usage in
some cases. This is fine when disk is expensive and CPU seconds are cheap. Today, however, the exact opposite is true. CPU seconds are expensive,
driving up your software licensing costs, while DASD costs have plummeted in recent years, diluting any cost justification for compression.
The benefits of compression may also be less than you think. I've always used a rough rule of thumb of 50%, meaning that a compressed dataset will, on
average, be half the size of the equivalent uncompressed dataset. Of course, compression ratios will vary from dataset to dataset.
So let's look at HSM. Today, I cannot see any reason to continue using DFSMShsm ML1. Datasets should remain on disk, or be migrated straight to tape.
You'll need to keep a couple of ML1 volumes for ad-hoc backups and CDS backups; however, that's all they should be used for.
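If you do want migration to bypass ML1 altogether, the levers are in the SMS management class and in DFSMShsm itself. The sketch below is from memory and exact panel wording and command syntax vary by release, so treat the attribute names, values, and dataset name as illustrative rather than definitive.

Management class migration attributes (ISMF), illustrative values:
   PRIMARY DAYS NON-USAGE . . : 30     eligible for migration after 30 days unused
   LEVEL 1 DAYS NON-USAGE . . : 0      0 = skip ML1 and migrate straight to ML2

TSO command to push a single (hypothetical) dataset straight to ML2:
   HMIGRATE 'PROD.SALES.EXTRACT' MIGRATIONLEVEL2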
Similarly, in most cases datasets don't need to be compressed. There may be some cases where it makes sense, but they will be in the minority. Of
course, moving away from compression isn't a simple task. In most cases more disk will need to be acquired (using the 50% rule of thumb, roughly twice
the space currently occupied by compressed datasets), and storage groups and other configurations will need to be changed. But in the long run this will
pay for itself in reduced CPU usage, and you may also see some performance improvements to sweeten the pot.
David Stephens