opinion: Two Opposing Strategies for Systems Administration: Do Nothing or Do Something
Is it really better to minimise change, and do nothing?
Every two years in September, IBM releases a new version of z/OS and its accompanying features. Every two years, IBM provides a list of new features: they've recently done this again for z/OS 3.1.
Many sites won't jump to implement every new feature immediately, and that sounds good. I like to let new features 'settle down' before implementing them: others can find the bugs.
But IBM isn't releasing these features for fun. They'll have done research. They'll have seen that these features are wanted by z/OS users, needed to allow IBM to continue marketing z/OS as the most secure and stable operating system, or desired to reduce reasons to move away from z/OS.
So, z/OS administrators have a choice: implement the new feature, or not. When making this choice, they'll weigh up factors like the risk (how new it is, the impact if it fails, other sites' experiences), the benefits, and how hard it is to implement. This last factor, "how hard", isn't just about the technical issues. Other changes may be needed, there may be bugs or issues, and other groups may need education and assistance.
In my experience, when faced with these choices z/OS administrators fall into two camps:
- Do Nothing: make the minimal changes to z/OS. Don't implement any new features or make any changes unless absolutely necessary.
- Do Something: enable appropriate features. Maybe not every feature but features that make sense. Features that may not provide immediate benefit but are worth it. Features that users may not be asking for but will enjoy once they're available.
I can understand the "Do Nothing" group. Change introduces risk: no one wants to put their job on the line if they don't have to. Change is hard: learning new features, discussing them with other groups, working to resolve issues. And change still costs: in time to research, prepare, configure, enable, and test the new feature.
However, I'm not a big fan of "Do Nothing." z/OS systems don't operate in a bubble. They're connected to other systems, and both the hardware and the software are regularly upgraded. New features are being used, whether z/OS administrators like it or not. NFS for file sharing is becoming mandatory, z/OSMF is needed for z/OS administration, and Zowe is rapidly becoming a thing.
I believe that "Do Nothing" impacts security: mainframes are becoming more interesting to hackers and other malicious users. Let me give you an example: SDSF security.
SDSF has for many years supported RACF and other security software (Broadcom ACF2, Broadcom TSS). However, I still see sites using the old ISFPARMS for SDSF security. This isn't a great idea: security administrators are unlikely to be familiar with ISFPARMS, so it is likely that z/OS systems programmers are managing it. If you're a security geek, this contravenes STIG V-224511.
But it gets worse. Today these users have a problem: z/OS 2.5 doesn't support ISFPARMS security. Rather than a relaxed, slow implementation of RACF security, they're rushing to get it done before z/OS 2.4 is out of support next September.
I believe that "Do Nothing" impacts resilience and uptime. For example, I often see sites that don't have an AUTOIPL policy. This policy will automatically IPL a z/OS system if it goes into a disabled wait, or in other words, if it dies. AUTOIPL can be configured to automatically perform a standalone dump before IPLing, which captures essential diagnostic evidence. If a system does die, the chances are that there will be some panic. If it's at 3am, there's a good chance that operations staff will want systems programming advice before doing anything. Or just as likely, they will decide to just IPL without a standalone dump. So, our downtime will be longer (we need to wait for someone to see the disabled wait, decide what to do, and do it), and we may not get the diagnostic data we need. IBM recommend an AUTOIPL policy: they even have a z/OS Health Checker check for it.
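To make this concrete, here is a minimal sketch of what an AUTOIPL policy might look like in a DIAGxx parmlib member. The device number and load parameter are hypothetical placeholders, not values from any real site, and the exact options should be checked against the DIAGxx documentation for your z/OS release:

```
/* Hypothetical example only: 0180 and SADMP1 are placeholder values */
AUTOIPL SADMP(0180,SADMP1) MVS(LAST)
```

In this sketch, SADMP names the device and load parameter used to IPL the standalone dump program, and MVS(LAST) asks for z/OS to be re-IPLed using the same device and load parameter as the last IPL. A member like this would normally be activated with a SET DIAG=xx command or at the next IPL.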
I believe "Do Nothing" costs money: new features that reduce CPU usage will not be implemented. Products made obsolete by new features will not be retired. More hardware resources may be purchased because performance-enhancing features are not implemented.
Finally, I believe that "Do Nothing" produces less effective staff. Staff will have less experience and knowledge as they're doing less. Quality staff are likely to move somewhere else because it's boring to just keep things the same. "Do Nothing" will create a culture of being 'afraid' of z/OS and change.
I don't believe that every feature should be immediately implemented. I do believe that new features should be reviewed, and those appropriate should be implemented according to a sensible schedule. I believe in "Do Something."