LongEx Mainframe Quarterly - November 2021
At one site where I was working, there was little change management. I could basically make any change I wanted, no questions asked. However, with a change in management, change management was introduced. I now had to 'package' up my changes, create batch jobs to implement them, justify the reason for the change, and have someone review it. And something interesting happened: my changes were better. Fewer issues, better tested and reviewed, safer. But can change management go too far?
Change management is essential for managing any modifications to a computing system or application. It ensures that the change is required, has been sufficiently tested and reviewed, and does not impact other changes or business requirements. However, I often see change management processes and procedures that are, well, too difficult.
For example, at one site change requests must be raised at least 14 days before the change. They must be reviewed and approved by several managers, some of whom do not understand the technical nature of the change, or what it does. Changes are often rejected because an approver did not understand the change, and thought it impacted something it didn't. Staff proposing the change often must attend meetings where many changes are reviewed, in case there are questions about the change. So, a staff member may wait for an hour on a call, in case they need to answer a question for one minute. What's more, changes are not approved until one or two days before the change. This makes it difficult for teams to schedule staff to implement and monitor the change, and obtain resources from business units to validate function after the change is implemented.
Such 'heavy' change management procedures aren't unusual. Sometimes they are 'knee jerk' reactions to outages: a quick option that is seen to address an outage that occurred, and prevent it from happening again. Other times, they have evolved over time to handle changes from many different groups.
The problem with heavy change management procedures is that they could do the exact opposite of what is intended: they could impact resilience. But how?
Any computer system must be maintained. Changes are always required: from security rules to fixes that stop crashes and abends. In many cases, application groups will know about problems, and have fixes ready for them. New hardware will often require software changes. A heavy change management system will slow these changes down. This could have a few effects:
A heavy change management system can also affect the culture of an organisation. If changes are very difficult to get approved, it indicates that the organisation doesn't accept risk. So, changes that have an element of risk may not be permitted. For example, at one site a change was requested, but rejected as it had not been tested in a test environment. However, there was no test environment available. The proposal was to 'try' this low risk change in production, and during the outage window validate that it was successful. If not, there was sufficient time in the change window to back it out. The change was not approved, and was never implemented.
So, what am I saying? Change management is essential to managing resilience and reducing risk from changes. However, I believe that resilience benefits change with the 'weight' of change management procedures, and this change is something like a bell curve: resilience benefits will increase at first, but as the weight of the change management procedures increases, resilience will decrease. Ideally, we want to find the 'sweet spot' where the resilience benefits are maximised.
The reality is that maintaining computer systems involves some risk. Hardware can glitch, software products have bugs, staff make mistakes, and programmers are not perfect. There is never a way to guarantee 100% success, 0% problems. The aim of change management should be to minimise risk, while allowing change and development to continue as smoothly as possible with the minimum amount of red tape.