Opinion: Can Patrolling of Source Code Improve Resilience?
Last winter, I went out for a run. Not particularly interesting, except that it was -5°C (23°F): it was COLD. Snow lying around, a bit of wind, and few others brave (or silly) enough to go out and run on such a morning. As I left, my mother-in-law looked at me as if I were crazy: "Are you sure you're going to be warm enough?"
When I first spent a winter in New York City, I had the same attitude. I'd stay inside as much as I could, and would wear ridiculous amounts of clothes when I went outside. I was a little afraid of the cold.
I run for the physical and emotional benefits: I'm fitter and less stressed. But there's another interesting benefit of running in the cold: I'm no longer afraid of it. I wear fewer clothes than I used to, and have no problems going out. No longer do I think "do I really need to go out and get milk, or can I survive without it a little longer?" I just go and get the milk.
When I was in the army, they had a similar attitude. Everyone in the army had to regularly do 'Infantry Minor Tactics': practice patrolling in the forest. The idea was that if you're patrolling, you're not sitting in your trenches waiting for trouble to come your way. You are taking the initiative, and becoming more familiar with where you are. Your mindset is more positive, and you're more effective.
I believe that the same is true in any area, including computing. Let's take an example, something I've seen a few times. Suppose we have a module that is essential to our application: MODA. If MODA is regularly modified and maintained, then the teams supporting it will be familiar and comfortable with it. They will understand how it works, and quickly identify problems related to it. If there are problems, they can be quickly fixed. If a change to MODA is needed, it is scheduled and done.
However, suppose MODA hasn't been touched in 20 years: it hasn't been changed, or even recompiled since Y2K. Make it a little complicated. Make it a large module, or even a group of modules. The last people to develop or maintain it have left. Someone tried to make a change to it 5 years ago, and failed (and the change was discarded).
Now the support team is afraid of MODA. They won't want to touch it in case they make a mistake. No one is 100% sure of what it does, or how it works. If there is a problem with MODA, support teams will try to 'live with' the problem, rather than change MODA. Other programs may be changed to tolerate incorrect output from MODA. Maybe they include code to retry calls to MODA a few times if the module fails. Maybe they are serialized so only one program calls MODA at a time, because concurrent calls produce errors. If the business requires a change to MODA, the support team will resist, or at the very least, postpone the change as long as possible.
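To make the 'live with it' pattern concrete, here is a minimal sketch of what such a workaround might look like in a calling program. It is written in Python purely for illustration; the names call_with_workarounds and make_flaky are hypothetical, and the stand-in for MODA is a stub that fails a set number of times. The point is that the caller now carries both the retry and the serialization, while the fault inside MODA stays unexamined.

```python
import threading

# Hypothetical workaround: one lock shared by every caller, so that only
# one program calls MODA at a time.
_moda_lock = threading.Lock()


def call_with_workarounds(moda, request, attempts=3):
    """Call moda(request) under the shared lock, retrying on failure.

    Re-raises the last error if every attempt fails.
    """
    last_error = None
    for _ in range(attempts):
        with _moda_lock:
            try:
                return moda(request)
            except RuntimeError as exc:
                last_error = exc
    raise last_error


def make_flaky(failures):
    """Build a stand-in for MODA that fails `failures` times, then works."""
    remaining = [failures]

    def moda(request):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("MODA failed")
        return f"ok: {request}"

    return moda


print(call_with_workarounds(make_flaky(2), "order-1"))  # prints "ok: order-1"
```

Every program that calls MODA needs the same wrapper, and none of them gets any closer to explaining why MODA fails in the first place.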
If MODA uses a lot of CPU, this will be tolerated. If the latest version of a product cannot work with MODA, an old copy of the product may be retained as it 'works for now.'
This is a resilience risk. This is a business risk. This is not good. And I've seen it at many sites.
Maintaining computer systems involves some risk. In the mainframe world, we are lucky to have a very resilient platform that experiences fewer crashes and problems than other platforms. However, in many cases sites have become too 'risk averse', paradoxically increasing risk.
I believe that application teams should be encouraged to 'take charge' of their modules. They should be encouraged to review modules, and even recompile them regularly. They should be empowered to make sensible changes to modules, even if there are no immediate business requirements. Application staff should have enough resources and time to regularly review modules, and target those that have not been recompiled or reviewed for some time.
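As a rough illustration of how a team might pick targets for a 'patrol', the sketch below lists modules whose last compile date is older than a chosen threshold, oldest first. It is written in Python for brevity, and everything in it is an assumption: the Module record, the stale_modules function, the five-year default and the sample dates are all hypothetical. In practice the dates would come from whatever inventory or load library reporting a site already has.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Module:
    name: str
    last_compiled: date


def stale_modules(modules, today, max_age_years=5):
    """Return modules not recompiled within max_age_years, oldest first."""
    cutoff_days = max_age_years * 365  # approximate; ignores leap days
    stale = [m for m in modules if (today - m.last_compiled).days > cutoff_days]
    return sorted(stale, key=lambda m: m.last_compiled)


# Hypothetical inventory; these dates are made up for illustration.
inventory = [
    Module("MODA", date(1999, 11, 30)),
    Module("MODB", date(2023, 6, 1)),
    Module("MODC", date(2012, 3, 15)),
]

for m in stale_modules(inventory, today=date(2024, 1, 1)):
    print(m.name, m.last_compiled.isoformat())
# prints:
# MODA 1999-11-30
# MODC 2012-03-15
```

A list like this is only a starting point: it shows which modules have gone longest without attention, not which ones matter most to the business.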
Allowing staff to 'patrol' their application will give them the skills and confidence they need to maintain the application, and make it do everything needed efficiently and effectively. Or in other words, with minimum risk.