Opinion: Why Every Site Needs a Few Good Dashboards
As a consultant, I'm dropped into a mainframe site to achieve a goal, and then I leave. Often this goal involves performance: making CICS transactions run faster, batch jobs finish sooner, or started tasks use less CPU. To do this, I look at SMF and other records to figure out what's happening, and this guides me to the areas I can improve.
But here's the thing. I usually make my own performance reports. I go back to the SMF records, and create the charts and tables I want. I don't use reports, charts, or other data created by my client. I've never used a client's dashboards. And that's interesting.
Performance Reporting My Clients Have
Most of my clients produce performance reports. Some show the CPU usage of their z/OS systems. Many have reports highlighting the high CPU users and any changes in CPU usage. Some have reports with key performance indicators (KPIs) for critical applications and infrastructure: usually response times and transaction volumes.
Most will have a performance database (PDB): a history of summarized SMF data that capacity and performance staff can use to quickly find things out.
Few are without monitoring tools that show information about current performance and CPU usage.
In most cases, these reports are designed for, and available to, technical staff. Few of my clients have near real-time performance and CPU usage reports available to everyone.
What Reports Do We Need?
OK, but do we really need all these technical reports? Building dashboards with all this information takes time and effort. And this effort doesn't stop when the dashboards are built: like any other application or system, they must be maintained. Management and non-mainframe staff often don't understand these reports. They won't know that 90% CPU usage on a mainframe is fine, or that a CICS transaction that has been running for an hour could be OK if it's a background transaction.
If we start making this information available to everyone, technical staff will be inundated with 'false positives', and people will see problems where there aren't any.
Surely we can just go back to those PDBs and get the information we need, when we need it. Some systems programmers strongly believe all this, and it certainly makes their lives easier.
I don't believe this. I strongly believe that every site should have near real-time dashboards that are available to anyone and everyone. These will help users and technical staff quickly identify and begin investigating problems. They will help capacity planners look for growth patterns without spending hours generating bespoke reports. And they will show performance staff both current and past performance.
But there's another advantage. I believe that many are pushing to retire their mainframe because they don't understand it. The mainframe is this mysterious black box that they only hear about when there's an invoice to pay, a problem occurs, or there's something needed from the mainframe that it may, or may not, be able to provide.
Dashboards lift the curtain on the mainframe, making it seem more real. Dashboards can show users that the mainframe is processing a lot of transactions, and doing it fast. Dashboards can show that the mainframe often has fewer problems than other platforms. Dashboards can help people make informed decisions about mainframe modernization and the options available.
It's Not Hard To Create Dashboards
Today, creating these dashboards is not difficult. Tools such as Splunk are designed to make building charts, tables and dashboards quick. They can process near real-time mainframe data with help from tools like Precisely Ironstream or IBM CDP. Many monitoring tools, such as Broadcom CA SYSVIEW, can also send data to Splunk.
I'm not suggesting that every site needs hundreds of dashboards showing all the detail of their z/OS systems. In fact, too many dashboards can also be a problem: too much information, too difficult and expensive to develop and maintain.
A few carefully designed dashboards will often be enough. In a previous article, we discussed one idea for quickly showing the performance 'health' of a z/OS system.
I believe that every site should have a small set of dashboards showing near real-time information about their z/OS systems, and these should be accessible to anyone in the organisation. They will also make my life as a consultant easier.