management: How Fast is Your Mainframe?
A while ago I wrote an article about the size of a mainframe: its processing capacity. Processing capacity is really the amount of work a mainframe can do. This depends on many things, including the amount of memory, disk I/O speeds and networking speeds. But the truth is that the overwhelming factor limiting the work a mainframe can do is processor capacity. Processor capacity is in turn determined by the number of processors, chosen when you buy a mainframe, and by the processor speed. So how fast is your mainframe?
Computer scientists will tell you that the speed of a processor depends on the clock speed. A clock is a timer that regulates all the different processor operations, keeping them in time. If you've ever seen rowing, you'll know that rowers need to keep in time. So there's one person at the back calling the time ‐ the coxswain. The clock is this coxswain for processors.
So, the faster the coxswain calls the time, the faster the boat goes. In the same way, the faster the clock speed, the faster the processor. Clock speeds have increased over the years. Here are some of the more modern processors:
IBM z12: 5.5 GHz
IBM z13: 5 GHz
IBM POWER8: 2.5‐4.5 GHz
Intel i7-6700: 4 GHz
This looks interesting: going by clock speed alone, an Intel i7-6700 is not far behind an IBM z13, which would suggest that it is almost as powerful. But of course, nothing is this simple. The clock speed is only one part of the equation.
RISC and Instruction Sets
Processors execute instructions. So the higher the clock speed, the more instructions a processor can execute in a set period. When you write a program, the code is compiled or interpreted: broken down into the instructions supported by the processor.
However, different processors have different instruction sets. Some processors have a small number of supported instructions ‐ Reduced Instruction Set Computing (RISC). This simplifies the processor, and often makes it faster. The idea is that a program will need to execute more instructions, but each will be fast, with an overall performance benefit. The IBM POWER8 is a RISC processor.
Other processors have a large number of instructions: Complex Instruction Set Computing (CISC). The argument here is to take the load off the applications, and put it into the processor. The IBM z12 and z13 processors are in this camp. Because each CISC instruction can do more work, you'd expect a RISC 5 GHz processor to be less powerful than a CISC 5 GHz processor. But of course, it's still not that simple.
Processor instructions often have to do things like access memory. So an instruction may have to wait for a cycle or two while an area of memory is located. Other instructions may perform several memory operations, or use some other feature that they must wait on. So a single instruction may take one processor cycle, or several.
The bottom line is that two processors running at the same clock speed may or may not take the same time to process the same workload. This is one of the reasons that comparing processors has always been difficult. The Whetstone and Dhrystone tests were designed to get around this problem. However, even these tests are not perfect, as they only test specific processor functions (floating point and integer/string respectively).
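The point about clock speed and instruction counts can be sketched with the usual back-of-the-envelope estimate: time = instructions x average cycles per instruction / clock rate. The instruction counts and cycles-per-instruction figures below are invented purely for illustration; they are not measurements of any real processor.

```python
def execution_time_seconds(instruction_count, cycles_per_instruction, clock_hz):
    """Estimate run time as instructions x average cycles per instruction / clock rate."""
    return instruction_count * cycles_per_instruction / clock_hz


CLOCK_HZ = 5e9  # both hypothetical processors run at 5 GHz

# Hypothetical workload: the RISC-style processor needs more instructions,
# but each one takes fewer cycles on average. All figures are assumptions.
risc_time = execution_time_seconds(
    instruction_count=1_300_000_000, cycles_per_instruction=1.0, clock_hz=CLOCK_HZ
)
cisc_time = execution_time_seconds(
    instruction_count=1_000_000_000, cycles_per_instruction=1.5, clock_hz=CLOCK_HZ
)

print(f"RISC-style processor: {risc_time:.2f} s")
print(f"CISC-style processor: {cisc_time:.2f} s")
```

With these made-up numbers the RISC-style processor finishes first, but changing either assumption can reverse the result, which is exactly why clock speed alone says so little.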
When the IBM z12 was released in 2012, it boasted the fastest microprocessor in the world at 5.5 GHz. In 2015, IBM released the z13, advertised as 10% faster than the z12. But the z13's clock speed was 5 GHz: slower than the z12's. How could this be?
The answer is that IBM and other processor manufacturers use other ways to squeeze more performance from their processors. One of the tricks in their bag is to do more than one thing at the same time.
IBM mainframe processors, as well as many other processors, perform superscalar processing. This means that a single processor unit is divided into different functional units that can each do work at the same time. IBM mainframes have functional units for floating point and binary arithmetic, instruction decoding, data cache management, and data compression/encryption. This allows a single processor to execute more than one instruction in a single clock cycle. In fact, the z13 can execute up to 10 instructions per cycle.
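The z12 and z13 figures quoted above give a feel for how much extra work per cycle is involved. As a rough sketch, assume performance is proportional to clock speed multiplied by the work done per cycle (a simplification, and the function name is just for illustration):

```python
Z12_CLOCK_GHZ = 5.5
Z13_CLOCK_GHZ = 5.0
Z13_SPEEDUP = 1.10  # "10% faster", as quoted in the article


def implied_work_per_cycle_ratio(speedup, old_clock_ghz, new_clock_ghz):
    """Assuming performance ~ clock x work per cycle, solve for the work-per-cycle ratio."""
    return speedup * old_clock_ghz / new_clock_ghz


ratio = implied_work_per_cycle_ratio(Z13_SPEEDUP, Z12_CLOCK_GHZ, Z13_CLOCK_GHZ)
print(f"z13 work per cycle relative to z12: {ratio:.2f}x")
```

Under that assumption, the z13 has to get about 21% more work out of every clock cycle to come out 10% ahead overall.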
Processors need to get things from memory. They need to fetch instructions from memory to know what to do. They often need to get data from memory, or put it back. This action of accessing memory costs time. So an instruction accessing memory may need to wait for a few processor cycles. Normally, nothing else can be done while the instruction is waiting.
One way to reduce this wait time is to use cache. So if an instruction needs a word from memory, the processor may get three words and keep them in cache. If the next instruction needs one of these words, it doesn't need to search through all the gigabytes of main memory ‐ it's already in the smaller, super-fast cache near the processor. The z13 actually has a four-layered cache structure to maximise the benefit of this cache.
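The three-word example can be tried out with a toy model. This is only a sketch under simple assumptions: the cache never fills up, and a miss always loads the aligned group of three words containing the requested address. Real processor caches are far more involved.

```python
def count_cache_hits(addresses, words_per_fetch=3):
    """Count hits in a toy cache that loads a whole group of words on each miss.

    Assumptions: the cache is unbounded, and groups are aligned, so
    words 0-2 form one group, words 3-5 the next, and so on.
    """
    cached_groups = set()
    hits = 0
    for address in addresses:
        group = address // words_per_fetch
        if group in cached_groups:
            hits += 1  # word was already brought in by an earlier miss
        else:
            cached_groups.add(group)  # miss: fetch the whole group
    return hits


sequential = list(range(12))         # words 0, 1, 2, ... 11
scattered = list(range(0, 120, 10))  # words 0, 10, 20, ... 110

print("sequential hits:", count_cache_hits(sequential), "of", len(sequential))
print("scattered hits:", count_cache_hits(scattered), "of", len(scattered))
```

The sequential pattern hits the cache on 8 of its 12 accesses, while the scattered pattern never does: the cache only helps when nearby words are used soon afterwards.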
Another trick is out of order execution. Suppose there are five instructions waiting to run on a processor, but one of them must wait to access memory. Normally a processor executes instructions in order ‐ so it would need to wait for that long-running instruction. With out of order execution, it can skip over the waiting instruction and execute others while the slow one waits ‐ provided that doesn't break anything, such as an instruction that needs the waiting instruction's result. z/OS also includes a feature called HiperDispatch, which tries to keep work running on the same processors so that more of its data is already in cache.
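The five-instruction example can be sketched as a toy simulation. It assumes a processor that starts at most one instruction per cycle, and describes each instruction by how many cycles it takes and which earlier instructions it depends on. The program and its latencies are invented for illustration, and real processors are much more sophisticated.

```python
def cycles_to_finish(program, allow_reordering):
    """Simulate a toy processor that starts at most one instruction per cycle.

    program: list of (latency_in_cycles, [indexes of earlier instructions needed]).
    allow_reordering: if False, only the oldest waiting instruction may start.
    """
    finish = {}  # instruction index -> cycle at which its result is available
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        candidates = pending if allow_reordering else pending[:1]
        # Pick the oldest candidate whose inputs are all available by now.
        ready = next(
            (i for i in candidates
             if all(finish.get(dep, float("inf")) <= cycle for dep in program[i][1])),
            None,
        )
        if ready is not None:
            finish[ready] = cycle + program[ready][0]
            pending.remove(ready)
        cycle += 1
    return max(finish.values(), default=0)


# Hypothetical program: instruction 1 is a slow memory access (4 cycles),
# and instruction 2 needs its result. The others are independent.
program = [(1, []), (4, []), (1, [1]), (1, []), (1, [])]

print("in order:    ", cycles_to_finish(program, allow_reordering=False), "cycles")
print("out of order:", cycles_to_finish(program, allow_reordering=True), "cycles")
```

In this sketch the in-order run takes 8 cycles and the out-of-order run takes 6, because instructions 3 and 4 run while instruction 2 is still waiting for the memory access.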
An interesting side-effect of caching and out of order execution is that some workloads get more out of these features than others. Workloads that are batch, low I/O, single application and high CPU usually benefit more than those that are online, high I/O, multiple application and low CPU. So one application "mix" may use as much as 20% more or less CPU than another ‐ even if they both execute equivalent numbers of instructions.
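z/OS can be configured to record a measure of this "efficiency" for the current workload, called the Relative Nest Intensity (RNI). It indicates how heavily the workload relies on the outer levels of cache and on main memory. This can be important when performing capacity planning or modelling.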
Speed Isn't Everything
The processing power of any computer is mostly determined by its processor capacity: how many processors, and how much work each can do. In the past this amount of work was determined (or limited) by the processor speed. However processor technology today improves processing power in other ways, with some interesting results.