How to Use RNI
Relative Nest Intensity (RNI) is the new buzzword among z/OS performance and capacity specialists. This figure shows the relative efficiency of a workload ‐ how much benefit it gets from processor features such as caching and out-of-order execution. But is it really useful?
IBM z Systems mainframes rely heavily on cache to speed up memory access. The z13 has four levels of cache; each successive level is further from the processor, and hence slower. The further away the processor must go to find data, the more expensive the access ‐ the longer the instruction has to wait. Features such as out-of-order execution and z/OS Hiperbatch are designed to reduce this wait time. But the overhead doesn't go away.
Of the four cache levels, the two farthest from the processor and closest to main memory (L3 and L4) together with main memory make up the "Nest". Processor performance is particularly sensitive to this area. Or in other words, the overhead of going to L2 rather than L1 cache is small, but the overhead of going to L4 rather than L1 is much higher. The Relative Nest Intensity (RNI) is a measure of how far out, on average, the processor must go to reach the data it needs. It's a number calculated using a formula based on cache-miss percentages. This formula is different for each processor architecture, which makes sense when you remember that the number of cache levels has changed over the past few years.
For the z13:
RNI = 2.6 × (0.4 × L3P + 1.6 × L4LP + 3.5 × L4RP + 7.5 × MEMP) / 100
where:
L3P = percentage of Level 1 (L1) cache misses sourced from the shared chip-level L3 cache
L4LP = percentage of L1 misses sourced from the local book's L4 cache
L4RP = percentage of L1 misses sourced from a remote book's L4 cache
MEMP = percentage of L1 misses sourced from main memory
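To make the weighting concrete, here is a minimal Python sketch of the z13 formula above. The function name and the sample percentages are my own, purely for illustration:

```python
def z13_rni(l3p, l4lp, l4rp, memp):
    """z13 RNI from the percentage of L1 cache misses sourced from
    each part of the Nest (the four percentages sum to at most 100)."""
    return 2.6 * (0.4 * l3p + 1.6 * l4lp + 3.5 * l4rp + 7.5 * memp) / 100

# Cache-friendly workload: most L1 misses are resolved in on-chip L3.
low = z13_rni(l3p=80, l4lp=12, l4rp=5, memp=3)     # ≈ 2.37

# Cache-hostile workload: many misses go out to remote L4 and memory.
high = z13_rni(l3p=40, l4lp=20, l4rp=20, memp=20)  # ≈ 6.97

print(f"{low:.2f} vs {high:.2f}")
```

Note how moving misses from L3 out to remote L4 or memory dominates the result, even when the total miss count is unchanged.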
The formula looks complicated at first glance, but take a closer look. We can see that the higher the number of cache misses, the higher the RNI. So high RNI, reduced performance; low RNI, improved performance. The cache misses are also weighted by distance. L1 misses sourced from L3 cache are multiplied by 0.4, while L1 misses sourced from a remote book's L4 (farthest from the processor) are multiplied by 3.5 ‐ almost nine times more expensive.
The RNI can be recorded in SMF type 113 records, so you can see exactly what the RNI of your workloads is in each z/OS system over time. Well, this is all great. But how can we use this number?
IBM provide a "Large Systems Processor Reference" (LSPR) with a list of each processor going back to the zSeries 800. This also lists the relative capacity of each processor in MSUs ‐ a 2094-525, for example, has a capacity of 1603 MSUs. The problem is that how much work this capacity can actually do depends on the workload. Because of processor design, some workloads (usually batch: low I/O, CPU intensive) run more efficiently than others (usually online: high I/O). So if you're looking to buy a new processor, you need to know the efficiency of your workload to see how large a processor you need ‐ the MSU capacity alone isn't enough.
For many years, IBM provided several workload categories that you could choose from. For example: ODE-B (On Demand Environment ‐ Batch), IMS, OLTP-T (Online Transaction Processing ‐ Traditional) and LoIO-mix (Low DASD I/O, mixed workload). The capacity differences between these workloads were significant. For example, a 2097-730 had a capacity of 22,000 MIPS for ODE-B, but only 16,000 for OLTP-T. There were some "rules of thumb" to help choose the best mix for you. However, at best this was an estimate.
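The size of that gap is worth spelling out. A quick sketch using the 2097-730 figures above (the variable names are mine):

```python
# Rated capacity of the same 2097-730 under two LSPR workload categories.
ode_b_mips = 22_000   # On Demand Environment - Batch
oltp_t_mips = 16_000  # Online Transaction Processing - Traditional

# The identical hardware delivers 37.5% more work for the batch mix,
# so sizing from the wrong category badly over- or under-estimates.
ratio = ode_b_mips / oltp_t_mips
print(f"{(ratio - 1) * 100:.1f}% more capacity for ODE-B")
```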
What's more, the performance difference between workload mixes has increased with more recent mainframe architectures. So today you need a more precise measurement of your workload mix to properly determine the processor you need. Enter RNI.
Starting with the z10 mainframe, you can configure the mainframe and z/OS to record the RNI in SMF type 113 records. This is great, as you can measure the efficiency of your workload rather than estimating it. You'll probably see the RNI change over time ‐ particularly if you have a traditional workload setup with online during the day and batch overnight.
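Once you have interval-level RNI values out of the type 113 records (via your usual SMF reporting tooling), spotting that day/night shift is straightforward. A minimal sketch, assuming the data has already been reduced to (hour, RNI) pairs ‐ the sample values here are invented:

```python
# Hypothetical hourly RNI samples for one z/OS system: (hour, RNI).
samples = [(9, 1.9), (11, 2.1), (14, 2.0), (16, 2.2),  # online day shift
           (1, 0.9), (3, 1.0), (23, 1.1)]              # overnight batch

def mean_rni(samples, hours):
    """Average RNI over the intervals falling in the given hours."""
    vals = [rni for hour, rni in samples if hour in hours]
    return sum(vals) / len(vals)

day = mean_rni(samples, hours=range(8, 18))               # online window
night = mean_rni(samples, hours=set(range(0, 8)) | {22, 23})  # batch window
print(f"day {day:.2f}, night {night:.2f}")
```

In this invented data the overnight batch window shows the lower (better) RNI, matching the batch-beats-transactional pattern discussed below.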
With this RNI, you can use IBM's zPCR tool to determine accurately how your workload will fare on different processors. IBM make this even easier with the CP3EXTR program, which will read your SMF records for you and import them into zPCR. Brilliant.
How would you like some CPU for free? Improving (i.e. reducing) your RNI does exactly that: it reduces the CPU needed to perform the same work. Now, the RNI depends on a few different factors. Some of these are just what they are ‐ we can't change them. For example, application type (batch beats transactional), the number of applications running (the fewer, the better) and data reference patterns. However, there are some things we can change to improve our RNI.
Take I/O for example. Systems with a lower I/O rate will get a better RNI. So tuning systems to minimize I/O with better buffering, LSR, I/O avoidance and more not only saves the CPU spent on the I/O itself, it also improves your RNI-related efficiency.
LPAR configuration is another interesting area. The fewer LPARs you have on a processor, the better your RNI. If you're running z/VM guests, reducing the number of guests will similarly improve RNI. Reducing the number of logical processors defined to each LPAR or guest will also help. Finally, reducing the number of address spaces (CICS regions, JES initiators, DB2 SPAS, WebSphere Application Server servants) will improve your RNI, as there are fewer address spaces competing for processor cache.
To give an indication of how much this can improve things, MVS Solutions' John Baker detailed a test at the 2015 CMG conference showing how reducing batch initiators from 300 to around 25 (dynamically managed by their Thruput Manager product) reduced elapsed time from 10 to 8.5 hours.
The True Value of RNI
So RNI is essential for accurate capacity planning. But perhaps of more interest, RNI can be used to improve your CPU efficiency, reducing your MSU usage. Recording SMF Type 113 records provides a way of accurately measuring and tracking RNI.
Updated 20-Nov-2016: Thanks to Marc van der Meer for pointing out that MEMP is the percentage of L1 misses sourced from memory