Does the Latest Enterprise COBOL Compiler Eliminate Coding Performance Errors?
There have been a lot of changes to the IBM Enterprise COBOL compiler over the past five years, including performance-related changes. In fact, IBM considers these changes so significant that it sells a separate product, IBM Automatic Binary Optimizer for z/OS, which delivers many of the same benefits to old COBOL modules without recompiling that old code.
So, here's a question: do we still need to think about performance when programming in COBOL, or does the latest Enterprise COBOL forgive everything?
Enterprise COBOL Advantages
In 2016, Enterprise COBOL 6.1 was announced. Since then, two new Enterprise COBOL compiler versions (6.2 and 6.3) have followed. So, what changes have been introduced with these new compilers? Let's summarize them:
- New instructions: via the ARCH compiler option, these new compilers can exploit the new instructions of the latest processors.
- Code optimisation: the compiler performs more aggressive code optimisation when the new OPTIMIZE option is specified.
- Compile-time and runtime performance enhancements: the compiler can also leverage new features of the latest z/OS releases.
- New functions: these improve processing of XML, JSON and UTF-8 data.
There's a lot of good news for COBOL programmers.
In the past, smart COBOL programmers have kept performance in mind when writing code: using inline PERFORM statements, avoiding numbers longer than 18 digits, and choosing the most efficient PICTURE and USAGE clauses. But can we now ignore all this, and let the new Enterprise COBOL do its thing? Let's look at a test case.
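As a reminder of what the first of those habits looks like (the data names here are hypothetical, purely for illustration), an inline PERFORM keeps the loop body in the main instruction stream instead of branching to a separate paragraph on every iteration:

```cobol
*> Out-of-line PERFORM: every iteration branches to the
*> 1000-Add-Amount paragraph and back again
Perform 1000-Add-Amount Varying WS-IX From 1 By 1
    Until WS-IX > 10

*> Inline PERFORM: the loop body is generated in the main
*> instruction stream, avoiding the branching overhead
Perform Varying WS-IX From 1 By 1 Until WS-IX > 10
    Add WS-AMT (WS-IX) To WS-TOTAL
End-Perform
```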
A Test Case: USAGE
In 2016, I presented a case at SHARE where COBOL programming cost a lot of CPU seconds. The client was letting the USAGE of numeric fields default to DISPLAY, rather than specifying COMP: a classic error. Let's look at some example code to show the problem:
Working-Storage Section.
01  WS-LEN-FOUND-SW        Pic 9(01) Value 0.
    88  WS-NOT-LEN-FOUND   Value 0.
    88  WS-LEN-FOUND       Value 1.
01  WS-WORK-LEN            Pic 9(03).
01  WS-WORK-LINE           Pic X(256) Value Spaces.
WS-LEN-FOUND-SW and WS-WORK-LEN don't specify a USAGE, so they default to DISPLAY. In other words, COBOL stores the numbers as EBCDIC character values, ready to display. Now suppose we have this code, which scans backwards to find the length of WS-WORK-LINE excluding trailing spaces:
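To make that concrete, here's how the value 123 sits in storage under each usage (the data names are hypothetical; the hex values are the standard zoned-decimal and binary encodings):

```cobol
01 WS-ZONED   Pic 9(03) Value 123.
*> USAGE defaults to DISPLAY: three EBCDIC digit
*> characters, stored as X'F1F2F3'
01 WS-BINARY  Pic 9(03) Comp Value 123.
*> USAGE COMP: a halfword binary integer,
*> stored as X'007B'
```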
Move Length Of WS-WORK-LINE To WS-WORK-LEN
Set WS-NOT-LEN-FOUND To TRUE
Perform Until WS-LEN-FOUND
    If WS-WORK-LINE (WS-WORK-LEN:1) = SPACES
        Subtract 1 From WS-WORK-LEN
        If WS-WORK-LEN = 0
            Set WS-LEN-FOUND To TRUE
        End-If
    Else
        Set WS-LEN-FOUND To TRUE
    End-If
End-Perform
COBOL needs to convert WS-WORK-LEN from EBCDIC to a numeric format, and back again, every time we perform arithmetic on it. To prove this, we created a program that performs this processing one million times, compiled it with Enterprise COBOL 6.3 (default ARCH and OPTIMIZE parameters), and ran it in a batch job. The result: the program took a little over 4 seconds, and used 4.2 CPU seconds. Ouch!
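The driver is nothing more than the trailing-space scan wrapped in a counted loop; a sketch of its structure (the actual test program may differ in detail):

```cobol
Procedure Division.
    Perform 1000000 Times
        Move Length Of WS-WORK-LINE To WS-WORK-LEN
        Set WS-NOT-LEN-FOUND To TRUE
        *> Scan backwards for the last non-space character
        Perform Until WS-LEN-FOUND
            If WS-WORK-LINE (WS-WORK-LEN:1) = SPACES
                Subtract 1 From WS-WORK-LEN
                If WS-WORK-LEN = 0
                    Set WS-LEN-FOUND To TRUE
                End-If
            Else
                Set WS-LEN-FOUND To TRUE
            End-If
        End-Perform
    End-Perform
    Goback.
```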
Using the LIST compiler option, we can see the assembler code that the compiler creates. Let's look at the assembler code that is created by the COBOL compiler for two COBOL lines working with WS-WORK-LEN:
Subtract 1 From WS-WORK-LEN
    PKA  376(R13),160(3,R9)
    SP   390(2,R13),469(1,R3)
    UNPK 160(3,R9),390(2,R13)
    OI   162(,R9),X'F0'
If WS-WORK-LEN = 0
    PKA  392(R13),160(3,R9)
    CLC  406(2,R13),458(R3)
    JNE  L0010
To do something as simple as subtracting 1 or comparing the value with zero, the compiler has generated pack (PKA) and unpack (UNPK) instructions. These convert WS-WORK-LEN between its zoned (EBCDIC) format and the packed-decimal format the arithmetic is actually performed in.
Better Coding
To resolve this performance issue, we would normally add the COMP usage clause to our numeric data definitions:
Working-Storage Section.
01  WS-LEN-FOUND-SW        Pic 9(01) Comp Value 0.
    88  WS-NOT-LEN-FOUND   Value 0.
    88  WS-LEN-FOUND       Value 1.
01  WS-WORK-LEN            Pic 9(03) Comp.
01  WS-WORK-LINE           Pic X(256) Value Spaces.
When we run the same code, it now takes less than a second to execute, and uses only 0.7 CPU seconds. A big reduction.
Now, here's the question: do we still need to do this? Or does the compiler figure it out for us?
ARCH Option
By default, the Enterprise COBOL ARCH option is set to ARCH(8): generate code as if it were executing on a z10 mainframe. But many new machine instructions have been added since then. Our test machine is a z14, so we compiled the original code (no COMP) with the ARCH(12) option. Here's the comparable assembler listing:
Subtract 1 From WS-WORK-LEN
    VPKZ  VRF16,160(,R9),0x2
    VLIP  VRF17,0x1,0
    VSP   VRF16,VRF16,VRF17,0x3,1
    VUPKZ VRF16,160(,R9),0x2
If WS-WORK-LEN = 0
    VPKZ  VRF16,160(,R9),0x2
    VLIP  VRF17,0x0,0
    VCP   VRF16,VRF17,8
    JNE   L0010
The COBOL compiler is now using the vector decimal-arithmetic instructions introduced with the z13. However, the results barely change: the program still takes over four seconds to run, and uses 4.2 CPU seconds.
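As an aside, compiler options like these can be supplied on the PARM of the compile step or in a CBL (PROCESS) statement at the top of the source. For example, the compile above (with LIST to get the assembler listing) could be requested in-source as:

```cobol
CBL ARCH(12),LIST
```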
OPTIMIZE Option
Another feature introduced in the past few years is the OPTIMIZE option, which tells the compiler how much optimisation to perform on the code. By default, this is OPTIMIZE(0): the minimum. Let's recompile with OPTIMIZE(2) to maximise optimisation. Here's the assembler code generated:
Subtract 1 From WS-WORK-LEN
    SP   390(2,R13),522(1,R3)
    UNPK 160(3,R9),390(2,R13)
    OI   162(,R9),X'F0'
If WS-WORK-LEN = 0
    CLC  160(3,R9),506(R3)
    JNE  L0011
This looks promising: there's still an unpack (UNPK) instruction, but far fewer instructions overall. And the results show a small improvement: the program takes between 3 and 4 seconds to run, and uses 3.7 CPU seconds.
ARCH and OPTIMIZE Together
Let's use both ARCH(12) and OPTIMIZE(2) together. Above, we only showed the assembler code generated for two COBOL statements. However, with these options, the entire PERFORM loop has been reduced to:
Perform Until WS-LEN-FOUND
    VPKZ VRF16,152(,R9),0x0
    VLIP VRF17,0x1,0
    VCP  VRF16,VRF17,0
    VPKZ VRF16,424(,R9),0x5
And the results are the best so far: the code takes less than three seconds to run, and uses 2.3 CPU seconds.
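Putting it together, the best-performing compile in this test combines both options. In-source, that could look like this (OPT being the accepted abbreviation for OPTIMIZE; whether a CBL statement or a JCL PARM is used depends on your build setup):

```cobol
CBL ARCH(12),OPT(2),LIST
```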
Conclusion
When looking at COBOL performance, the Enterprise COBOL Performance Tuning Guide is the place to turn to. Its section "Coding Techniques To Get The Most Out Of V6" has an entire subsection on DISPLAY, and it confirms what we've seen here: avoiding USAGE DISPLAY for fields used in arithmetic is still the way to go. However, with the right compile options the compiler can reduce the overhead of USAGE DISPLAY.
In this article, we've only looked at one small corner of COBOL performance. And in this corner, smart COBOL programming is still the go. If you browse through the 6.3 version of the Performance Tuning Guide, you'll see other areas that are similar: smart programming will still improve performance, though by less when the optimal compiler options are used.
Bottom line: smart COBOL programmers will still think of performance when coding.
David Stephens