technical: Improving REXX Performance
REXX is sometimes seen as a forgotten issue. Word on the street is that IBM is not doing any more development on REXX. However, I often come across REXX routines that could use some tweaking to perform better, or use less CPU. Often these REXX execs run in batch, working through large files.
So, I thought we'd look at some of the ways to make REXX perform better, and do some testing to see if they're still relevant.
Baseline:
Let's start with a baseline REXX exec. This does nothing but loop around 10,000,000 times.
/* Rexx */
a=time('E')          /* first call starts the elapsed timer */
Do 10000
  Do 1000
  End
End
Say time('E')        /* seconds since the timer was started */
Exit 0
I'm using the time function to measure the elapsed time of the REXX.
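As a rough sketch of how the same timing could also be turned into a per-iteration figure, the exec below resets the elapsed timer with time('R') and divides the result by the iteration count. The variable names (iterations, elapsed) are mine, not part of the tests in this article, and the figure it prints is elapsed time only, not CPU.

```rexx
/* Rexx */
/* Sketch only: variable names are illustrative assumptions */
iterations = 10000 * 1000
Call time 'R'                    /* reset the elapsed timer to zero */
Do 10000
  Do 1000
  End
End
elapsed = time('E')              /* elapsed seconds since the reset */
Say 'Elapsed seconds:' elapsed
Say 'Microseconds per iteration:' format(elapsed * 1000000 / iterations, , 3)
Exit 0
```

Dividing by the iteration count makes it easier to compare tests that run a different number of iterations.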
This and all REXX execs in this article were executed in batch using IRXJCL:
//STEP1 EXEC PGM=IRXJCL,PARM=REXXB
//SYSTSPRT DD SYSOUT=*
//SYSEXEC DD DISP=SHR,DSN=DZS.REXX
//SYSTSIN DD DUMMY
I've used the SMF Type 30 records to measure the CPU usage.
Performance of this REXX on our test system: CPU=0.36 secs, elapsed= 0.37 secs
Now we have our baseline. So, let's look at some traditional ways of improving performance:
Test 1: Do I vs Do
The idea here is that a Do var = num to num is less efficient than a Do Num statement. So, we've modified our baseline REXX with two such statements:
/* Rexx */
a=time('E')
Do i = 1 to 10000
  Do j = 1 to 1000
  End
End
Say time('E')
Exit 0
Performance: CPU=2.28 secs, elapsed= 12.5 secs. About six times the CPU of our baseline.
Test 2: Do I To vs Do I For
In this case, we compare Do var = x to y against Do var = x for y (using for rather than to). In the code below, the function is the same as we're not modifying our loop variables. Here's our code:
/* Rexx */
a=time('E')
Do i = 1 for 10000
  Do j = 1 for 1000
  End
End
Say time('E')
Exit 0
Performance: CPU=1.78, elapsed= 1.81 secs. A little faster than Test 1, but slower than our baseline.
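The two forms are only interchangeable while the loop variable is left alone. As a small illustration (the counters count_to and count_for are hypothetical names of mine), the sketch below changes the control variable inside each loop. TO stops when the variable passes the limit, while FOR always runs the stated number of iterations.

```rexx
/* Rexx */
/* Sketch only: shows where TO and FOR stop behaving the same */
count_to = 0
Do i = 1 to 10
  i = i + 1                      /* skip ahead inside the loop */
  count_to = count_to + 1
End
count_for = 0
Do i = 1 for 10
  i = i + 1                      /* same change, but FOR counts iterations */
  count_for = count_for + 1
End
Say 'TO iterations: ' count_to   /* 5 */
Say 'FOR iterations:' count_for  /* 10 */
Exit 0
```

So swapping TO for FOR is only a safe tuning step when nothing in the loop body assigns to the control variable.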
Test 3: One Operation
A basic REXX tactic is to limit operations within loops. Every operation takes CPU and time. To start with, let's see how expensive a simple arithmetic instruction is.
/* Rexx */
a=time('E')
Do 10000
  Do 1000
    a = a+1
  End
End
Say time('E')
Exit 0
Performance: CPU=3.57 secs, elapsed= 3.6 secs. More expensive than any of our plain loops; not a surprise.
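One common way to limit operations within a loop is to move anything that does not change between iterations out of the loop. The sketch below is a made-up example (rate, hours, secs and total are hypothetical names), not one of the measured tests: the multiplication hours * 3600 is done once before the loop instead of 1,000 times inside it.

```rexx
/* Rexx */
/* Sketch only: hoisting a loop-invariant calculation */
rate  = 7
hours = 24
total = 0
secs  = hours * 3600             /* invariant: computed once, outside the loop */
Do 1000
  total = total + rate * secs    /* only the work that must repeat */
End
Say 'Total:' total
Exit 0
```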
Test 4: Comments
Putting comments at the end has traditionally been seen as a way of improving performance. The idea is that even comments use some CPU and reduce performance. So, we've included four comment lines inside our loop.
/* Rexx */
a=time('E')
Do 10000
  Do 1000
    /* */
    /* */
    /* */
    /* */
  End
End
Say time('E')
Exit 0
Performance: CPU=1.04 secs, elapsed= 1.05 secs. So, comments aren't free: our four comment lines used about one-third of the CPU and elapsed time of the simple arithmetic operation in Test 3.
As an aside, I've heard that you should avoid comments with asterisks inside them. For example:
/********************************************************/
A quick test shows no difference between such comments and comments with spaces. Similarly, a test with short comments like
/* */
had similar CPU and elapsed times. The length of the text inside the comment does not make a large difference.
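If comment cost inside a hot loop matters, one option is to keep the commentary just ahead of the loop rather than in its body. This is only a sketch of the layout, with a trivial made-up calculation; it was not one of the timed tests.

```rexx
/* Rexx */
/* Sketch only: commentary sits ahead of the loop, so the  */
/* loop body holds nothing but executable clauses.         */
/* The loop sums the numbers 1 to 5.                       */
total = 0
Do i = 1 to 5
  total = total + i
End
Say total                        /* 15 */
Exit 0
```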
Test 5: Single Line Compare
The idea is that all logical operations on one line are always evaluated, even if they don't need to be. So, if we have b = 1 | b = 2 | b = 3, all three are evaluated, even if the first operation matches (because b=1). If we nest them, we can improve performance.
Our first code has them on one line:
/* Rexx */
a=time('E')
b = 1
c = 0
Do 10000
  Do 1000
    if (b = 1 | b = 2 | b = 3) then c = 1
  End
End
Say time('E')
Exit 0
Performance: CPU=6.87 secs, elapsed= 6.96 secs.
Our second code nests them:
/* Rexx */
a=time('E')
b = 1
c = 0
Do 10000
  Do 1000
    If (b = 1) Then c = 1
    Else if (b = 2) Then c = 1
    Else if (b = 3) Then c = 1
  End
End
Say time('E')
Exit 0
Performance: CPU=5.79 secs, elapsed= 5.87 secs. A small reduction.
Let's try a Select statement to do the same thing:
/* Rexx */
a=time('E')
b = 1
c = 0
Do 10000
  Do 1000
    Select
      When b=1 Then c = 1
      When b=2 Then c = 1
      When b=3 Then c = 1
    End
  End
End
Say time('E')
Exit 0
Performance: CPU=5.74 secs, elapsed= 5.86 secs. SELECT has the same performance as the nested If.
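One way to see the "everything on the line is evaluated" behaviour directly is to replace each comparison with a call to a small routine that counts how often it runs. The routine name check and the counter calls below are hypothetical, introduced only for this sketch. On an interpreter that evaluates every operand, as described above, the count should come out as 3 even though the first comparison already matches.

```rexx
/* Rexx */
/* Sketch only: counts how many comparisons are actually evaluated */
calls = 0
b = 1
matched = 0
If check(b, 1) | check(b, 2) | check(b, 3) Then matched = 1
Say 'Matched:' matched
Say 'Comparisons evaluated:' calls
Exit 0

check: Procedure Expose calls
  Parse Arg val, target
  calls = calls + 1              /* record that this operand was evaluated */
  Return (val = target)          /* 1 if equal, 0 if not */
```

With the nested If or Select versions, only the comparisons up to the first match are executed, which is consistent with the small saving measured above.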
Test 6: Inline Subroutine
Often, we use inline subroutines. But do these affect our performance? Our test code is:
/* Rexx */
a=time('E')
Do 10000
  Do 1000
    Call SUBREXX
  End
End
Say time('E')
Exit 0
SUBREXX:
  Return 0
Performance: CPU=9.06 secs, elapsed= 9.17 secs. So inline subroutines are about two and a half times the cost of one arithmetic instruction.
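Where the logic allows it, one way to cut this cost is to move the loop into the subroutine, so the Call is paid once rather than on every iteration. The routine name AddMany and the variable total are made up for this sketch, and it was not one of the timed tests.

```rexx
/* Rexx */
/* Sketch only: one call, with the loop inside the routine */
total = 0
Call AddMany 1000                /* a single Call instead of 1000 */
Say total                        /* 1000 */
Exit 0

AddMany:
  Arg n                          /* number of iterations to run */
  Do n
    total = total + 1            /* no Procedure, so total is shared */
  End
  Return
```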
Test 7: External Subroutine
The idea is that external subroutines are expensive. So, our SUBREXX subroutine from Test 6 is moved to a separate member in the PDS.
Performance: CPU=960.26, Elapsed=2910.94. Ouch!
I've actually lied here. I did this test with one-tenth of the iterations in the loop, because I knew the overhead would be so great. So, I didn't really run a REXX in a hard loop for almost 50 minutes. The performance was actually 96.03 seconds CPU, 291.1 seconds elapsed for 1,000,000 iterations. I've multiplied the results by 10 so we can compare with other tests. Not surprisingly, EXCPs went through the roof: every call did an EXCP to the PDS holding the REXX.
So external subroutines are really, really expensive. Some REXX environments such as IBM NetView provide facilities to load REXX execs into memory to reduce this overhead.
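To put the two subroutine results side by side, the short exec below simply does the arithmetic on the CPU figures from Test 6 and Test 7. The variable names are mine, and the baseline loop overhead has not been subtracted, so treat the per-call figures as rough.

```rexx
/* Rexx */
/* Sketch only: per-call CPU from the Test 6 and Test 7 figures */
iterations   = 10000000
internal_cpu = 9.06              /* Test 6, CPU seconds */
external_cpu = 960.26            /* Test 7, CPU seconds (scaled by 10) */
Say 'Internal call, microseconds each:' format(internal_cpu * 1000000 / iterations, , 2)
Say 'External call, microseconds each:' format(external_cpu * 1000000 / iterations, , 2)
Say 'External/internal ratio:         ' format(external_cpu / internal_cpu, , 0)
Exit 0
```

That works out to a little under 1 microsecond of CPU per internal call against roughly 96 microseconds per external call, or about 100 times the cost.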
Test 8: EXECIO
The idea is that performing one I/O at a time is slower and more expensive than performing many I/Os in one operation, and putting the results in a stem or the data stack.
Our first REXX does what I've seen a lot of programs do: read records one at a time. The record is put into the data stack:
/* Rexx */
a=time('E')
eof = 0
Do While eof=0
  "EXECIO 1 DISKR INFILE"        /* read one record onto the data stack */
  If RC=2 Then eof = 1           /* RC=2 means end of file */
End
Say time('E')
Exit 0
Note that the REGION of our job needed to be large enough to store all our records. Our sample file is RECFM=FB, LRECL=132, BLKSIZE=23440 with 250,000 records.
Performance: CPU=0.98, Elapsed=1.16 seconds.
We now use one EXECIO statement to load all records into a stem variable.
/* Rexx */
a=time('E')
eof = 0
"EXECIO * DISKR INFILE (FINIS STEM inrec."
Say time('E')
Exit 0
Performance: CPU=0.27, Elapsed=0.43 seconds. About one-quarter of the CPU, and one third of the elapsed time.
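If the file is too large to hold in a stem all at once, a middle ground is to read it in batches. The sketch below was not one of the timed tests; the batch size of 1000 and the variable names (chunk, read_rc, total) are assumptions chosen for illustration. It relies on EXECIO setting inrec.0 to the number of records actually read and returning RC=2 when end of file is reached before the requested count.

```rexx
/* Rexx */
/* Sketch only: read INFILE in batches rather than 1 or all records */
chunk = 1000                     /* hypothetical batch size */
total = 0
Do Forever
  "EXECIO" chunk "DISKR INFILE (STEM inrec."
  read_rc = RC                   /* save RC before anything changes it */
  Do i = 1 to inrec.0            /* inrec.0 = records read this batch */
    total = total + 1            /* process inrec.i here */
  End
  If read_rc <> 0 Then Leave     /* 2 = end of file, higher = error */
End
"EXECIO 0 DISKR INFILE (FINIS"   /* close the file */
Say 'Records read:' total
Exit 0
```

This keeps the number of EXECIO calls low while limiting storage to one batch of records at a time.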
Conclusion
We've shown that some techniques to make REXX more efficient are still relevant. In particular, avoiding external subroutines is a large step toward improving performance. Some of the other ideas will provide small savings, but the savings will only be noticeable if the instructions are inside a large loop.
And this in itself is interesting. Every instruction will cost something. But it's when they are inside loops that things get interesting. Minimizing the number of loop iterations (for example, eliminating records before a loop), and the amount of work (and comments) inside loops will make a large difference to your REXX performance.
David Stephens