
LongEx Mainframe Quarterly - August 2021

technical: Using HEAPZONES to Fix C Storage Overlays on z/OS

It's not hard to create a storage overlay in a C program: even easier when that C program runs on z/OS. If you're tracking a storage overlay down, one z/OS tool that can help is the Language Environment (LE) HEAPZONES parameter. In this article, we'll see how it works.

Storage Overlays

Suppose we have the following C program:

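#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *str1;
int main(void) {
  str1 = malloc(15);                        /* room for 15 bytes only */
  strcpy(str1,"This is a test string");     /* overlay: copies 22 bytes */
  printf("%s\n", str1);
  free(str1);
  return 0;
}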

C programmers will instantly see our problem: we're copying a 21-character string (22 bytes with its terminating null) into a 15-byte buffer. When we run this in a batch job, we get a user abend from Language Environment:

+CEE3798I ATTEMPTING TO TAKE A DUMP FOR ABEND U4094 TO DATA SET: DZS.D174
IGD100I 03DE ALLOCATED TO DDNAME SYS00001 DATACLAS (        )
IEA822I COMPLETE TRANSACTION DUMP WRITTEN TO DZS.D174.T0101429.DZSRUNP
+CEE3797I LANGUAGE ENVIRONMENT HAS DYNAMICALLY CREATED A DUMP.
IEA995I SYMPTOM DUMP OUTPUT  031
  USER COMPLETION CODE=4094 REASON CODE=0000002C
 TIME=01.01.42  SEQ=00141  CPU=0000  ASID=0036
 PSW AT TIME OF ERROR  078D1400   85DDF9A6  ILC 2  INTC 0D
   NO ACTIVE MODULE FOUND
   NAME=UNKNOWN
   DATA AT PSW  05DDF9A0 - 00181610  0A0DA7F4  001C1811

Language Environment has detected that some of its control information has been overwritten, and abended with a dump. The SYSOUT DD doesn't tell us much more:

CEE0802C Heap storage control information was damaged.
         The traceback information could not be determined.

This is going to be hard to diagnose, and it isn't the only symptom a storage overlay can produce. We may get S0C1 or S0C4 abends if the damage isn't detected by Language Environment. If running in CICS, we may see ASRA or ASRB abends, or user abends like U4038, U4087, U4088, U4092 or U4094. We may also see other nasty messages from programs that are victims of our storage overlay.
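
For comparison, the overlay in the sample program disappears if the allocation is sized from the string itself. This is only a minimal sketch in portable C, and the helper name copy_string is ours, not something from the original program or from Language Environment:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate exactly enough heap storage for src,
   including its terminating null, then copy it. Returns NULL if
   malloc fails. The caller frees the result. */
char *copy_string(const char *src) {
  assert(src != NULL);
  size_t needed = strlen(src) + 1;   /* e.g. 21 characters + 1 null */
  char *dst = malloc(needed);
  if (dst != NULL)
    memcpy(dst, src, needed);
  return dst;
}
```

Calling copy_string("This is a test string") obtains 22 bytes rather than 15, so nothing is written past the end of the storage.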

Introducing HEAPZONES

Introduced in z/OS 2.1, HEAPZONES is a lightweight way of detecting storage overlays from programs writing 'over the end' of variables. It works with heap storage: think malloc and friends. So, let's tweak our program:

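#pragma runopts(HEAPZONES(8,MSG,16,MSG))
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *str1;
int main(void) {
  str1 = malloc(15);
  strcpy(str1,"This is a test string");     /* overlay: copies 22 bytes */
  printf("%s\n", str1);
  free(str1);                               /* LE checks the zone here */
  return 0;
}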

We've added a #pragma statement to the top. This tells Language Environment to create a buffer zone of 8 bytes (for 31-bit addresses) or 16 bytes (for 64-bit addresses) at the end of each area of heap storage we obtain. Now if we run our program, it ends with a return code of zero! We also see something different written to our SYSOUT:

CEE3716I The heap check zone following the storage at address 1AF39D40 for
         length X'00000010' has been overlayed at
         address 1AF39D50. Each byte in the zone from 1AF39D50 to
         1AF39D54 should contain the value X'55'.
CEE3717I Control information in a heap check zone has been damaged.
         The value at address 1AF39D54 should be greater
         than 1AF39D40 and less than or equal to 1AF39D50.
         From entry point main at compile unit offset +000000C8 at entry
         offset +000000C8 at address 1AF00A80.
  1AF39D40: E38889A2 4089A240 8140A385 A2A340A2 |This is a test s    |
  1AF39D50: A3998995 874B004F                   |tring..|            |

So, Language Environment has checked this buffer zone at our free() call, and found that it was overwritten. It's also telling us the text that stomped over our buffer zone. Suppose we have two variables that we use as follows:

#pragma runopts(HEAPZONES(8,TRACE,16,TRACE))
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *str1, *str2;
int main(void) {
  str1 = malloc(15);
  str2 = malloc(50);
  strcpy(str1,"This is a test string.");    /* overlay: copies 23 bytes */
  printf("%s\n", str1);
  free(str1);
  free(str2);
  return 0;
}

This still works. Our buffer zone is added to the end of each variable. Nice.
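
To picture what a check zone does, here's a rough sketch in portable C. It is not how Language Environment implements HEAPZONES: the names zone_malloc, zone_free and zone_header are hypothetical, and the only details taken from the messages above are the 8-byte zone size and the X'55' fill value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ZONE_SIZE 8      /* like the 8 in HEAPZONES(8,...) */
#define ZONE_FILL 0x55   /* fill value named in message CEE3716I */

/* Hypothetical header kept in front of each allocation so that
   zone_free() knows where the check zone starts. */
typedef union {
  size_t size;
  max_align_t align;     /* keeps the user area suitably aligned */
} zone_header;

/* Allocate size bytes followed by a check zone filled with ZONE_FILL. */
void *zone_malloc(size_t size) {
  zone_header *hdr = malloc(sizeof(zone_header) + size + ZONE_SIZE);
  if (hdr == NULL)
    return NULL;
  hdr->size = size;
  unsigned char *user = (unsigned char *)(hdr + 1);
  memset(user + size, ZONE_FILL, ZONE_SIZE);
  return user;
}

/* Free the storage and return how many check zone bytes were damaged.
   Zero means no overlay was detected. */
int zone_free(void *ptr) {
  assert(ptr != NULL);
  zone_header *hdr = (zone_header *)ptr - 1;
  unsigned char *zone = (unsigned char *)ptr + hdr->size;
  int damaged = 0;
  for (size_t i = 0; i < ZONE_SIZE; i++)
    if (zone[i] != ZONE_FILL)
      damaged++;
  free(hdr);
  return damaged;
}
```

In this sketch, copying our 22-byte string into storage from zone_malloc(15) damages 7 bytes of the zone, so zone_free() returns 7. Nothing is detected until zone_free() runs.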

HEAPZONES Options

So, we've achieved two things. Our program now executes with a zero return code, and we've got information that there's been an overlay. Suppose we want our program to abend when this happens, but give us the same information. We could tweak our #pragma statement to look like:

#pragma runopts(HEAPZONES(8,ABEND,16,ABEND))

Now, when we run our program, it abends with a U4042 abend:

+CEE3798I ATTEMPTING TO TAKE A DUMP FOR ABEND U4042 TO DATA SET: DZS.D175.
IGD100I 03DE ALLOCATED TO DDNAME SYS00001 DATACLAS (        )
IEA822I COMPLETE TRANSACTION DUMP WRITTEN TO DZS.D174.T0239105.DZSRUNP
+CEE3797I LANGUAGE ENVIRONMENT HAS DYNAMICALLY CREATED A DUMP.
IEA995I SYMPTOM DUMP OUTPUT  450
  USER COMPLETION CODE=4042 REASON CODE=00000003
 TIME=02.39.10  SEQ=00199  CPU=0000  ASID=0036
 PSW AT TIME OF ERROR  078D1400   85DDF9A6  ILC 2  INTC 0D
   NO ACTIVE MODULE FOUND
   NAME=UNKNOWN
   DATA AT PSW  05DDF9A0 - 00181610  0A0DA7F4  001C1811

Note that this is different to our 'old' abend of a U4094. A U4042 abend with a reason code of 3 always comes from HEAPZONES. We could also simply ignore the overlay by coding something like:

#pragma runopts(HEAPZONES(8,QUIET,16,QUIET))

Our job ends with a return code of zero, and there are no Language Environment messages in SYSOUT. But this isn't recommended: we're just hiding the error. A final option is to specify TRACE:

#pragma runopts(HEAPZONES(8,TRACE,16,TRACE))

This is the same as specifying MSG, but we also get some extra diagnostic information in the form of a traceback:

Traceback:
  DSA   Entry       E  Offset  Statement   Load Mod
  1     CEEVHMSG    +00000846              CEEPLPKA
  2     CEEV#FH     +0000023C              CEEPLPKA
  3     main        +000000C8              DZSC
  4     EDCZMINV    +000000C2              CEEEV003
  5     CEEBBEXT    +000001C6              CEEPLPKA

This helps us find the area in our program where LE found the overlay: our free() call is at x'C8' bytes after the beginning of our main function.

Some Gotchas

There are some things to remember when using HEAPZONES. For a start, consider this program:

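#pragma runopts(HEAPZONES(8,MSG,16,MSG))
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *str1;
int main(void) {
  str1 = malloc(15);
  strcpy(str1,"This is a test string");     /* overlay: copies 22 bytes */
  printf("%s\n", str1);
  return 0;                                 /* no free(): zone never checked */
}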

We've removed our free() function call. If we run this job again, it will simply end with a zero return code, and no messages: basically, the same as coding QUIET in the pragma statement. This is because we've instructed Language Environment to add a buffer zone at the end of our storage, but without a free() function call, LE never checks it for overlays.

Have a look at this program:

#pragma runopts(HEAPZONES(8,TRACE,16,TRACE))
#include <stdio.h>
#include <string.h>
char str1[15];
int main(void) {
  strcpy(str1,"This is a test string.");    /* overlay: copies 23 bytes */
  printf("%s\n", str1);
  return 0;
}

Rather than using malloc, we've defined the variable as a character array. Now we have a similar situation to before when we omitted the free() function: a zero return code and no messages. HEAPZONES works with heap storage obtained through malloc and friends, and nothing is being freed here, so LE never checks for an overlay.

Let's look at another program:

#pragma runopts(HEAPZONES(8,TRACE,16,TRACE))
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *str1;
int main(void) {
  str1 = malloc(15);
  /* overlay: copies 50 bytes, well past the 8-byte buffer zone */
  strcpy(str1,"This is a test string. A really, really long one!");
  printf("%s\n", str1);
  free(str1);
  return 0;
}

We get our original U4094 abend, with no nice messages from HEAPZONES.

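This is because we're copying a 49-character string (50 bytes with its terminating null) into a 15-byte area. We've specified a buffer zone of 8 bytes in length, but 15+8 (23) bytes is still less than the string we're copying in, so it runs past the end of the buffer zone and damages the heap control information beyond it. If we specified a larger buffer zone so that our string wouldn't run over the edge, we'd get our HEAPZONES messages again.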

So, it would be tempting to define a bigger buffer zone (the maximum size is 1024 bytes) to handle this. However, this would greatly increase the storage used. So, we need to balance our storage usage against the size of the strings that may overlay our storage.
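
The arithmetic behind this trade-off can be sketched in portable C. The function names are hypothetical, and the sketch assumes the storage is exactly the size requested, ignoring any rounding Language Environment may apply to an allocation:

```c
#include <stddef.h>

/* Bytes written past the end of an allocation when needed bytes
   are copied into allocated bytes of storage (0 if it fits). */
size_t overrun_bytes(size_t allocated, size_t needed) {
  return needed > allocated ? needed - allocated : 0;
}

/* Returns 1 if the overrun stays inside a check zone of zone bytes,
   0 if it runs past the end of the zone. */
int overrun_within_zone(size_t allocated, size_t zone, size_t needed) {
  return overrun_bytes(allocated, needed) <= zone;
}
```

With our numbers, a 22-byte copy into 15 bytes overruns by 7 bytes and stays inside an 8-byte zone. A 50-byte copy overruns by 35 bytes, so it escapes an 8-byte zone but would be contained by a zone of 40 bytes or more.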

There are a few other issues with HEAPZONES:

  • Turning it on will affect performance, and not in a good way. It will also increase storage used, so best to only use it when you need it.
  • IBM recommend that HEAPZONES and HEAPCHK are used separately – not together.
  • You can't use the LE options HEAPZONES and RPTSTG(ON) at the same time.

Finally, you can't specify HEAPZONES at the system level or region level. So, we can't specify it in our PARMLIB CEEPRMxx member, or for an entire CICS region. For our batch job, we could specify the LE options using the CEEOPTS DD:

//STEP1    EXEC PGM=DZSC
//STEPLIB  DD   DISP=SHR,DSN=DZS.LOAD
//SYSPRINT DD   SYSOUT=*
//CEEOPTS  DD   *
  HEAPZONES(8,TRACE,16,TRACE)

We could also create a CEEUOPT module, and bind it with our module. If we can't code pragma statements, this will be the way to go for CICS and IMS programs.

Not a Solution for Everything

HEAPZONES is an excellent tool in our toolbox for diagnosing storage overlays: perfect when our programs are running 'off the edge' of our variables. Or in other words, perfect for C. But HEAPZONES comes at a cost, so it should only be used if necessary, and removed when problems have been diagnosed. Remember also that this isn't the only tool in our toolbox. We talk about some more in our partner article.


David Stephens