technical: Batching CICS Syncpoints for Performance
In our partner article, we talk about units of work: groups of operations where all updates must occur, or none. In many cases a unit of work is a single action by a user: a click on a webpage, a selection on a 3270 screen. A user may want to update a client's address (one unit of work), transfer money to their kid's school (one unit of work), or buy a toy (one unit of work). One transaction, one unit of work.
But we may also have transactions that do more than one unit of work: 'background' transactions. For example, we may have a CICS transaction processing multiple incoming MQ messages, each its own unit of work. For these, it makes sense to commit every unit of work once it is completed, using a syncpoint command or similar. If the transaction later abends, completed units of work are, well, completed.
But can we improve performance by 'batching' our syncpoints? In other words, by processing several units of work before issuing each syncpoint command?
Can this even work? What happens if a transaction that batches syncpoints abends? Let's step through an example:
- (Transaction starts)
- Unit of Work 1 starts
- Unit of Work 1 ends
- (Syncpoint)
- Unit of Work 2 starts
- Unit of Work 2 ends
- Unit of Work 3 starts
- (Transaction abends)
OK, our transaction has abended. Unit of Work 1 has been committed (with a syncpoint), so it stays. Our recovery manager will back out all other updates (Unit of Work 2 and Unit of Work 3). We can then redrive the transaction from Unit of Work 2. How? Any background transaction must have a way of being restarted.
Let's take an example: a transaction triggered by an incoming MQ message, which then gets and processes all messages in a queue (one unit of work per message). Our transaction manager will tell MQ that Units of Work 2 and 3 are backed out. MQ will put back the messages that Units of Work 2 and 3 removed. Our transaction will be re-triggered, and will start processing the MQ message that started the original Unit of Work 2.
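The backout-and-redrive behaviour described above can be sketched with a toy in-memory queue. This is a simplified model in Python, not MQ or CICS code: `ToyQueue`, `run` and the message names are hypothetical, and a real queue manager does far more. It assumes a syncpoint after every two units of work and an abend while processing the fourth message.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a queue read under syncpoint (not a real MQ API)."""

    def __init__(self, messages):
        self.messages = deque(messages)
        self.in_flight = []  # messages got since the last syncpoint

    def get(self):
        msg = self.messages.popleft()
        self.in_flight.append(msg)
        return msg

    def syncpoint(self):
        # Commit: the gets since the last syncpoint become permanent.
        self.in_flight.clear()

    def backout(self):
        # Put uncommitted messages back at the front, in their original order.
        self.messages.extendleft(reversed(self.in_flight))
        self.in_flight.clear()


def run(queue, batch_size, processed, abend_on=None):
    """Process every message, syncpointing after each batch_size units of work."""
    pending = []  # units of work completed but not yet committed
    try:
        while queue.messages:
            msg = queue.get()
            if msg == abend_on:
                raise RuntimeError(f"abend while processing {msg}")
            pending.append(msg)
            if len(pending) == batch_size:
                queue.syncpoint()
                processed.extend(pending)
                pending.clear()
        queue.syncpoint()  # the end of the task acts as a final syncpoint
        processed.extend(pending)
    except RuntimeError:
        queue.backout()  # uncommitted units of work are undone


committed = []
q = ToyQueue(["msg1", "msg2", "msg3", "msg4", "msg5"])

run(q, batch_size=2, processed=committed, abend_on="msg4")
print(committed, list(q.messages))  # after the abend and backout

run(q, batch_size=2, processed=committed)  # the redrive
print(committed, list(q.messages))
```

After the first run, only the first two messages are committed; the third (completed but not yet committed) and the fourth are back on the queue. The second run picks up from the third message, so nothing is lost and nothing is processed twice.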
So, batching up syncpoints will not 'lose' units of work. Now let's see if it can help our performance.
Let's take another example: one we actually tested. We created a CICS program that updated a VSAM KSDS dataset. In our first set of tests, the program added 100,000 records. In our second, it read a record for update, and then performed a REWRITE. This read/rewrite was performed 100,000 times, with a different record every time.
Initially we performed an EXEC CICS SYNCPOINT command after every update. The question we're now asking is: "if we only syncpoint, say, every 100 updates, do we get any performance benefits? Or does performance get worse?"
We did exactly this: we reran our tests six times: syncpointing after every 1, 10, 100, 1000, 10,000 and 100,000 updates. Here's what we saw:
|Operations per Syncpoint|Time (sec): VSAM Add|Time (sec): VSAM Read/Rewrite|
Doesn't look like this gives us much, does it? A small decrease in response time for Read/Rewrite, and not much change for VSAM adds.
When you think about this, it makes sense. Every VSAM write involves writes to disk (writes to the VSAM data and index component, VVDS, and CICS DFHLOG journal). A syncpoint only involves a write to DFHLOG.
So, the syncpoint overhead isn't large: each individual update is.
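The shape of a test like the one described above can be sketched as a small harness. This is an illustrative Python sketch, not the CICS program we ran: `run_test`, `fake_update` and `fake_syncpoint` are hypothetical names, and the stand-in callables do nothing, so the timings it produces say nothing about VSAM or CICS.

```python
import time


def run_test(total_updates, updates_per_syncpoint, do_update, do_syncpoint):
    """Run total_updates updates, syncpointing after every N of them.

    Returns (elapsed_seconds, syncpoints_issued).
    """
    syncpoints = 0
    start = time.perf_counter()
    for i in range(1, total_updates + 1):
        do_update(i)
        if i % updates_per_syncpoint == 0:
            do_syncpoint()
            syncpoints += 1
    if total_updates % updates_per_syncpoint:
        do_syncpoint()  # commit the final partial batch
        syncpoints += 1
    return time.perf_counter() - start, syncpoints


# Stand-in callables: replace these with real work to get meaningful timings.
def fake_update(record_number):
    pass


def fake_syncpoint():
    pass


for n in (1, 10, 100, 1000, 10_000, 100_000):
    elapsed, syncpoints = run_test(100_000, n, fake_update, fake_syncpoint)
    print(f"{n:>7} updates per syncpoint -> {syncpoints:>6} syncpoints")
```

Passing the update and syncpoint actions in as callables keeps the loop the same for every resource type, so only the work being measured changes between runs.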
The result is probably similar when writing to recoverable temporary storage queues or transient data queues: these all involve VSAM writes. But how about an external resource manager?
We wrote a CICS program to put 100,000 messages onto an MQ queue. These messages were small (80 bytes long), persistent, and put in syncpoint. Here's what we found:
|MQPUT per Syncpoint|Time (sec)|
That looks a lot better, doesn't it? There is a big reduction in time between performing a syncpoint after every MQPUT and after every 10. There is a smaller reduction between 10 and 100, with diminishing benefits after that.
Again, this makes sense when you think about it. MQ (and Db2) are a little smarter with I/Os: most are done when a unit of work is committed (by syncpoint or transaction end). Reducing syncpoints reduces these I/Os, and so the elapsed time.
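One way to see why the benefit tails off is simply to count the syncpoints issued for 100,000 operations at each batch size. The sketch below is plain arithmetic in Python; `syncpoints_needed` is a hypothetical helper, and it says nothing about how long each syncpoint actually takes.

```python
def syncpoints_needed(total_ops, ops_per_syncpoint):
    """Number of syncpoints issued, counting one for a final partial batch."""
    return (total_ops + ops_per_syncpoint - 1) // ops_per_syncpoint


previous = None
for n in (1, 10, 100, 1000, 10_000, 100_000):
    count = syncpoints_needed(100_000, n)
    note = "" if previous is None else f" ({previous - count} fewer than the row above)"
    print(f"{n:>7} per syncpoint: {count:>6} syncpoints{note}")
    previous = count
```

Going from 1 to 10 operations per syncpoint removes 90,000 syncpoints. Going from 10 to 100 removes only another 9,000, and each further step removes ten times fewer again.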
Not Exact Figures
The figures we've given should be used only as a guide. In reality, response times and performance will depend on a lot of different things. For example, MQ performance will be different for larger MQ messages, and will depend on whether they are persistent, and whether they are put in or out of syncpoint. VSAM performance will depend on how many tasks are accessing the file, how big it is, the size of records, and other values.
What the figures do show us is that it's possible to get good performance benefits by batching up syncpoints for external resource managers. However, it's not all good news.
The Flipside: Locks
What these tests don't take into account is the cost of batching syncpoints: locking. Let me explain.
Suppose we have a program that updates VSAM records. We decide to commit these updates every 10 minutes to improve performance. During this 10-minute interval, our program will hold VSAM locks, and other programs will not be able to update these locked records. If we reduce the number of syncpoints too much, it will have a negative impact on performance as more and more transactions wait on locks held.
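The effect can be sketched with a toy lock table. This is a simplified Python model, not how CICS or VSAM manage locks: `ToyLockTable`, `peak_locks` and the task names are hypothetical, and the model assumes each update locks a different record until the owning task syncpoints.

```python
class ToyLockTable:
    """Record locks held until the owning task syncpoints (toy model only)."""

    def __init__(self):
        self.owner_of = {}  # record key -> task holding the lock

    def try_lock(self, task, key):
        # Take the lock if it is free; report whether this task now holds it.
        holder = self.owner_of.setdefault(key, task)
        return holder == task

    def syncpoint(self, task):
        # A syncpoint releases every lock the task holds.
        self.owner_of = {k: t for k, t in self.owner_of.items() if t != task}

    def held_by(self, task):
        return sum(1 for t in self.owner_of.values() if t == task)


def peak_locks(total_updates, updates_per_syncpoint):
    """Update records 1..total_updates and report the most locks held at once."""
    locks = ToyLockTable()
    peak = 0
    for key in range(1, total_updates + 1):
        locks.try_lock("BATCH", key)
        peak = max(peak, locks.held_by("BATCH"))
        if key % updates_per_syncpoint == 0:
            locks.syncpoint("BATCH")
    return peak


for n in (1, 10, 100, 1000):
    print(f"{n:>5} updates per syncpoint: up to {peak_locks(1000, n)} records locked")

locks = ToyLockTable()
locks.try_lock("BATCH", 42)
print(locks.try_lock("ONLINE", 42))  # False: record 42 is still locked
locks.syncpoint("BATCH")
print(locks.try_lock("ONLINE", 42))  # True: the syncpoint released it
```

In this model the number of records locked at any moment grows in step with the number of updates per syncpoint, which is why other tasks are more likely to hit a locked record as syncpoints become less frequent.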
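If we were to look at a graph of operations per syncpoint against response time, we'd expect to see a curve that falls at first as the number of syncpoints drops, then rises again as more transactions wait on locks.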
Ideally, we want to find the number of operations per syncpoint that gives us a 'sweet spot': the lowest response time. In reality, this is going to be difficult to determine. Units of work may perform different operations to different resources. The only effective way is 'trial and error.' And even when we find this sweet spot, it may 'move' over time as processing, data or environments change.
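The trial-and-error search amounts to measuring each candidate value and keeping the one with the lowest response time. Here is a minimal Python sketch; `find_sweet_spot` is a hypothetical helper, and `made_up_response_time` is a deliberately invented cost curve used only to exercise it. In practice, `measure` would run a real test and return a real timing.

```python
def find_sweet_spot(candidates, measure):
    """Measure every candidate and return (best_candidate, all_timings)."""
    timings = {n: measure(n) for n in candidates}
    best = min(timings, key=timings.get)
    return best, timings


def made_up_response_time(ops_per_syncpoint, total_ops=100_000):
    # Hypothetical cost units, chosen only to produce a U-shaped curve.
    syncpoint_cost = 1.0   # assumed cost of each syncpoint
    lock_wait_cost = 50.0  # assumed lock-wait penalty per operation in a batch
    syncpoints = (total_ops + ops_per_syncpoint - 1) // ops_per_syncpoint
    return syncpoints * syncpoint_cost + ops_per_syncpoint * lock_wait_cost


best, timings = find_sweet_spot((1, 10, 100, 1000, 10_000, 100_000),
                                made_up_response_time)
for n, cost in timings.items():
    marker = "  <- lowest" if n == best else ""
    print(f"{n:>7} per syncpoint: {cost:>12.1f}{marker}")
```

Because the sweet spot can move, a search like this is something to rerun periodically rather than once.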
To make it easier to change the number of operations per syncpoint, I've seen sites save values in Db2 tables or other areas, so they can change the operations per syncpoint to find the best performance without recompiling programs.
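The idea of reading the value at run time can be sketched as follows. This is an illustrative Python example that uses an in-memory SQLite database as a stand-in for a Db2 table. The table name `syncpoint_config`, its columns and the program name `MQLOAD01` are all hypothetical, as is the choice to fall back to one operation per syncpoint when no valid value is found.

```python
import sqlite3

DEFAULT_OPS_PER_SYNCPOINT = 1  # assumed fallback: commit every unit of work


def load_ops_per_syncpoint(conn, program_name):
    """Look up the batch size for a program, falling back to the default."""
    row = conn.execute(
        "SELECT ops_per_syncpoint FROM syncpoint_config WHERE program_name = ?",
        (program_name,),
    ).fetchone()
    if row is None or row[0] < 1:
        return DEFAULT_OPS_PER_SYNCPOINT
    return row[0]


# Stand-in for the site's configuration table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE syncpoint_config ("
    "program_name TEXT PRIMARY KEY, ops_per_syncpoint INTEGER NOT NULL)"
)
conn.execute("INSERT INTO syncpoint_config VALUES ('MQLOAD01', 100)")
conn.commit()

print(load_ops_per_syncpoint(conn, "MQLOAD01"))  # configured value
print(load_ops_per_syncpoint(conn, "UNKNOWN"))   # no row, so the default
```

Reading the value once at the start of each run keeps the lookup cost negligible while still letting the setting change between runs.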
More Flipside: Recovery Performance
Batching syncpoints can also affect recovery performance. Suppose we have a transaction that processes for 5 minutes without a syncpoint. If it abends, we will need to back out any changes made: this may take another 5 minutes. We'd then need to redo the updates.
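The link between syncpoint frequency and backout work can be sketched with a toy undo log. This is a simplified Python model, not how CICS logging works: `ToyRecoverableFile` and `backout_steps_after_abend` are hypothetical names, and the model assumes one undo entry per update since the last syncpoint.

```python
class ToyRecoverableFile:
    """Keyed records plus an undo log of before-images (toy model only)."""

    def __init__(self):
        self.records = {}
        self.undo_log = []  # (key, before_image) for updates since the last syncpoint

    def write(self, key, value):
        self.undo_log.append((key, self.records.get(key)))
        self.records[key] = value

    def syncpoint(self):
        # Committed updates no longer need their before-images.
        self.undo_log.clear()

    def backout(self):
        """Undo uncommitted updates, newest first; return the number undone."""
        steps = 0
        while self.undo_log:
            key, before = self.undo_log.pop()
            if before is None:
                self.records.pop(key, None)  # the record did not exist before
            else:
                self.records[key] = before
            steps += 1
        return steps


def backout_steps_after_abend(updates_before_abend, updates_per_syncpoint):
    """Apply updates, then abend and report how many must be backed out."""
    f = ToyRecoverableFile()
    for i in range(1, updates_before_abend + 1):
        f.write(i, f"record {i}")
        if i % updates_per_syncpoint == 0:
            f.syncpoint()
    return f.backout()


for n in (1, 100, 1000, 100_000):
    steps = backout_steps_after_abend(2_500, n)
    print(f"{n:>7} updates per syncpoint: {steps} updates to back out")
```

In this model an abend after 2,500 updates leaves nothing to back out when syncpointing after every update, 500 updates when syncpointing every 1,000, and all 2,500 when the next syncpoint was still far away.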
I've seen one site where a single transaction performed processing for over an hour. When the CICS region abended, it took CICS a long time to perform a 'hot' restart and back out all those changes. During this time, CICS was unavailable to all other processes.
If only accessing CICS resources, the chances are that there won't be large performance improvements from batching up syncpoints. If using an external resource manager like IBM MQ or Db2, there's real potential. However, going too far can hurt performance, including recovery performance. The idea is to find the number of operations per syncpoint that gives the best performance, and to review it regularly as workloads and environments change.