Wednesday, September 10, 2014

Techniques to Minimize Memory Usage with IIB

Introduction

An IBM Integration Bus Integration node will use a varying amount of virtual and real memory. Observed sizes for the amount of real memory required by an Integration Server vary from a few hundred megabytes to many gigabytes. Exactly how much is used depends on a number of factors. Key factors are:
  • Integration flow coding – includes complexity and coding style
  • Messages being processed – size, type and structure of the messages
  • DFDL schemas/message sets deployed to the Integration Server
  • Level of use of Java within the Integration flows and setting of the JVM heap
  • Number of Integration flows that are deployed to the Integration Server
  • Number of Integration Servers configured for the Integration Node.
This article contains some practical recommendations to follow in order to reduce memory usage, often with dramatic results. They are split into two sections: Integration Flow Development Recommendations and Configuration Recommendations. The article is written with IBM Integration Bus as the focus; however, all of the techniques and comments apply equally to WebSphere Message Broker.
 

Integration Flow Development Recommendations

 

Introduction

At the core of all processing within an Integration flow is the message tree. A message tree is a structure that is created, either by one or more parsers when an input message bit stream is received by an Integration flow, or by the action of an Integration flow node.  A new message tree is created for every Integration flow invocation. It is cleared down at Integration flow termination.
The tree representation of a message is typically bigger than the input message, which is received as a bit stream by the Integration flow. The raw input message is of very limited value while it is in bit stream format, so parsing it into an in-memory structure is an essential step to make subsequent processing easier to specify, whether that is with a programming language such as ESQL, Java or .NET, or with a mapping tool like the Graphical Data Mapper.
The shape and contents of a message tree will change over the course of execution of an Integration flow as the logic within the Integration flows executes. 
The size of a message tree can vary hugely; it depends directly on the size of the messages being processed and on the logic that is coded within the Integration flow. Both factors need to be considered: how messages are parsed, and how they are then processed by the Integration flow logic.

 

Message Processing Considerations

When a message is small, such as a few kilobytes in size, the overhead of fully parsing it is not that great. However, when a message is tens of kilobytes or larger the cost of fully parsing it becomes more significant. When the size grows to megabytes it becomes even more important to keep memory usage to a minimum. There are a number of techniques that can be used to keep memory usage down and these are described below.
Parsing of a message will always commence from the start of the message and proceed as far along the message as is required to access the element that is being referred to by the processing logic (ESQL, Java, XPath or Graphical Data Mapper map etc.). Depending on the field being accessed, it may not be necessary to parse the whole message. Only the portion of the message that has been parsed will be populated in the message tree; the rest is held as an unprocessed bit stream. It may be parsed later in the flow if there is logic that requires it, or it may never be parsed if it is not required as part of the Integration flow processing.
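As an illustration, consider the following ESQL fragment. It is a minimal sketch assuming the default Parse On Demand behaviour on the input node; the element names are purely illustrative:
    -- Only the bit stream up to and including OrderId is parsed; the remainder
    -- of the input message stays as an unparsed bit stream unless later logic
    -- refers to it.
    SET OutputRoot.XMLNSC.Ack.OrderId = InputRoot.XMLNSC.Order.OrderId;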
If you know that the Integration flow needs to process the whole of the message, then it is most efficient to parse the whole message on the first reference to a field within the message. To ensure a full parse, specify Parse Immediate or Parse Complete on the input node. Note that the default option is Parse On Demand.
Always use a compact parser where possible. XMLNSC for XML and DFDL for non-XML data are both compact parsers. The XML and XMLNS parsers are not. The MRM parser is also compact but has now been superseded by DFDL. The benefit of compact parsers is that they discard white space and comments in a message, so those portions of the input message are not populated in the message tree, keeping memory usage down.
For XML messages you can also consider using the opaque parsing technique. This allows named subtrees of the incoming message to be minimally processed: they are checked for XML completeness but the subtree is not fully expanded and populated into the message tree; only the bit stream for the subtree appears in the message tree. This technique reduces memory usage. However, when using it you must not refer to any of the contents of a subtree that has been opaquely parsed.
If you are designing messages to be used with applications that route or process only a portion of the message then place the data that those applications require at the front of the message. This means less of the message needs to be parsed and populated into the message tree.
When the size of the messages is large, that is hundreds of kilobytes upwards, use large message processing techniques where possible. There are a couple of techniques that can be used to minimize memory usage, but their success will depend on the message structure.
Messages which have repeating structures and where the whole message does not need to be processed together lend themselves very well to the use of these techniques. Messages which are megabytes or gigabytes in size and which need to be treated as a single XML structure for example are problematic as the whole structure needs to be populated in the message tree and there is typically much less capacity to optimize processing.
The techniques are
  1. Use the DELETE statement in ESQL to delete already processed portions of the incoming message
  2. Use Parsed Record Sequence for record detection with stream based processing such as with files (FileInput, FTEInput, CDInput) and TCPIP processing (TCPIPClientInput, TCPIPServerInput) where the records are not fixed length and there is no simple record delimiter.
A summary of these techniques is provided here to give you an idea of what they consist of, but for the full details you should consult the IBM Integration Bus Knowledge Center.
 
DELETE Statement
Use of this technique requires that the message contains a repeating structure where each repetition, or record, can be processed individually or as a small group. This allows the broker to perform limited parsing of the incoming message at any point in time. In time the whole message will be processed, but not all at once, and that is the key to the success of the technique.
The technique is based on the use of partial parsing and the ability to parse specified parts of the message tree from the corresponding part of the bit stream.
The key steps in the processing are:
  • A modifiable copy of the input message is made but not parsed (note that InputRoot is not modifiable). As the copy of the input message is not parsed it takes less memory than it would if it were parsed and populated into the message tree.
  • A loop and reference variable are then used to process the message one record at a time. 
  • For each record the contents are processed and a corresponding output tree is produced in a second special folder.
  • The ASBITSTREAM function is used to generate a bit stream for the output subtree. This is held in a BitStream element in a position that corresponds to its position in the final message.
  • The DELETE statement is used to delete both the current input and output record message trees when the processing for them has been completed.
  • When all of the records in the message have been processed, the special folders used to process the input and output trees are detached so that they do not appear in the final message.
If needed then information can be retained from one record to another through the use of variables to save state or values.
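The following ESQL is a minimal sketch of the technique, assuming an XMLNSC message of the form <Batch><Record>…</Record>…</Batch>. The module, folder and element names are illustrative only, the record transformation is reduced to a single field, and CopyMessageHeaders is the procedure generated by the Toolkit in a Compute node module:
    CREATE COMPUTE MODULE ProcessLargeBatch
      CREATE FUNCTION Main() RETURNS BOOLEAN
      BEGIN
        CALL CopyMessageHeaders();

        -- Modifiable, still largely unparsed, copy of the input body
        SET OutputRoot.XMLNSC.Batch = InputRoot.XMLNSC.Batch;

        -- Special folder used to build each output record before it is serialised
        CREATE LASTCHILD OF OutputRoot.XMLNSC.Batch NAME 'WorkArea';

        -- Walk the input one record at a time with a reference variable
        DECLARE inRecord REFERENCE TO OutputRoot.XMLNSC.Batch.Record;
        WHILE LASTMOVE(inRecord) DO
          -- Transform the current record into the work area
          SET OutputRoot.XMLNSC.Batch.WorkArea.Record.Id = inRecord.Id;

          -- Serialise the completed output record and keep only its bit stream
          CREATE LASTCHILD OF OutputRoot.XMLNSC.Batch
            TYPE XMLNSC.BitStream NAME 'Record'
            VALUE ASBITSTREAM(OutputRoot.XMLNSC.Batch.WorkArea.Record
                              OPTIONS FolderBitStream);

          -- Delete the input and output record trees that have just been processed
          DECLARE processed REFERENCE TO inRecord;
          MOVE inRecord NEXTSIBLING REPEAT TYPE NAME;
          DELETE FIELD processed;
          DELETE FIELD OutputRoot.XMLNSC.Batch.WorkArea.Record;
        END WHILE;

        -- Detach the work area so it does not appear in the final message
        DELETE FIELD OutputRoot.XMLNSC.Batch.WorkArea;
        RETURN TRUE;
      END;
    END MODULE;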
 
Parsed Record Sequence
This technique uses the parser to split incoming data from non-message-based input sources, such as the File and TCPIP nodes, into messages or records that can be processed individually. These messages are smaller than the whole file or stream, which again allows overall memory usage to be reduced, substantially in some cases. It allows very large files, gigabytes in size, to be processed without requiring gigabytes of memory.
The technique requires the parser to be one of the XMLNSC, DFDL or MRM (CWF or TDS) parsers. It will not work with the other parsers.
In order to use this technique the Record Detection property on the input node needs to be set to Parsed Record Sequence.
With this technique the input node uses the parser to determine the end of a logical record, which in this situation typically cannot be determined simply by a fixed length or a simple delimiter. When a logical record has been detected by the parser it is propagated through the Integration flow for processing in the usual way.
 

Coding Recommendations

This section contains some specific ESQL and node usage coding recommendations that will help to reduce memory usage during processing.
Message Tree Copying
·      Minimize the number of times that a tree copy is done.  This is the same consideration for any of the transformation nodes. 
In ESQL this is usually coded as
SET OutputRoot = InputRoot; 
In Java the equivalent tree copy would be
   MbMessage inMessage = inAssembly.getMessage();
   MbMessage outMessage = new MbMessage(inMessage);  // copy constructor clones the input message tree
   MbMessageAssembly outAssembly = new MbMessageAssembly(inAssembly, outMessage);
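Where the output differs only slightly from the input, a full tree copy can often be avoided altogether. A minimal ESQL sketch, with illustrative element names (CopyMessageHeaders is the Toolkit-generated procedure), is:
    -- Copy only the headers, then build just the fields the output needs
    CALL CopyMessageHeaders();
    SET OutputRoot.XMLNSC.Response.Id     = InputRoot.XMLNSC.Request.Id;
    SET OutputRoot.XMLNSC.Response.Status = 'OK';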
 
In an Integration flow combine adjacent ESQL Compute nodes. For example:
[Image: Integration flow with two ESQL Compute nodes separated by a SAPRequest node]
In this example the two ESQL Compute nodes cannot be further combined into a single ESQL Compute node as there is an external call to SAP with the SAPRequest node.
Watch for this same issue of multiple adjacent Compute nodes when using subflows. An inefficient subflow can cause many problems within a flow and across flows. If an inefficient subflow is used repeatedly in an application, the inefficiency is replicated many times within the same Integration flow or group of Integration flows.
It is not so easy to combine an adjacent ESQL Compute node and a Java Compute node. Often the Java Compute node will contain code that cannot be implemented directly in ESQL. It may be possible to do it the other way round and implement the ESQL code as Java in the Java Compute node. An alternative approach, if you are looking to combine into a Compute node, is to invoke the Java methods through a procedure in ESQL, as sketched below.
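As a minimal sketch, a static Java method can be exposed to ESQL with a procedure definition such as the following; the class and method names are illustrative only, and the Java method must be public static with matching parameter types:
    CREATE PROCEDURE maskCardNumber(IN cardNumber CHARACTER)
      RETURNS CHARACTER
      LANGUAGE JAVA
      EXTERNAL NAME "com.example.CardUtils.maskCardNumber";
The procedure can then be called from within a Compute node in place of the separate Java Compute node.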
Consider using the Environment correlation name to hold data rather than using InputRoot and OutputRoot or LocalEnvironment in each Compute/Java Compute node.
This way a single copy of the data can be held. You should be aware, however, that there are recovery implications. When using the traditional approach of SET OutputRoot = InputRoot; to copy message data through an Integration flow, the message tree is copied at each point at which this statement is coded. Should there be a failure condition and processing be rolled back within the Integration flow, so will be the message tree. When data is held in Environment there is only a single copy of the data, and it will not be backed out in the event of a failure condition. The application code in the Integration flow needs to be aware of this difference in behaviour and allow for it.
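A minimal sketch of this approach, with illustrative field names: the first Compute node stores the working data once in Environment, and later nodes read and update that same copy without further tree copies:
    -- In the first Compute node: hold the working data once in Environment
    SET Environment.Variables.Order = InputRoot.XMLNSC.Order;
    -- In a later Compute node: update the single copy, no further tree copies
    SET Environment.Variables.Order.Status = 'VALIDATED';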
 
ResetContentDescriptor Node
·       Where there is a sequence of Compute node -> ResetContentDescriptor node -> Compute node, this can often be replaced by a single Compute node that uses the ESQL ASBITSTREAM function and the CREATE statement with a PARSE clause in place of the RCD node.
·       A common node usage pattern is a combination of Filter nodes and ResetContentDescriptor nodes to determine which owning parser to assign. This can be optimised into a single Compute node with IF statements and CREATE with PARSE statements, as sketched below.
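The following is a minimal sketch of re-parsing a message body in a different domain within a Compute node. It assumes the incoming body is owned by the XMLNS parser and is to be re-owned by XMLNSC; real logic would wrap the CREATE in the appropriate IF tests:
    DECLARE props REFERENCE TO InputRoot.Properties;
    -- Serialise the existing body back to a bit stream
    DECLARE bodyBits BLOB ASBITSTREAM(InputRoot.XMLNS
                                      ENCODING props.Encoding
                                      CCSID props.CodedCharSetId
                                      OPTIONS RootBitStream);
    CALL CopyMessageHeaders();
    -- Re-parse the bit stream with the required owning parser
    CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC')
      PARSE(bodyBits ENCODING props.Encoding CCSID props.CodedCharSetId);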
 
Trace nodes
·       Be aware of the use of Trace nodes in non-development environments. A reference to ${Root} will cause a full parse of the message if the Trace node is active. Ideally any such processing is not on the critical path of the Integration flow and is only executed for exception processing.
 
Node Level Loops
·       Although loops at the node level are permitted within the IBM Integration Bus programming model, be careful about the situations in which they are used or there can be a rapid growth in memory usage.
For example consider a schematic of an Integration flow which is intended to read all of the files in a directory and process them. 
 
[Image: Integration flow schematic with a loop back to the file-reading node so that each file in the directory is read and processed in turn]
As each file is read and processed and processing completes then the End of File Terminal is driven and the next file is read to be processed.
This is indeed what is needed, but the implementation is such that all of the state associated with the node (message tree, parsers, variables etc.) is placed on the stack and heap as the node is repeatedly executed. In one particular case, where several hundred files were being read in the same Integration flow execution, the Integration Server grew to around 20 GB of memory.
Using the alternative design shown below, the required memory was substantially less: around 2 GB compared with the previous 20 GB.
[Image: Alternative Integration flow design in which a Compute node (Setting_FileRead_Properties) repeatedly propagates to the FileRead node instead of looping at the node level]
This technique uses the Environment to pass state from the FileRead node, that is whether the file has been completely read, back to the Compute node Setting_FileRead_Properties, so that another PROPAGATE to the FileRead node can take place for the next file. A sketch of the driving Compute node logic follows.
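A minimal sketch of the driving Compute node is shown below. The Environment field name is illustrative only, and it is assumed that logic after the FileRead node sets it when there is nothing left to read; setting up the FileRead node's LocalEnvironment properties is omitted:
    -- Keep driving the FileRead node until the downstream logic reports
    -- that the last file has been read
    SET Environment.Variables.EndOfFiles = FALSE;
    WHILE Environment.Variables.EndOfFiles = FALSE DO
        PROPAGATE TO TERMINAL 'out';
    END WHILE;
    RETURN FALSE;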
 
Optimise ESQL Code
·       Write efficient ESQL which reduces the number of statements required. This will increase the efficiency of the Integration flow by reducing the number of ESQL objects which are parsed and created in the DataFlowEngine. A simple example of this is to initialize variables on a DECLARE instead of following the DECLARE with a SET. Use the ROW and LIST constructors to create lists of fields instead of creating them one at a time. Use the SELECT function to perform message transformations instead of mapping individual fields.
·       When using the SQL SELECT statement, use WHERE clauses efficiently to help minimize the amount of data retrieved from the database operation. It is better to process a small result set than to have to filter and reduce it within the Integration flow.
·      Use the PROPAGATE statement where possible. For example, if multiple output messages are being written from a single input message, consider using ESQL PROPAGATE to drive the loop. With each iteration of the propagation, the storage from the output tree is reclaimed and re-used, reducing the storage usage of the Integration flow. A sketch combining these points follows this list.
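The following is a minimal sketch combining initialisation on DECLARE with a PROPAGATE-driven loop; the element names are illustrative only and CopyMessageHeaders is the Toolkit-generated procedure:
    DECLARE recordCount INTEGER 0;                 -- initialise on the DECLARE
    DECLARE rec REFERENCE TO InputRoot.XMLNSC.Batch.Record[1];
    WHILE LASTMOVE(rec) DO
        SET recordCount = recordCount + 1;
        CALL CopyMessageHeaders();
        SET OutputRoot.XMLNSC.Record = rec;
        -- Default PROPAGATE behaviour clears the output tree afterwards,
        -- so its storage is reclaimed and re-used on the next iteration
        PROPAGATE TO TERMINAL 'out';
        MOVE rec NEXTSIBLING REPEAT TYPE NAME;
    END WHILE;
    RETURN FALSE;                                  -- everything already propagated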
 

Configuration Recommendations

 

Introduction

Whilst the major influence on the amount of memory used in processing data is the combination of the messages being processed and the way in which the flow is coded, the configuration of the broker runtime can also have a significant effect and should not be ignored. This section discusses the key considerations.
The Integration flow logic, and indeed the product code within an Integration Server, all use virtual memory. The message tree, variables, input and output messages are all held at locations within virtual memory as they are processed. Virtual memory can be held in central, or real, memory. It can also be held in auxiliary memory, or swap space, if it is paged or swapped out, although it cannot be used in processing while it is there. Where a page of virtual memory sits at any time is determined by the operating system and not by IBM Integration Bus.
Note that Integration Servers which are configured differently and which have different Integration flows deployed to them will use different amounts of virtual and real memory. There is not one amount of virtual and real memory for every Integration Server.
For processing to take place a subset of the virtual memory allocated to an Integration Server will need to be in real memory so that data can be processed (read and/or updated) with ESQL, Java or Graphical Data Mapping nodes etc.  This set of essential memory is what forms the real memory requirement, or working set, of the Integration Server. The contents of the working set will change over time.
For processing to continue smoothly all of the working set needs to be in real memory; if required pages need to be moved into real memory this causes delays in processing. In a heavily loaded system the operating system has to manage the movement of pages very quickly. When the demand for real memory is much higher than what is actually available, processing for one or more components can be slowed down and, in extreme cases, systems can end up thrashing and performing no useful work. It is important therefore to ensure that there is sufficient real memory available to allow processing to run smoothly. The first part of this is to make sure that all processing is optimized, which is what the Integration flow coding recommendations are intended to help with.
The real memory requirements of all of the Integration Servers that are running, plus that required by the bipservice, bipbroker and biphttplistener processes, form the total real memory requirement of the Integration node. As Integration flows and/or Integration Servers are started and stopped this real memory requirement will change. For a given configuration and fixed workload the overall real memory requirement should remain within a narrow band of usage with only slight fluctuations. If the real memory requirement continues to grow there is most likely a memory leak. This could be in the Integration flow logic or, in some rare cases, in the IBM Integration Bus product.
Now that IBM Integration Bus uses 64-bit addressing there are no virtual memory constraints on the deployment of Integration flows to Integration Servers in the way that there were when 32-bit addressing was the only option. This means that Integration flows can be freely deployed to Integration Servers as required.
The memory requirement of an Integration node is entirely dependent on the processing that needs to be performed. If all Integration Servers were stopped the real memory requirements would be very small. If 50 Integration Servers were active and running inefficient Integration flows that process large messages or large files then the real memory requirement could be tens of gigabytes in size. Again, there is no fixed memory requirement for an Integration node; it will very much depend on which Integration flows are running and on the configuration of the Integration node.
 

Broker Configuration

The following configuration factors affect the memory usage of an Integration Server.
  • The deployment of Integration flows to an Integration Server
  • The size of the JVM for each Integration Server
  • The number of Integration Servers
We will now look at each of these in more detail.
 
Deployment of Integration Flows to Integration Servers
Each Integration flow has a virtual memory requirement that is dependent on the routing/transformation logic that is executed and the messages to be processed. 
Using additional instances of an Integration flow will result in additional memory usage, but not by as much as deploying a copy of the same flow to another Integration Server.
The more different Integration flows that are deployed to an Integration Server, the more virtual and real memory that Integration Server will use.
There will be an initial memory requirement at deployment and a subsequent higher requirement once messages have been processed. Different Integration flows will use different amounts of additional memory. It is not possible to accurately predict the amount of virtual and real memory that will be needed for any Integration flow or message to be processed, so for planning purposes it is best to run the flow and measure the requirement once it has processed messages. After some minutes of processing, memory usage should stabilise providing the mix of messages is not constantly changing, such as the size continually increasing.
If multiple Integration flows have been deployed to the Integration Server then ensure they have all processed messages before observing the peak memory usage of the Integration Server.
When looking at the memory usage of an Integration Server, focus on the real memory usage: the RSS value on UNIX and Linux systems or the Working Set on Windows. This is the key to understanding how much memory is being used by a process at a point in time. There may be other pages sitting in swap space that were used at some point in the processing, possibly at start-up or when a different part of the Integration flow was executed, but due to the demand for memory they may well no longer be in real memory.
To understand how much memory processes on the system are using then use the following:
  • AIX - The command  ps -e -o vsz=,rss=,comm= will display virtual and real memory usage along with the command for that process
  • Linux: - The command ps -e -o vsz,rss,cmd will display virtual and real memory usage along with the command for that process
  • Windows - Run Task Manager then select View -> Select Columns -> Memory (Private Working Set)
 
Integration Server JVM Settings
In IBM Integration Bus V9 the default JVM heap settings for an Integration Server are a minimum of 32 MB and a maximum of 256 MB. For most situations these settings are sufficient. The same defaults apply in WebSphere Message Broker V8.
The amount of Java heap required is dependent on the Integration flows and in particular the amount of Java processing within them. This includes nodes which are written in Java, such as the FileInput, FileOutput, SAPInput and SAPRequest nodes. Given this, different Integration Servers may well require different Java heap settings, so do not expect to always use the same values for every Integration Server.
A larger Java heap may be needed if there is a heavy requirement from the Integration flow logic or the nodes which are used within it. The best way to determine whether there is sufficient Java heap is to look at the Resource Statistics for the Integration Server and observe the level of Garbage Collection.
For batch processing, low GC overhead is the goal; low would be of the order of 1%.
For real-time processing, low pause times are the goal; low in this context is less than 1 second.
As a guide, a GC overhead of 5-10%, or pause times of 1-2 seconds, can usually be improved by tuning.
The Integration Server JVM heap settings can be changed with the mqsichangeproperties command. For example the command:
mqsichangeproperties PERFBRKR -o ComIbmJVMManager -e IN_OUT -n jvmMaxHeapSize -v 536870912
will increase the maximum JVM heap to a value of 536870912 bytes (512 MB) for the Integration Server IN_OUT in the Integration node PERFBRKR.
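To check the current JVM settings for the same Integration Server you can use the corresponding report command, for example:
mqsireportproperties PERFBRKR -e IN_OUT -o ComIbmJVMManager -r
which lists the JVM properties, including jvmMinHeapSize and jvmMaxHeapSize.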
 
Numbers of Integration Servers
A variable number of Integration Servers is supported for an Integration node; no maximum value is specified by the product. Practical limits, particularly the amount of real memory or swap space, will limit the number that can be used on a particular machine.
Typically large systems might have tens of Integration Servers. A broker with 100 Integration Servers would be very large and at that point we would certainly recommend reviewing the policy used to create Integration Servers and assign Integration flows to them.
The number of Integration Servers in use does not impact the amount of virtual memory or real memory used by an individual Integration Server but it does have a very direct effect on the amount of real memory that is required by the Integration node as a whole and so for this reason it is good to think about how many Integration Servers are really required.
An Integration Server is able to host many Integration flows. One Integration Server could host hundreds of flows if needed. Think carefully before doing this though, as if the Integration Server fails a high number of applications will be lost for a time, and the restart time for the Integration Server will be significantly longer than for one with only a few Integration flows deployed. If the Integration Server contains Integration flows that are seldom used then this is much less of an issue.
It is a good idea to have a policy controlling the assignment of Integration flows to Integration Servers. There are a number of schemes commonly in use. Some examples are
  • Only have Integration flows for one business application assigned to an individual Integration Server. [Large business applications may require multiple Integration Servers dependent on the number of message flows for the business area].
  • Assigning flows that have the same hours of availability together. It is no good assigning flows for the on-line day and flows for the overnight batch into the same Integration Server, as it will not be possible to shut down the Integration Server without interrupting the service of one set of flows.
  • Assign flows for high profile or high priority applications/services to their own Integration Servers and have general utility Integration Servers which host many less important flows.
  • Assign Integration flows across Integration Servers so that message volume is spread evenly between them.
In a multi-node deployment avoid the tendency to deploy all Integration flows to all Integration nodes; have some nodes provide processing for certain applications only. Otherwise, with everything deployed everywhere, real memory usage will be large.
One additional point. Assign Integration flows that are known to be heavy on memory usage to the minimum number of Integration Servers. This will reduce the number of Integration Servers that become large in size and require large amounts of real memory.
Whichever policy you arrive at, think ahead and ensure it will still work if hundreds more services are added. Sometimes people assign one flow or service per Integration Server when there are only a few services in existence. This is something you can live with in the early days, but by the time there are another 400 services it simply does not scale and the demand for real memory becomes an issue.