Wednesday, July 16, 2014

WebSphere Application Server - Certificate Expiration Monitor and Dynamic Run Time Updates

As should not be surprising, certificates expire. Should a certificate expire, SSL communication using that certificate will be impossible, which will almost certainly result in a system outage. WebSphere Application Server tries hard to prevent these outages, and when it cannot prevent them, it tries to at least warn you before they occur. The certificate expiration monitor task runs on a configurable schedule which, by default, is every 14 days. The Next start date field for the monitor is persistent in the configuration and is updated with a new date each time it runs. It will execute in the deployment manager process in a Network Deployment environment, or -- if standalone -- in the WebSphere Application Server base process.
When executing, the expiration monitor searches through all KeyStore objects configured in the cell, looking for any personal certificates that will expire within the expiration threshold (90 days being the default; this is configurable via a custom property). If it finds any, it issues a notification warning of the impending expiration. Notifications are always sent on the serious event stream to all registered listeners. By default, these are the admin console and SystemOut.log. Notifications can also be sent via email using an SMTP server.
In addition to notifications, WebSphere Application Server will attempt to replace self-signed certificates before they expire. By default, the expiration monitor will execute the certificate replacement task (mentioned in the previous section) against any self-signed certificates 15 days before expiration (this is configurable). The task creates a new certificate using the certificate information from the old one, and updates every trust store in the cell that contained the old signer with the new signer certificate. By default, the old signer certificate will be deleted.
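The decision the monitor makes for each personal certificate can be sketched roughly as follows. This is only an illustration of the thresholds described above (90 days for notification, 15 days for replacing self-signed certificates); the `Cert` record and the `monitor_action` function are hypothetical names and are not part of any WebSphere API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Defaults quoted in the text above; both are configurable in the product.
NOTIFY_THRESHOLD_DAYS = 90    # warn when expiry is this close
REPLACE_THRESHOLD_DAYS = 15   # replace self-signed certificates this close to expiry


@dataclass
class Cert:
    """Hypothetical record for illustration, not a WebSphere class."""
    alias: str
    not_after: date
    self_signed: bool


def monitor_action(cert: Cert, today: date) -> str:
    """Return 'ok', 'notify' or 'notify+replace' for one personal certificate."""
    days_left = (cert.not_after - today).days
    if days_left > NOTIFY_THRESHOLD_DAYS:
        return "ok"
    if cert.self_signed and days_left <= REPLACE_THRESHOLD_DAYS:
        return "notify+replace"
    return "notify"


today = date(2014, 7, 16)
certs = [
    Cert("long-lived", today + timedelta(days=200), True),
    Cert("ca-issued", today + timedelta(days=10), False),
    Cert("self-signed", today + timedelta(days=10), True),
]
for cert in certs:
    print(cert.alias, monitor_action(cert, today))
```

Note that in this sketch a CA-issued certificate is only ever reported, never replaced, which matches the description that automatic replacement applies to self-signed certificates.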
The expiration monitor marks any SSL configuration as "modified" whenever the monitor changes the key store or trust store referenced by the configuration. The configuration changes are saved once the expiration update task is completed, causing a ripple to occur throughout the runtime. The first thing that happens is the temporary disabling of SSL server authentication (for 90 seconds) to enable these changes to occur without requiring a server restart. In cases where you do not want this to occur, consider disabling the "Dynamically update the run time when SSL configuration changes occur" option, located at the bottom of the SSL certificate and key management panel in the admin console.
Unfortunately, automatically replacing certificates is not a panacea. WebSphere Application Server cannot update certificates in key stores that are not under its control. In particular, this means that a Web server plug-in that is using the previous soon-to-expire signing certificate will stop working when the corresponding personal certificate is replaced. It also means that if WebSphere Application Server was using the personal certificate to authenticate with some other system, the certificate replacement will cause an outage. Keep in mind that this outage would have occurred anyway -- it is just occurring 15 days sooner, and after WebSphere Application Server has sent multiple warnings of this impending outage. WebSphere Application Server is simply doing its best.

It should be obvious that letting WebSphere Application Server automatically change the expiring certificates in a production environment is risky, since it could potentially cause a short- or long-term outage. Instead, you should change certificates manually when you are notified of their impending expiration. Automatic replacement is primarily intended to simplify management for less complex environments, and for development systems where brief outages are acceptable. For most production environments, we recommend that you instead monitor and act on the expiration notification messages and disable automatic replacement of self-signed certificates. Figure 14 shows the configuration panel for the certificate expiration monitor.
Figure 14. Certificate expiration monitor configuration panel

Sunday, July 13, 2014

Memory Management In WMB/IIB9

When considering memory usage within a DataFlowEngine process, there are two sources from which storage is allocated:
1. The DataFlowEngine main memory heap
2. The operating system

When message flow processing requires storage, an attempt is first made to allocate the required memory block from the DataFlowEngine's heap. If there is no contiguous block of storage on the heap that is large enough, a request is made to the operating system to allocate more storage to the DataFlowEngine for the message flow. The DataFlowEngine's heap grows by this additional storage, which the message flow then uses.
When the message flow has completed its processing, it issues a "free" on all its storage, and these blocks are returned to the DataFlowEngine's heap, ready for allocation to any other message flow of this DataFlowEngine. The storage is never released back to the operating system, because there is no programmatic mechanism to perform such an operation. The operating system will not retrieve storage from a process until the process is terminated. Therefore the user will never see the size of the DataFlowEngine process decrease after it has increased.
When the next message flow runs, it makes its own requests for storage, and these are allocated from the DataFlowEngine heap as before. Storage within the DataFlowEngine is therefore re-used where possible, minimizing the number of times that the operating system needs to allocate additional storage to the DataFlowEngine process. Some growth may still be observed in the DataFlowEngine's memory usage, of the size of the subsequent allocations for message flow processing. Eventually the storage usage is expected to plateau. This happens once the DataFlowEngine's heap is large enough that any storage request can be satisfied without requesting more from the operating system.

Memory fragmentation in a DataFlowEngine process

At the end of each message flow iteration, storage is freed back to the DataFlowEngine memory heap, ready for re-use by other threads. However, some objects created within the DataFlowEngine last for the life of the DataFlowEngine and therefore stay at the same place in the heap for that whole time. This leads to what is known as fragmentation, which reduces the size of the contiguous storage blocks available when an allocation request is made. The DataFlowEngine process may have enough free memory blocks in total, but they are too fragmented to satisfy requests made during message processing. In most cases, requesters of storage need a contiguous chain of blocks in memory. It is therefore possible for a message flow to request storage from a DataFlowEngine heap that does have enough free storage in total, but where the storage is so fragmented that the contiguous block does not fit into any of the "gaps". In this situation, a request has to be made to the operating system to allocate more storage to the DataFlowEngine so that the block can be allocated.
In other words, when unfreed blocks remain on the DataFlowEngine's heap, they fragment the heap and leave smaller contiguous blocks available. If the next storage allocation cannot fit into the fragmented space, the DataFlowEngine's memory heap grows to accommodate the new request.
This is why small increments may be seen in the DataFlowEngine even after it has processed thousands of messages. In a multi-threaded environment, many threads may be requesting storage at the same time, which makes it more difficult for a large block of storage to be allocated.
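The effect can be reproduced with a toy model. The sketch below is not how the DataFlowEngine allocator works internally; it assumes a simple first-fit allocator over a list of fixed-size blocks, with hypothetical `allocate` and `free` helpers, purely to show that plenty of free storage in total does not help when no single gap is large enough.

```python
def largest_free_run(heap):
    """Length of the longest run of free (None) blocks."""
    best = run = 0
    for slot in heap:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best


def allocate(heap, owner, size):
    """First-fit allocation; grow the heap (ask the OS) when no gap is big enough."""
    run = 0
    for i, slot in enumerate(heap):
        run = run + 1 if slot is None else 0
        if run == size:
            start = i - size + 1
            heap[start:i + 1] = [owner] * size
            return
    heap.extend([owner] * size)  # heap growth: the process size goes up


def free(heap, owner):
    """Return all blocks held by `owner` to the heap."""
    for i, slot in enumerate(heap):
        if slot == owner:
            heap[i] = None


heap = []
# Interleave short-lived flow buffers (3 blocks) with long-lived objects (1 block).
for n in range(4):
    allocate(heap, f"flow{n}", 3)
    allocate(heap, f"keep{n}", 1)
for n in range(4):
    free(heap, f"flow{n}")

free_total = heap.count(None)            # free blocks in total
largest_before = largest_free_run(heap)  # biggest contiguous gap
size_before = len(heap)
allocate(heap, "big", 6)                 # needs 6 contiguous blocks
print(free_total, largest_before, size_before, len(heap))
```

In this model 12 blocks are free but the largest gap is only 3 blocks, so the request for 6 contiguous blocks forces the heap to grow from 16 to 22 blocks.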
For example,
Some message flows implement BLOB domain processing which may result in the concatenating of BLOBs. Depending on how the message flow has been written, this may lead to fragmentation of the message heap due to the fact that when a binary operation takes place such as concatenation, both the source and target variables need to be in scope at the same time.
Consider a message flow that reads in a 1MB BLOB and assigns this to the BLOB domain. For the purposes of demonstration, this ESQL uses a WHILE loop that repeatedly concatenates this 1MB BLOB to produce a 57MB output message. Consider the following ESQL:
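-- Declarations are assumed here; the original snippet omits them
DECLARE c CHARACTER;
DECLARE d CHARACTER;
DECLARE i INTEGER 1;

SET c = CAST(InputRoot.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId);
SET d = c;

-- 56 iterations, each appending the 1MB value in d to c
WHILE (i <= 56) DO
  SET c = c || d;
  SET i = i + 1;
END WHILE;

SET OutputRoot.BLOB.BLOB = CAST(c AS BLOB CCSID InputProperties.CodedCharSetId);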
As can be seen, the 1MB input message is assigned to the variable c, which is then copied to d. The loop concatenates d onto c and assigns the result back to c on each iteration, so c grows by 1MB per iteration. Since this processing generates a 57MB BLOB, one might expect the message flow to use around 130MB of storage: roughly 60MB for the variables in the Compute node, plus 57MB in the output BLOB parser, which is serialised on the MQOutput node.
However, this is not the case. This ESQL actually causes significant growth in the DataFlowEngine's storage usage, because the nature of the processing encourages fragmentation in the memory heap. The heap then has enough free space in total, but no contiguous block large enough to satisfy the current request. BLOB and CHAR scalar variables in ESQL need to be held in contiguous buffers in memory.
Therefore, when the ESQL SET c = c || d; is executed, in memory terms this is not just a case of appending the value of d to the current memory location of c. The concatenation operator takes two operands and assigns the result to another variable, which in this case just happens to be one of the inputs. Logically, the concatenation could be written SET c = concatenate(c,d). This is not valid syntax, but it illustrates that this operator behaves like any other binary operator. The value contained in c cannot be deleted until the operation is complete, since c is used as an input. Furthermore, the result of the operation needs to be held in temporary storage before it can be assigned to c.
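The arithmetic behind this can be worked through with a few lines of Python. This is a back-of-the-envelope model that only counts buffer sizes in MB, assuming each iteration holds the old c, d and the temporary result at the same time; it does not model the real allocator or any other storage the flow uses.

```python
c, d = 1, 1                # sizes in MB after SET c = ...; SET d = c;
peak_live = 0              # most storage live at any one moment
total_requested = 0        # sum of all temporary-result allocations
fits_in_old_c = False      # could a result ever reuse the block c just vacated?

for i in range(1, 57):     # WHILE (i <= 56)
    result = c + d         # temporary buffer for c || d, must be contiguous
    peak_live = max(peak_live, c + d + result)
    total_requested += result
    fits_in_old_c = fits_in_old_c or result <= c
    c = result             # the old c can only be freed now

print(c, peak_live, total_requested, fits_in_old_c)
# prints: 57 114 1652 False
```

Every temporary result is 1MB larger than the buffer that c has just vacated, so under this model a freed c buffer can never be re-used for the next result on its own. The heap has to find, or grow to provide, a fresh contiguous block on each iteration, which is what drives the fragmentation described above.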


Out of Memory Issue

When a DataFlowEngine reaches the JVM heap limit, it typically generates a javacore and a heapdump, along with a Java out-of-memory exception in the execution group's stderr/stdout files.
When the DataFlowEngine runs out of total memory, the DataFlowEngine may go down or the system may become unresponsive.


Purpose of MQSI_THREAD_STACK_SIZE

For any given message flow, a typical node requires about 2KB of thread stack space. Therefore, by default, there is a limit of approximately 500 nodes within a single message flow on the UNIX platform and 1000 nodes on the Windows platform. This limit might be higher or lower, depending on the type of processing being performed within each node. If a larger message flow is required, you can increase this limit by setting the MQSI_THREAD_STACK_SIZE environment variable to an appropriate value (the broker must be restarted for the variable to take effect).
The setting applies at the broker level, so MQSI_THREAD_STACK_SIZE is used for every thread that is created within a DataFlowEngine process. If the execution group has many message flows assigned to it and a large MQSI_THREAD_STACK_SIZE is set, the DataFlowEngine process can require a large amount of storage for the stack.
In WMB, it is not just the execution of nodes that can build up usage on a finite stack. The same applies to any processing that involves a large amount of nesting or recursion. Therefore, you may need to increase the MQSI_THREAD_STACK_SIZE environment variable in the following situations:
a) When processing a large message that has a large number of repetitions or nesting.
b) When executing ESQL that recursively calls the same procedure or function. This can also apply to operators. For example, if the concatenation operator is used a large number of times in one ESQL statement, this can lead to a large stack build up.
However, it should be noted that this environment variable applies to all the message flow threads in all the execution groups, as it is set at the broker level. For example, if there are 30 message flows and this environment variable is set to 2MB, then 60MB is reserved just for stack processing and is thus taken away from the DataFlowEngine memory heap. This could have an adverse effect on the execution group rather than yielding any benefit. Typically, the default of 1MB is sufficient for most scenarios. Therefore we would advise that this environment variable NOT be set unless absolutely necessary.
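The figures quoted above follow from simple arithmetic, sketched here in Python. The helper names are hypothetical, the 2KB per node is the approximate figure from the text, and one thread per message flow is assumed (additional instances would add more threads).

```python
KB = 1024
MB = 1024 * KB

STACK_PER_NODE = 2 * KB  # "about 2KB" per node, as stated in the text


def approx_node_limit(thread_stack_size: int) -> int:
    """Rough number of nodes one flow thread can traverse before the stack is full."""
    return thread_stack_size // STACK_PER_NODE


def total_stack_reserved(threads: int, thread_stack_size: int) -> int:
    """Stack storage reserved across all message flow threads."""
    return threads * thread_stack_size


print(approx_node_limit(1 * MB))               # 512, i.e. "approximately 500" nodes
print(total_stack_reserved(30, 2 * MB) // MB)  # 60 (MB) for 30 flows at 2MB each
```

Doubling the stack size roughly doubles the node limit, but it also doubles the storage reserved for every flow thread in every execution group of the broker.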


System kernel parameters for WMB/IIB

In WMB/IIB there are no suggested kernel settings for tuning the operating system. However, WebSphere MQ and some database products do have such recommendations, and WMB/IIB runs in the same environment as these. Hence, it's best to check and tune your environment as guided by those products.

Monitor memory usage on Windows and UNIX

At any given point, you can check the memory usage of the DataFlowEngine processes in the following ways:
Windows:
Ctrl-Alt-Delete > Start Task Manager > Processes > Show processes for all users, then find the "DataFlowEngine" process and look at the "Memory (Private Working Set)" column.
If you want to continuously monitor the memory usage, the process utilities in Windows Sysinternals can be used.
UNIX:
List the processes and filter the output for the DataFlowEngine process:
ps -aelf | grep
If you want to continuously monitor the memory usage, the above command may have to be incorporated into a simple shell script.
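As an alternative to a shell script, the same polling idea can be sketched in Python. This assumes a `ps` that supports `-o rss= -p <pid>` to print the resident set size in KB (common on Linux, but the option names vary between UNIX flavours); the function names are hypothetical and you would pass in the process ID of the DataFlowEngine yourself.

```python
import subprocess
import time


def parse_rss_kb(ps_output: str) -> int:
    """Parse the output of `ps -o rss= -p <pid>` (resident set size in KB)."""
    return int(ps_output.strip())


def sample_rss_kb(pid: int) -> int:
    """Take one reading for the given process ID (assumes ps supports -o rss=)."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rss_kb(out)


def monitor(pid: int, interval_s: float, samples: int) -> list:
    """Collect `samples` readings, `interval_s` seconds apart."""
    readings = []
    for _ in range(samples):
        readings.append(sample_rss_kb(pid))
        time.sleep(interval_s)
    return readings
```

For example, `monitor(pid, 60, 120)` would take one reading a minute for two hours. Given the behaviour described earlier, the readings for a healthy DataFlowEngine would be expected to rise and then plateau rather than fall.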





The size of the XPath cache is fixed at 100 elements by default. If you use many XPath expressions,
this fixed size can become a performance bottleneck, because a single flow invocation can completely
invalidate the cache.
WMB8 and IIB9 allow you to configure the size of the XPath cache, so you can control how many compiled
XPath expressions are stored at any one time. The cache can also be disabled, so that
no compiled XPath expressions are cached. Disabling the cache may improve throughput in a highly
multi-threaded environment, as it removes thread contention on the cache.
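
To see why a cache that is slightly too small can be as bad as no cache at all, consider the following sketch. It assumes a least-recently-used eviction policy, which the product documentation quoted here does not confirm, and uses a hypothetical `count_hits` function where each flow invocation evaluates the same expressions in the same order.

```python
from collections import OrderedDict


def count_hits(cache_size: int, distinct_exprs: int, invocations: int) -> int:
    """Cache hits when each invocation evaluates the same expressions in order."""
    cache = OrderedDict()
    hits = 0
    for _ in range(invocations):
        for expr in range(distinct_exprs):
            if cache_size and expr in cache:
                hits += 1
                cache.move_to_end(expr)        # mark as most recently used
            elif cache_size:
                cache[expr] = True             # "compile" and store the expression
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used
    return hits


print(count_hits(100, 101, 10))  # 0
print(count_hits(200, 101, 10))  # 909
```

With 101 expressions and 100 cache entries, each expression is evicted just before it is needed again, so every lookup is a miss. Once the cache is larger than the working set, every lookup after the first invocation is a hit.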

The new property is called compiledXPathCacheSizeEntries and is configured at the execution group
level. The property can be set using the following mqsichangeproperties command:
mqsichangeproperties <brokerName> -e <egName> -o ExecutionGroup -n compiledXPathCacheSizeEntries -v <size>
where <size> is the size of the cache to be set. The size can be set to any value greater than or equal to 100,
and a value of 0 disables the cache. The default value is 100.

The configured value can be reported using the following mqsireportproperties command:
mqsireportproperties <brokerName> -e <egName> -o ExecutionGroup -n compiledXPathCacheSizeEntries
It can also be reported along with the other ExecutionGroup level properties:

  mqsireportproperties <brokerName> -e <egName> -o ExecutionGroup -a