Introduction

The event manager is the central component in IBM Business Process Manager that is responsible for scheduling a number of different tasks. If it's not working correctly, you might run into severe problems that need to be resolved quickly.

This blog entry describes some of the most common symptoms and shows how to resolve them.

In Part I, some common event manager problems will be shown. Part II explains how to analyze and fix these problems. Part III lists the available APARs that are related to the event manager.

Special thanks to Mark Filley and Bill Wentworth for their technical input and in-depth review!

Part I - Common event manager symptoms

Symptom A - Event manager is not processing any work

The event manager is responsible for scheduling various jobs, such as:

Executing undercover agents (UCAs)
Executing system lane tasks
Triggering business process definition (BPD) timers
Scheduling BPD notifications, which are essential to move the process flow forward through the business process diagram (BPMN)

If your process instances are stuck, timers are not fired, and UCAs are no longer executed, then the event manager is either not running or blocked.

The Process Admin Console gives you a comprehensive view that shows the status of the event manager. When the status light is red, the event manager is paused or did not start and its jobs are accumulating with a Scheduled Time much behind the current time. In the Process Admin Console, these jobs would show 'Job Status' as 'Scheduled' and jobs would not be in the 'Executing' state.

For example:

The following screen shot, which was taken from the Process Admin Console, shows the last event manager heartbeat expiration time stamp of "12/10/2014 2:00:15 PM." This time stamp is normally ahead of the current time. The Scheduled Time of the event manager jobs (UCAs and BPD notifications) shows an earlier time stamp and no job is currently executing. In this example, the event manager is shown as inactive (red light), which explains the situation.

Note: Even if the event manager is not running, it is possible to start new process instances, but they will not move forward! Because services are not scheduled by the event manager, they can still be executed.

Symptom B - Event manager shows jobs with a scheduled date of 2099

The Process Admin Console can show event manager jobs scheduled for 2099 as shown here:

Symptom C - Event manager is active, but long running system lane tasks block the event manager throughput

There can be situations where the event manager is actively working, but you experience throughput problems. For example, the flow in the process instances is not moving forward or the execution of timers is delayed.

The following screen shot shows five system lane activities being executed, but a couple of BPD notifications are waiting to be executed. These BPD notifications are overdue because their 'Scheduled Time' is earlier than the current time. This situation can indicate that the event manager configuration needs to be tuned, that the execution time for system lane tasks needs to be optimized (if possible), or both.

To analyze and resolve this problem, see resolution section "C,D,E,F - Event manager is active, but throughput problems exist".

Symptom D - UCAs are not processing at the desired rate

According to the definition in the process application, UCAs are bound to a couple of synchronous queues or a single asynchronous queue managed by the event manager. The capacity for these queues is defined by the following parameters in the 80EventManager.xml configuration file:
<sync-queue-capacity> or
<async-queue-capacity>

These numbers limit the number of UCAs that can be executed at the same time.

To analyze and resolve that problem, see resolution section "C,D,E,F - Event manager is active, but throughput problems exist".

Symptom E - Many BPD timers wake up at the same time

When the event manager processes a timer, it loads the applicable task into the "BPD async queue," whose capacity is defined by the <bpd-queue-capacity> setting from the 80EventManager.xml configuration file. If the application design has hundreds or thousands of timers that start at the exact same time, then this setting might need to be increased beyond the default of forty (40).

Keep in mind that this queue is shared between timer executions, BPD notifications and the execution of system lane tasks.

To analyze and resolve that problem, see resolution section "C,D,E,F - Event manager is active, but throughput problems exist".

Symptom F - Event Manager warning messages CWLLG2156W, CWLLG2236W occur

If the BPM run time detects that the database connection pool is too small, it will dynamically reduce the queue sizes and you will see entries in SystemOut.log like the following messages:
"CWLLG2156W: The database connection pool size xxx of the Process Server data source might be too small." and/or
"CWLLG2236W: The configured <%%%%%%-queue-capacity> parameter of xxx has been changed to yyy."

These messages indicate that there is a mismatch between the event manager queue capacity and the JDBC data source pool size.

To analyze and resolve that problem, see resolution section "C,D,E,F - Event manager is active, but throughput problems exist".

Symptom G - Event manager tasks fail when LombardiEventEmitterInputQueue reached max threshold

When you have your IBM Business Process Manager environment configured to forward monitoring events to a Business Monitor server, the execution of event manager tasks involves sending a message to the local queue called "LombardiEventEmitterInputQueue." This queue maps to the JNDI name jms/com.ibm.lombardi/EventEmissionQueue.

If the queue depth of the LombardiEventEmitterInputQueue reaches the configured maximum threshold, no more messages can be put to this queue and the execution of event manager tasks ends in an exception like the following text:

J2CA0027E: An exception occurred while invoking prepare on an XA Resource Adapter from DataSource jms/com.ibm.lombardi/EventEmissionQueueFactory, within transaction ID {XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54),
data(0000014ac680b3dd000000010c3c5a4c30653f6b06f16c1e5782cea7f4fce4b60a8f48d30000014ac680b3dd000000010c3c5a4c30653f6b06f16c1e5782cea7f4fce4b60a8f48d3000000010000000000000000000000000002)} : javax.transaction.xa.XAException: CWSIC8007E: An exception was caught from the remote server with Probe Id 3-013-0010. Exception: CWSIC2029E: This transaction cannot commit as an operation that was performed within the transaction boundary failed. The first operation that failed generated the following exception: com.ibm.ws.sib.processor.exceptions.SIMPLimitExceededException: CWSIK0025E: The destination LombardiEventEmitterInputQueue on messaging engine ProcessServerProdDepEnv.SupCluster.000-MONITOR.Cell01.Bus is not available because the high limit for the number of messages for this destination has already been reached...
   at com.ibm.ws.sib.comms.common.CommsByteBuffer.parseSingleException(CommsByteBuffer.java:1753)
   at com.ibm.ws.sib.comms.common.CommsByteBuffer.getException(CommsByteBuffer.java)
   at com.ibm.ws.sib.comms.common.CommsByteBuffer.checkXACommandCompletionStatus(CommsByteBuffer.java:1218)
   at com.ibm.ws.sib.comms.client.OptimizedSIXAResourceProxy.prepare(OptimizedSIXAResourceProxy.java:749)
   at com.ibm.ws.sib.comms.client.SuspendableXAResource.prepare(SuspendableXAResource.java:386)
   at com.ibm.ws.sib.api.jmsra.impl.JmsJcaRecoverableSiXaResource.prepare(JmsJcaRecoverableSiXaResource.java:260)
   at com.ibm.ejs.j2c.XATransactionWrapper.prepare(XATransactionWrapper.java:1152)
   at com.ibm.tx.jta.impl.JTAXAResourceImpl.prepare(JTAXAResourceImpl.java:234)
   at com.ibm.tx.jta.impl.RegisteredResources.prepareResource(RegisteredResources.java:1211)
   at com.ibm.tx.jta.impl.RegisteredResources.distributePrepare(RegisteredResources.java:1472)
   at com.ibm.tx.jta.impl.TransactionImpl.prepareResources(TransactionImpl.java:1488)
   at com.ibm.ws.tx.jta.TransactionImpl.stage1CommitProcessing(TransactionImpl.java:602)
   at com.ibm.tx.jta.impl.TransactionImpl.processCommit(TransactionImpl.java:1028)
   at com.ibm.tx.jta.impl.TransactionImpl.commit(TransactionImpl.java:962)
   at com.ibm.ws.tx.jta.TranManagerImpl.commit(TranManagerImpl.java:439)
   at com.ibm.tx.jta.impl.TranManagerSet.commit(TranManagerSet.java:191)
   at com.ibm.ws.uow.UOWManagerImpl.uowCommit(UOWManagerImpl.java:807)
   at com.ibm.ws.uow.embeddable.EmbeddableUOWManagerImpl.uowEnd(EmbeddableUOWManagerImpl.java:881)
   at com.ibm.ws.uow.UOWManagerImpl.uowEnd(UOWManagerImpl.java:782)
   at com.ibm.ws.uow.embeddable.EmbeddableUOWManagerImpl.runUnderNewUOW(EmbeddableUOWManagerImpl.java:818)
   at com.ibm.ws.uow.embeddable.EmbeddableUOWManagerImpl.runUnderUOW(EmbeddableUOWManagerImpl.java:370)
   at org.springframework.transaction.jta.WebSphereUowTransactionManager.execute(WebSphereUowTransactionManager.java:252)
   at com.lombardisoftware.utility.spring.ProgrammaticTransactionSupport.executeInNewTransaction(ProgrammaticTransactionSupport.java:431)
   at com.lombardisoftware.utility.spring.ProgrammaticTransactionSupport.execute(ProgrammaticTransactionSupport.java:294)
   at com.lombardisoftware.server.core.TXCommand.executeInDeadlockRetryLoop(TXCommand.java:83)
   at com.lombardisoftware.server.core.TXCommand.execute(TXCommand.java:72)
   at com.lombardisoftware.bpd.runtime.engine.quartz.AbstractBpdTask.execute(AbstractBpdTask.java:119)
   at com.lombardisoftware.bpd.runtime.engine.quartz.AbstractBpdTask.execute(AbstractBpdTask.java:71)
   at com.lombardisoftware.server.scheduler.Engine.execute(Engine.java:847)
...
Caused by: com.ibm.wsspi.sib.core.exception.SILimitExceededException: CWSIK0025E: The destination LombardiEventEmitterInputQueue on messaging engine AdvProcessServerProdDepEnv.SupCluster.000-MONITOR.rsdcpprobpmdm01Cell01.Bus is not available because the high limit for the number of messages for this destination has already been reached.
   at com.ibm.ws.sib.comms.common.CommsByteBuffer.parseSingleException(CommsByteBuffer.java:1842)
   ... 49 more

Part II - Resolving common event manager symptoms

A - Event manager is not processing any work

If the event manager does not process any work, as described previously in the symptoms section, the event manager is either not active, or it is active but blocked for some reason.

The following sections, A.1 and A.2, describe how to analyze and resolve these situations.

To monitor and diagnose the status of the event manager, the following pieces of information are helpful:

1. DB table LSW_EM_INSTANCE
2. DB table LSW_EM_TASK
3. DB table LSW_BLACKOUT_CALENDAR
4. Process Admin Console section Event Manager -> Monitor as GUI to 1. and 2.
5. Process Admin Console section Event Manager -> Blackout Periods as GUI to 3.
6. BPM server log file SystemOut.log and FFDCs
7. The event manager configuration file 80EventManager.xml and the global BPM server configuration file TeamWorksConfiguration.running.xml, which contains all of the parameters at run time. The following technote shows where to find these files and how they relate: http://www.ibm.com/support/docview.wss?uid=swg21439614

A.1 - Event manager not active

The easiest way to check the status of the event manager is by using the Process Admin Console, which shows the event manager status and lists all event manager jobs. It also lists the expiration time stamp, which is regularly updated every 15s by an internal heartbeat thread and set to the current time + 60s. Keep in mind that this time stamp is created based on the current time used on the database system that is hosting the BPM tables.

The Process Admin Console panel named 'Event Manager -> Monitor' is a graphical user interface for the contents of the database tables:

BPM DB table LSW_EM_INSTANCE containing the event manager status and heartbeat timestamp
BPM DB table LSW_EM_TASK containing all event manager jobs

The following screen shot shows the event manager status in the Process Admin Console:

Status:

Green - running
Red - stopped, not running

Each cluster member will have an event manager. If you have a multi-cluster topology, it is only present in the AppTarget servers. In this example, it is a single node cluster and, therefore, only one event manager instance is listed. In a clustered environment, there is an entry for every cluster member as a new row in the table. To process work, the event manager needs to be in the running state (green) and the time stamp listed in the Connect expiration field needs to be ahead of the current time.
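The heartbeat rule described above can be expressed as a small check. The following Python sketch is only an illustration: the function names are made up for this example, and both time stamps are assumed to have been read from the database system that hosts the BPM tables (the connect expiration from LSW_EM_INSTANCE and the current database time).

```python
from datetime import datetime, timedelta

# Value taken from the text: the expiration is set to "current time + 60s".
HEARTBEAT_EXPIRATION = timedelta(seconds=60)


def heartbeat_is_current(connect_expiration: datetime, db_now: datetime) -> bool:
    """Return True if the connect expiration is still ahead of the database time."""
    return connect_expiration > db_now


def seconds_since_last_heartbeat(connect_expiration: datetime, db_now: datetime) -> float:
    """Estimate how long ago the heartbeat thread last renewed the expiration."""
    last_heartbeat = connect_expiration - HEARTBEAT_EXPIRATION
    return (db_now - last_heartbeat).total_seconds()
```

With a heartbeat every 15s, the second function should normally return a value of roughly 15 or less. A value above 60 means that the expiration time stamp is already behind the database time.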

A successful event manager start up is reflected in the server's SystemOut.log file with messages that show the start of the heartbeat thread and acquisition of the synchronous queues. You can grep the log for messages of the wle_scheduler group as shown here:

wle_scheduler I   CWLLG0561I: Heartbeat thread starting...
wle_scheduler I   CWLLG0615I: Heartbeat resumed.
wle_scheduler I   CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_1.
wle_scheduler I   CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_1.
wle_scheduler I   CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_2.
wle_scheduler I   CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_2.
wle_scheduler I   CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_3.
wle_scheduler I   CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_3.

In the LSW_EM_INSTANCE table, the STATUS column shows a value of '1' for an active event manager and '2' if it has been paused.

With the Process Admin Console, the event manager can be paused and resumed by clicking the applicable button. Make sure to click Refresh to update the panel so that the current status is read from the LSW_EM_INSTANCE table.

Reasons why the event manager might be inactive:

a) The event manager configuration is corrupted by an incorrect attribute in the custom property file (for example: 100Custom.xml)
If you tried to overwrite event manager parameters by using a 100Custom.xml file and accidentally used the 'replace' attribute for merge instead of 'mergeChildren' as shown below, the required parameters from 80EventManager.xml are not honored:

Incorrect definition sample:

<... merge="replace">
  <start-paused merge="replace">true</start-paused>

Correct definition sample:

<... merge="mergeChildren">
  <start-paused merge="replace">true</start-paused>

With the incorrect definition sample, the Process Admin Console shows a NullPointerException when you click Event Manager > Monitor as shown in the following screen shot:

To solve that problem, correct the 100Custom.xml file and restart the server. After the restart, make sure that the TeamWorksConfiguration.running.xml file contains the complete event manager section and that the event manager is shown as active in the Process Admin Console.

Note: There is a list of sample configuration files to adapt the IBM Business Process Manager configuration, including some samples for the event manager. You can access these files here.

b) The event manager might have been paused manually by using the Process Admin Console. In this case, you can resume its activity as mentioned previously. Even when it is paused, the connect expiration time stamp is renewed every 15s (default).

c) The event manager is configured to be started as "paused"
The 80EventManager.xml BPM server configuration file contains a parameter called <start-paused>, which is set to false by default. If it is set to true by overwriting the parameter with a 100Custom.xml file, then the heartbeat thread that sets the event manager 'connect expiration' is active, but the event manager will not process any work.

To check for that situation, look at your TeamWorksConfiguration.running.xml BPM server configuration file. Search in that file for the <start-paused> string in the event manager section. If that parameter is set to true, this setting explains why the event manager did not become active after server start up.

In case the event manager is configured to be started as "paused," the SystemOut.log file will only contain the following messages during start up. They show that the heartbeat thread started and continuously updates the connect expiration time stamp, but the event manager did not acquire the synchronous queues.

wle_scheduler I   CWLLG0570I: Heartbeat paused.
wle_scheduler I   CWLLG0561I: Heartbeat thread starting...
wle_scheduler I   CWLLG0615I: Heartbeat resumed.

To resume the event manager, use the Process Admin Console as shown previously. A successful resume action will result in the following messages in the SystemOut.log file:

wle_scheduler I   CWLLG0615I: Heartbeat resumed.
wle_scheduler I   CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_1.
wle_scheduler I   CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_1.
wle_scheduler I   CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_2.
wle_scheduler I   CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_2.
wle_scheduler I   CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_3.
wle_scheduler I   CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_3.

Keep in mind that the <start-paused> parameter in the event manager configuration needs to be set back to false. Otherwise, the event manager will still be inactive after the next server restart.
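The check for the start-paused setting can also be scripted. The following Python sketch is a minimal example under stated assumptions: it expects the content of TeamWorksConfiguration.running.xml as a string, it assumes the file uses no XML namespaces, and the function names are invented for this example. It searches for the element at any nesting depth, so it makes no assumption about the enclosing sections.

```python
import xml.etree.ElementTree as ET


def read_start_paused(config_xml: str):
    """Return the normalized values of all <start-paused> elements in the document."""
    root = ET.fromstring(config_xml)
    return [(elem.text or "").strip().lower() for elem in root.iter("start-paused")]


def starts_paused(config_xml: str) -> bool:
    """Return True if any <start-paused> element is set to true."""
    return "true" in read_start_paused(config_xml)


if __name__ == "__main__":
    # Hypothetical, heavily shortened stand-in for the real configuration file.
    sample = "<properties><section><start-paused>true</start-paused></section></properties>"
    print(starts_paused(sample))
```

To check a real system, read the file content from the location described in the technote mentioned previously and pass it to starts_paused.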

d) Event manager is not enabled in the configuration.
The 80EventManager.xml BPM server configuration file contains a parameter called enabled, which is set to true, by default.

To check for that situation, look at your TeamWorksConfiguration.running.xml BPM server configuration file. Search in that file for the enabled parameter in the event manager section. If that parameter is set to false, then the event manager will not be active after the server start up. In contrast to being started as "paused," it will show none of the previous messages in the SystemOut.log file and the connect expiration time stamp will not be updated!

You cannot resume the event manager through the Process Admin Console in such a case. Instead, you need to change your configuration to set the enabled parameter back to true and restart your server.
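The three start up situations described so far (active, started as paused, not enabled) leave different traces in the SystemOut.log file. The following Python sketch classifies a list of log lines accordingly. It is an illustration only: the function name and the return values are invented for this example, and it assumes that you pass only the lines written since the last server start.

```python
ACQUIRED_QUEUE = "CWLLG0581I"  # "Acquired synchronous queue ..."
HEARTBEAT_IDS = ("CWLLG0561I", "CWLLG0615I", "CWLLG0570I")  # heartbeat start, resume, pause


def classify_startup(log_lines):
    """Classify the event manager start up from wle_scheduler messages.

    Returns 'active' if a synchronous queue was acquired, 'paused' if only
    heartbeat messages are present, and 'no heartbeat' if neither appears.
    """
    scheduler_lines = [line for line in log_lines if "wle_scheduler" in line]
    if any(ACQUIRED_QUEUE in line for line in scheduler_lines):
        return "active"
    if any(msg in line for line in scheduler_lines for msg in HEARTBEAT_IDS):
        return "paused"
    return "no heartbeat"
```

A result of 'no heartbeat' matches the "not enabled" case, but it can also mean that the start up failed with an exception, so the log still needs to be reviewed.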

e) Blackout period is active
Administrators establish blackout periods to specify times when events cannot be scheduled. For example, you might schedule a blackout period due to a holiday or for regular system maintenance windows. The event manager takes blackout periods into account when scheduling and queuing events, event subscriptions, and undercover agents (UCAs). The following screenshot shows if and which blackout periods are configured. This data is persisted in the LSW_BLACKOUT_CALENDAR DB table.

If a blackout period is active, the event manager monitor in the Process Admin Console lists a scheduled job named End blackout period where the scheduled time column shows when the blackout period ends. Event manager jobs created during the blackout period show a job status of Blacked out.

The following screenshot shows that scenario:

The SystemOut.log file does not show any applicable messages when the blackout period is entered.

f) Exceptions during the event manager start up
If the event manager is not running after start up, but was not configured as paused or disabled, the SystemOut.log file might show a couple of exceptions.

There could be various reasons why the event manager failed during startup or resume. Gather the documents as mentioned in the event manager mustgather technote.

The following section shows a few examples:

Event manager configuration is broken

This problem is caused by an incomplete fix pack installation where required post-installation steps to upgrade the profile were not executed.

The SystemOut.log file will have exceptions that have the following signatures.

CWLLG0144E: Exception in init(): schedule cannot be started. com.lombardisoftware.core.TeamWorksException: Message: SCHEDULER_CONFIG_BROKEN Arguments: loader-acquire-sync-queue-query: com.lombardisoftware.core.config.eventmanager.SchedulerConfig checkAndReplace Message: SCHEDULER_CONFIG_REPLACEMENT_PARAMETER_NOT_FOUND Arguments: %executing% loader-acquire-tasks-query UPDATE LSW_EM_TASK SET TASK_STATUS = %acquired%, TASK_OWNER = ? WHERE TASK_ID IN (%task-ids%)

To fix this problem, review the documented post-installation (interim fix/fix pack) steps and rerun the missing steps.

Event manager start up problem due to a problem in the BPM embedded document store (applies to IBM Business Process Manager V8.5 and later)

Important note: If the embedded BPM document store cannot be started due to configuration or authorization problems, the event manager will also not start!

The SystemOut.log file will not show any of the event manager-related start up messages as shown previously, but you will see, for example, the following exception, which is related to the embedded document store:

CWTDS1100E: An error occurred while validating or creating the default configuration for the IBM BPM document store.
                                 com.ibm.bpm.embeddedecm.exception.UserMissesWritePermissionException: CWTDS0022E: The configuration was changed in a way that the technical user 'deadmin' of the IBM BPM document store fails to change the object 'Domain'.
Explanation: The technical user defined in the BPM role type 'EmbeddedECMTechnicalUser' is not permitted to perform changes on an object.
Action: Revert the recent configuration changes. Ensure that the user defined by the BPM role type 'EmbeddedECMTechnicalUser' has access to the object. Verify this using the admin task 'getDocumentStoreStatus'.
    at com.ibm.bpm.embeddedecm.internal.DomainConfiguration$2.run(DomainConfiguration.java:264)
    at com.ibm.bpm.embeddedecm.internal.DomainConfiguration$2.run(DomainConfiguration.java:207)
    at java.security.AccessController.doPrivileged(AccessController.java:362)

To fix that problem, the configuration error with the document store must be resolved as shown here: http://www.ibm.com/support/docview.wss?uid=swg21673250

A.2 - Event manager is active but it is not processing any jobs

When the event manager is active (the Process Admin Console shows it as active and the connect expiration is not outdated) but is not processing any tasks, this could be caused by:

A problem in the event manager configuration: Check the event manager configuration file 80EventManager.xml and the global BPM server configuration file TeamWorksConfiguration.running.xml, which contains all of the parameters at run time. The following technote shows where to find these files and how they relate: http://www.ibm.com/support/docview.wss?uid=swg21439614
Orphaned transactions in Microsoft SQL Server holding locks on the event manager tables: If you use Microsoft SQL Server as the process server database, so-called 'orphaned transactions' in the DB system can block the event manager. The following technote shows how to resolve such a problem: http://www.ibm.com/support/docview.wss?uid=swg21633692
System time or time zone of the BPM system and the remote DB system that hosts the BPM DB is out of sync: To fix that, make sure that the system time on the BPM node and the DB node are in sync. It is a best practice to have both use the same network time protocol (NTP) server.

B - Event manager shows jobs with a scheduled date of 2099

If the execution of an event manager job fails, it is retried a couple of times as defined by the re-execute-limit configuration parameter (default = 5) in the 80EventManager.xml file. The behavior in such a case has gone through a fundamental change with APAR JR47860:

Pre JR47860 behavior: When the re-execute-limit is reached, the corresponding event manager job is discarded! There is no way to re-execute this job.
Post JR47860 behavior: When the re-execute-limit is reached, the event manager job is rescheduled for 2099.

The interim fix for the APAR also provides a new administrative command called BPMReplayOnHoldEMTasks, which was introduced to resubmit such failed jobs. Check the APAR description for more details or see the product documentation in the IBM Knowledge Center.

Important note: Before resubmitting an event manager job, it is important to eliminate the root cause! Otherwise, you might run into the same problem again. To find the root cause, check your SystemOut.log file for message CWLLG0197W. This message indicates that the event manager tried to execute a task five times, but failed each time. Note the thread ID and walk back in the thread history within the SystemOut.log file, which will most probably tell you the exception with which the execution of this event manager task failed.

Example for an event manager task to execute a UCA:
1. Searching the SystemOut.log file for CWLLG0197W shows the following line. Note the thread ID 00011779.
[2/4/14 5:54:18:395 GMT] 00011779 wle_ucaexcept E   CWLLG0197W: Task Notify BPD 202738 of notification failed 5 times. The task will not be re-executed.

The previous messages for thread 00011779 will show this error message:
[2/4/14 5:54:18:337 GMT] 00011779 wle_ucaexcept E   CWLLG0181E: An exception occurred during execution of task 4,425,203. Error: PreparedStatementCallback; SQL [update LSW_BPD_INSTANCE_DATA set DATA = ? where BPD_INSTANCE_ID = ?]; Error for batch element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-968, DRIVER=3.61.65; nested exception is com.ibm.db2.jcc.am.SqlTransactionRollbackException: Error for batch element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-968, DRIVER=3.61.65
com.lombardisoftware.core.TeamWorksException: PreparedStatementCallback; SQL [update LSW_BPD_INSTANCE_DATA set DATA = ? where BPD_INSTANCE_ID = ?]; Error for batch         element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-968, DRIVER=3.61.65; nested exception is com.ibm.db2.jcc.am.SqlTransactionRollbackException: Error for batch element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-968, DRIVER=3.61.65
   at com.lombardisoftware.core.TeamWorksException.asTeamWorksException(TeamWorksException.java:130 ...

In this special case, the execution of the event manager task failed due to an SQL exception with sqlcode -968, which means that the database filesystem is out of space.

2. Fix the problem that caused the exception. In the previous example, resolve the out-of-space condition in the database filesystem.

3. Resubmit the applicable event manager task by using the BPMReplayOnHoldEMTasks command.
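The first step, finding CWLLG0197W and walking back through the history of the same thread, can be sketched in Python as follows. This is an illustration under assumptions: the regular expression is derived only from the log lines shown in this example (time stamp in brackets, followed by an eight-character thread ID), and the function name and the max_lines parameter are invented. Continuation lines without a time stamp, such as stack trace lines, are skipped.

```python
import re

# Matches lines such as "[2/4/14 5:54:18:395 GMT] 00011779 wle_ucaexcept E   CWLLG0197W: ..."
LINE = re.compile(r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<thread>[0-9a-fA-F]{8})\s+(?P<rest>.*)$")


def history_before_give_up(log_lines, marker="CWLLG0197W", max_lines=20):
    """For each line that contains the marker, collect earlier lines of the same thread."""
    parsed = []
    for line in log_lines:
        match = LINE.match(line)
        if match:
            parsed.append((match.group("thread"), line.rstrip("\n")))

    results = []
    for index, (thread, line) in enumerate(parsed):
        if marker in line:
            earlier = [text for tid, text in parsed[:index] if tid == thread]
            results.append((line, earlier[-max_lines:]))
    return results
```

Each result pairs the CWLLG0197W line with the most recent earlier messages of that thread. In the example above, that history would contain the CWLLG0181E message with the SQL error.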

C,D,E,F - Event manager is active, but throughput problems exist

Throughput problems might be caused by a wide range of reasons. In terms of the event manager, the potential throughput is limited by the capacity of its queues.

For a comprehensive summary of all event manager-related configuration parameters including the different queues, check this product documentation in the IBM Knowledge Center.

To analyze and fix this problem, you need to understand the involved configuration parameters and how to monitor and adapt them.

a) Find out the event manager queue capacities
The event manager maintains a number of internal queues. The capacity of each queue is limited by a configuration parameter that is specified in the 80EventManager.xml configuration file and limits the number of jobs that can be in the execution state simultaneously. The following table shows the different queues, the applicable configuration parameter, and the default capacity (as of IBM Business Process Manager 8.5.5):

Event Manager Queue	Configuration Parameter in 80EventManager.xml	Default capacity
Async Queue (UCA)	async-queue-capacity	10
Sync Queue (UCA)	sync-queue-capacity	10
BPD Async Queue (BPD notifications, system lane tasks, timer execution)	bpd-queue-capacity	40
System Queue	system-queue-capacity	10

The default values could have been overwritten by using a 100Custom.xml file. To find out which values are currently being used, look at the TeamWorksConfiguration.running.xml file.

b) Determine the event manager queue usage and adapt the event manager queue sizes
To monitor the number of executing jobs on each event manager queue, use the Process Admin Console event manager monitor and count the number of rows for each 'Job Queue' with job status 'Executing'. Alternatively you could use this SQL statement:

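-- Count the executing tasks per event manager queue.
-- WITH UR (uncommitted read) is DB2 syntax; adapt it for other databases.
SELECT COUNT(*) AS EXECUTION_COUNT,
       CASE QUEUE_ID
            WHEN '-100' THEN 'UCA Async Queue'
            WHEN '-101' THEN 'BPD Async Queue'
            WHEN '-102' THEN 'EM System Queue'
            ELSE 'UCA Sync Queue'
       END AS QUEUE
FROM LSW_EM_TASK
WHERE TASK_STATUS = 3
GROUP BY QUEUE_ID
WITH UR;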

If the number of executing event manager tasks for a queue has reached the capacity limit and there are more tasks on that queue waiting to be executed (time to be scheduled has already passed), then there might be a performance problem or the queue capacity is too low for the workload and needs to be increased.
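The rule in the previous paragraph can be written down as a simple condition. The following Python sketch is illustrative only: the QueueSnapshot type and its field names are invented for this example, and the numbers would have to come from the Process Admin Console or from queries against LSW_EM_TASK.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    name: str
    capacity: int   # configured capacity, for example bpd-queue-capacity
    executing: int  # tasks currently in the Executing state
    overdue: int    # scheduled tasks whose scheduled time has already passed


def is_saturated(queue: QueueSnapshot) -> bool:
    """True if the capacity is fully used while overdue tasks are still waiting."""
    return queue.executing >= queue.capacity and queue.overdue > 0


if __name__ == "__main__":
    # Numbers modeled on Symptom C: capacity 5, five executing system lane tasks.
    print(is_saturated(QueueSnapshot("BPD Async Queue", capacity=5, executing=5, overdue=2)))
```

A saturated queue does not tell you whether the capacity is too low or the tasks are too slow, so both possibilities still need to be checked.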

The BPD async queue is of special interest because its capacity is shared between the execution of system lane tasks, timer executions, and BPD notifications. If the complete capacity is already occupied by currently executing, long-running system lane tasks, no other job can be executed on that queue. The screen shot shown previously for Symptom C is an example from a system with bpd-queue-capacity set to 5 and the complete capacity is occupied by five executing system tasks. To eliminate a problem related to long running system tasks:

Find out why the system lane tasks have such a long execution time and try to fix that. There might be various reasons like back-end response time, excessive JVM garbage collection, CPU and memory constraints, network delays, and so on.
If the system lane tasks are expected to be long-running, think about splitting them into smaller pieces or increase the capacity of the BPD async queue as shown in the next paragraph.

c) Increase the event manager queue capacities

To increase the event manager queue sizes, specify the applicable parameter as shown in the previous table in a 100Custom.xml file. For example:

<... merge="mergeChildren">
  <... merge="mergeChildren">
    <bpd-queue-capacity merge="replace">40</bpd-queue-capacity>
    <sync-queue-capacity merge="replace">10</sync-queue-capacity>
    <async-queue-capacity merge="replace">10</async-queue-capacity>
    <system-queue-capacity merge="replace">10</system-queue-capacity>

Important: When increasing the capacity of event manager queues, keep in mind that, besides additional threads in your JVM, additional JDBC connections will also be needed. Thus, the connection pool of the JDBC data source (jdbc/TeamworksDB) also needs to be increased.
As a general rule, increase the number of database connections by two times the value by which you increased the queue capacity. Apart from database connections, more JVM heap is also needed.

If there is a mismatch between the queue capacities and the number of available connections for the data source, IBM Business Process Manager tries to scale down the queue size. That issue will be indicated in the log by the warning messages shown under Symptom F.
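The general rule for the connection pool is simple arithmetic. The following Python sketch shows it with a hypothetical starting pool size of 200 connections. The function name and the example numbers are assumptions, not values from the product.

```python
def suggested_pool_size(current_pool_size: int, old_capacity: int, new_capacity: int) -> int:
    """Add two connections for every unit by which the queue capacity was increased."""
    increase = max(new_capacity - old_capacity, 0)
    return current_pool_size + 2 * increase


if __name__ == "__main__":
    # Example: bpd-queue-capacity raised from 40 to 60, pool size assumed to be 200.
    print(suggested_pool_size(200, old_capacity=40, new_capacity=60))
```

In this example the capacity grows by 20, so the rule suggests 40 more connections. Treat the result as a starting point and watch the log for the warning messages mentioned above.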

d) Understand the event manager queue capacity and the related thread pool size
The event manager configuration also shows a parameter named max-thread-pool-size. By default, the value for the max-thread-pool-size parameter is the sum of the individual queue capacities (70). It is important to understand that its size does not limit the overall number of event manager tasks that can be executed simultaneously. So even if you set max-thread-pool-size to 5 and bpd-queue-capacity to 10, you will be able to execute 10 system lane tasks simultaneously. It is possible because the threadpool is defined as 'growable', which means it temporarily allows the number of threads to exceed the defined limit, but such a thread would be discarded directly after it finished and not be returned to the pool. Therefore, these threads are a bit more expensive.

Starting in IBM Business Process Manager 8.5.5.0, the event manager no longer uses its own internal thread pool. Instead, it uses a WebSphere Application Server work manager thread pool. This function is configured by these two event manager parameters:

<use-was-work-manager>true</use-was-work-manager>
<was-work-manager>wm/BPMEventManagerWorkManager</was-work-manager>

When you use the WebSphere Application Server work manager thread pool, the maximum pool size is configured in the WebSphere Application Server Administrative Console as shown in the following screen shot:

In a default configuration, the work manager thread pool for the event manager is defined with a maximum of 70 threads, but also as 'growable'. When sizing the work manager thread pool for the event manager, make sure that its size is at least equal to the sum of the queue capacities.

If you modified the thread pool properties and cleared the "Growable" check box, then the maximum number of threads implicitly also limits the number of event manager jobs that can be executed simultaneously! See this screen shot.

One of the advantages of using a WebSphere Application Server work manager thread pool for the event manager is that you can use the Tivoli Performance Viewer, which is available from the WebSphere Application Server Administrative Console, to monitor the thread pool activity. See this screen shot:
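
The relationship between queue capacities, the thread pool maximum, and the 'growable' setting can be modeled roughly as follows. This is a simplified Python sketch of the behavior described in this section, not product code. The function and parameter names are made up for illustration.

```python
def effective_concurrency(queue_capacities, pool_max, growable):
    """Return (max simultaneous jobs, temporary threads beyond pool_max).

    queue_capacities: dict mapping a queue parameter name to its capacity.
    pool_max: configured maximum size of the thread pool.
    growable: True if the pool may temporarily exceed pool_max.
    """
    demand = sum(queue_capacities.values())
    if growable:
        # The pool grows on demand; the extra threads are discarded after use.
        return demand, max(0, demand - pool_max)
    # A fixed-size pool caps the number of jobs that can run at once.
    return min(demand, pool_max), 0


# Example from the text: pool size 5, bpd-queue-capacity 10.
queues = {"bpd-queue-capacity": 10}
print(effective_concurrency(queues, 5, growable=True))   # (10, 5)
print(effective_concurrency(queues, 5, growable=False))  # (5, 0)
```

A nonzero second value means the pool is smaller than the sum of the queue capacities, so either temporary threads are created (growable) or jobs have to wait (not growable).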

G - Event manager tasks fail when the LombardiEventEmitterInputQueue reaches the maximum threshold

Check the current queue depth and the threshold. You can use the service integration bus browser, which is integrated into the WebSphere Application Server Administrative Console, to easily check the queue depth and the high message threshold.

Make sure that there is a message consumer active to read from that queue, which would typically be the Business Monitor infrastructure. If it is not started, start the message consumer and the queue depth should decrease.

If the Business Monitor environment is started and consuming messages, but the queue depth is still at the limit, then perform tuning actions for the Business Monitor server or increase the high message threshold for the involved queues.

If the Business Monitor environment no longer exists, but you did not revert the BPM server configuration, messages are still generated and put on the queue, but they are never consumed!

To solve it, you need to:

Pause the event manager so that no new messages are created.
Manually delete the messages on the queue destination as shown below.
Disable the event emission on the IBM Business Process Manager server according to the instructions below.
Restart your IBM Business Process Manager server.

To delete the messages from the queue destination by using the WebSphere Application Server Administrative Console, complete these steps:

In the SIBUS section, navigate to the queue point for the LombardiEventEmitterInputQueue.
Select the runtime tab.
Click messages, which displays the DeleteAll option to delete all of the messages on that queue.

The event emission for Business Monitor has been explicitly enabled by the following entry. To disable it, set the value for parameter 'enabled' to false as shown here:

<monitor-event-emission>
<enabled>false</enabled>
<j2c-authentication-alias>MonitorBusAuth</j2c-authentication-alias>
</monitor-event-emission>
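
The checks described for this symptom can be summarized as a small decision helper. The following Python sketch only encodes the order of the checks from the text. The function name and its inputs are hypothetical, and you have to read the values yourself from the service integration bus browser and from your Business Monitor environment.

```python
def triage_emitter_queue(depth, high_threshold, monitor_in_use, consumer_active):
    """Suggest the next action for the LombardiEventEmitterInputQueue.

    depth / high_threshold: values read manually from the bus browser.
    monitor_in_use: False if the Business Monitor environment no longer exists.
    consumer_active: True if a message consumer is reading from the queue.
    """
    if depth < high_threshold:
        return "no action: queue depth is below the high message threshold"
    if not monitor_in_use:
        return ("pause the event manager, delete the messages, "
                "disable event emission, restart the server")
    if not consumer_active:
        return "start the message consumer"
    return "tune Business Monitor or increase the high message threshold"


# Hypothetical reading: queue is full and Business Monitor was decommissioned.
print(triage_emitter_queue(50000, 50000, monitor_in_use=False, consumer_active=False))
```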

Part III - Known APARs related to the event manager

Issue, error, or problem	Addressed in APAR	Fix included in
Duplicate execution of event manager task under high load when using Oracle DB	JR49359	8.0.1.3, 8.5.5
Change handling of failed event manager tasks and introduction of admin command to replay these failed tasks	JR47860	8.0.1.3, 8.5.5
Posting message to event manager only starts TIP snapshot	JR45615 and JR45616	7.5.1.2, 8.0.1.1
Blackout calendar not respected for timer events	JR45899	8.0.1.2
Task processing threadpool initialized with wrong user, error message CWLLG0326E or CWLLG0179E	JR46484	8.0.1.2, 7.5.1.2, 8.5.5
IllegalStateException when starting/stopping teamworks.ear	JR47360	8.0.1.2, 8.5.0.1, 8.5.5
Delayed communication between BPD and Service engine	JR47915	8.0.1.2, 8.5.0.1, 8.5.5
DB2 error "bad SQL grammar" with DB2 9.5 after upgrading to 8.0.1.2 or installation of JR46470	JR48878	8.0.1.3
com.lombardisoftware.core.TeamWorksException: Numeric Overflow on event manager task	JR49172	8.5.5, 8.0.1.3
Double execution of event manager tasks in heavily loaded environments with Oracle DB	JR46470	8.0.1.2
UCA message corrupted when larger than 1000 Bytes and using multibyte characters	JR47265	8.0.1.2, 8.5.0.1, 8.5.5
UCA input/output parameters corrupted when containing unicode characters	JR46993	8.0.1.2, 8.5.0.1, 8.5.5
Cleanup of duplicate UCAs entries created before JR41966 had been applied	JR47574	8.0.1.2, 8.5.0.1, 7.5.1.2, 8.5.5
Time based UCAs disappear due to incomplete event manager task	JR50384	8.0.1.3
Scheduling a time elapsed UCA task causes exception when Oracle DB is used	JR46249	8.0.1.2
Time elapsed UCA not executed when schedule contains 'FIRST', 'LAST' or multiple weekdays selected	JR46122	7.5.1.2, 8.0.1.2
Time elapsed UCAs executed at wrong time when DB and process server in different timezone	JR43099	7.5.1.1
Time elapsed UCAs fired multiple times	JR41966	7.5.1.1
CWLLG0181E: Error: [ssage:com.lombardisoftware.server.scheduler.TaskDeath: Task killed by stopping scheduler at server stop or DB failover	JR49523	8.0.1.3, 8.5.5
Exception when using BPMReplayOnHoldEMTasks command and DB2 on z/OS is used	JR50490	8.0.1.3
The Process Admin event manager monitor page is very slow	JR48052	7.5.1.2, 8.0.1.3, 8.5.5


Wednesday, June 22, 2016

Limitations in IBM BPM document store

The document store must be enabled at configuration time for an IBM BPM installation that is configured to use DB2 on z/OS

The document store is only available when Federated Repositories is used as the user registry

An exception occurs when you are logging or tracing document store operations

Defining too many properties can exceed the table row size limit

Tuesday, May 24, 2016

IBM Business Process Manager Event Manager - Common symptoms and Solutions