Introduction
The event manager is the central component in IBM Business Process
Manager that is responsible for scheduling a number of different tasks.
If it's not working correctly, you might run into severe problems that
need to be resolved quickly.
This blog entry describes some of the most common symptoms and shows how to resolve them.
In Part I, some common event manager problems will be shown. Part II explains how to analyze and fix these problems. Part III lists the available APARs that are related to the event manager.
Special thanks to Mark Filley and Bill Wentworth for their technical input and in-depth review!
Part I - Common event manager symptoms
Symptom A - Event manager is not processing any work
The event manager is responsible for scheduling various jobs, such as:
- Executing undercover agents (UCAs)
- Executing system lane tasks
- Triggering business process definition (BPD) timers
- Scheduling BPD notifications, which are essential to move the process flow forward through the business process diagram (BPMN)
If your process instances are stuck, timers are not fired, and UCAs are no
longer executed, then the event manager is not running or might be
blocked.
The Process Admin Console gives you a comprehensive view that shows the status of the event manager. When the status light is red, the event manager is paused or did not start, and its jobs accumulate with a Scheduled Time well behind the current time. In the Process Admin Console, these jobs show a 'Job Status' of 'Scheduled' and no jobs are in the 'Executing' state.
For example:
The following screen shot, which was taken from the Process Admin
Console, shows the last event manager heartbeat expiration time stamp
of "12/10/2014 2:00:15 PM." This time stamp is normally ahead of the
current time. The Scheduled Time of the event manager jobs (UCAs and BPD notifications)
shows an earlier time stamp and no job is currently executing. In
this example, the event manager is shown as inactive (red light), which
explains the situation.
Note: Even if the event manager
is not running, it is possible to start new process instances, but they
will not move forward! Because services are not scheduled by the event
manager, they can still be executed as well.
Symptom B - Event manager shows jobs with a scheduled date of 2099
The Process Admin Console can show event manager jobs scheduled for 2099 as shown here:
Symptom C - Event manager is active, but long running system lane tasks block the event manager throughput
There can be situations where the event manager is actively working,
but you experience throughput problems. For example, the flow in the
process instances is not moving forward or the execution of timers is
delayed.
The following screen shot shows five system lane activities being
executed, while a couple of BPD notifications are waiting to be executed.
These BPD notifications are overdue because their 'Scheduled Time' is earlier
than the current time. This situation can indicate that the event
manager configuration needs to be tuned and/or the execution time for
system lane tasks needs to be optimized, if possible.
Symptom D - UCAs are not processing at the desired rate
According to the definition in the process application, UCAs are bound
to a couple of synchronous queues or a single asynchronous queue managed
by the event manager. The capacity for these queues is defined by the
following parameters in the 80EventManager.xml configuration file:
<async-queue-capacity> or
<sync-queue-capacity>
Symptom E - Many BPD timers wake up at the same time
When the event manager processes a timer, it loads the applicable task
into the "BPD async queue," whose capacity is defined by the <bpd-queue-capacity> setting from the 80EventManager.xml
configuration file. If the application design has hundreds or thousands
of timers that start at the exact same time, then this setting might
need to be increased beyond the default of forty (40).
Symptom F - Event Manager warning messages CWLLG2156W, CWLLG2236W occur
If the BPM run time detects that the database connection pool is too
small, it will dynamically reduce the queue sizes and you will see
entries in SystemOut.log like the following messages:
"CWLLG2156W: The database connection pool size xxx of the Process Server data source might be too small." and/or
"CWLLG2236W: The configured <%%%%%%-queue-capacity> parameter of xxx has been changed to yyy."
These messages indicate that there is a mismatch between the event manager queue capacity and the JDBC data source pool size.
Symptom G - Event manager tasks fail when LombardiEventEmitterInputQueue reached max threshold
When you have your IBM Business Process Manager environment configured
to forward monitoring events to a Business Monitor server, the execution
of event manager tasks involves sending a message to the local queue
called "LombardiEventEmitterInputQueue." This queue maps to the JNDI name jms/com.ibm.lombardi/EventEmissionQueue.
If the queue depth of the LombardiEventEmitterInputQueue reaches
the configured maximum threshold, no more messages can be put on this
queue and the execution of event manager tasks ends in an
exception like the following text:
J2CA0027E: An exception occurred while invoking prepare on an XA Resource Adapter from DataSource jms/com.ibm.lombardi/EventEmissionQueueFactory, within transaction ID {XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54),
data(0000014ac680b3dd000000010c3c5a4c30653f6b06f16c1e5782cea7f4fce4b60a8f48d30000014ac680b3dd000000010c3c5a4c30653f6b06f16c1e5782cea7f4fce4b60a8f48d3000000010000000000000000000000000002)} : javax.transaction.xa.XAException:
CWSIC8007E: An exception was caught from the remote server with Probe
Id 3-013-0010. Exception: CWSIC2029E: This transaction cannot commit as
an operation that was performed within the transaction boundary failed.
The first operation that failed generated the following exception: com.ibm.ws.sib.processor.exceptions.SIMPLimitExceededException: CWSIK0025E: The destination LombardiEventEmitterInputQueue on messaging engine ProcessServerProdDepEnv.SupCluster.000-MONITOR.Cell01.Bus is not available because the high limit for the number of messages for this destination has already been reached...
at com.ibm.ws.sib.comms.common.CommsByteBuffer.parseSingleException(CommsByteBuffer.java:1753)
at com.ibm.ws.sib.comms.common.CommsByteBuffer.getException(CommsByteBuffer.java)
at com.ibm.ws.sib.comms.common.CommsByteBuffer.checkXACommandCompletionStatus(CommsByteBuffer.java:1218)
at com.ibm.ws.sib.comms.client.OptimizedSIXAResourceProxy.prepare(OptimizedSIXAResourceProxy.java:749)
at com.ibm.ws.sib.comms.client.SuspendableXAResource.prepare(SuspendableXAResource.java:386)
at com.ibm.ws.sib.api.jmsra.impl.JmsJcaRecoverableSiXaResource.prepare(JmsJcaRecoverableSiXaResource.java:260)
at com.ibm.ejs.j2c.XATransactionWrapper.prepare(XATransactionWrapper.java:1152)
at com.ibm.tx.jta.impl.JTAXAResourceImpl.prepare(JTAXAResourceImpl.java:234)
at com.ibm.tx.jta.impl.RegisteredResources.prepareResource(RegisteredResources.java:1211)
at com.ibm.tx.jta.impl.RegisteredResources.distributePrepare(RegisteredResources.java:1472)
at com.ibm.tx.jta.impl.TransactionImpl.prepareResources(TransactionImpl.java:1488)
at com.ibm.ws.tx.jta.TransactionImpl.stage1CommitProcessing(TransactionImpl.java:602)
at com.ibm.tx.jta.impl.TransactionImpl.processCommit(TransactionImpl.java:1028)
at com.ibm.tx.jta.impl.TransactionImpl.commit(TransactionImpl.java:962)
at com.ibm.ws.tx.jta.TranManagerImpl.commit(TranManagerImpl.java:439)
at com.ibm.tx.jta.impl.TranManagerSet.commit(TranManagerSet.java:191)
at com.ibm.ws.uow.UOWManagerImpl.uowCommit(UOWManagerImpl.java:807)
at com.ibm.ws.uow.embeddable.EmbeddableUOWManagerImpl.uowEnd(EmbeddableUOWManagerImpl.java:881)
at com.ibm.ws.uow.UOWManagerImpl.uowEnd(UOWManagerImpl.java:782)
at com.ibm.ws.uow.embeddable.EmbeddableUOWManagerImpl.runUnderNewUOW(EmbeddableUOWManagerImpl.java:818)
at com.ibm.ws.uow.embeddable.EmbeddableUOWManagerImpl.runUnderUOW(EmbeddableUOWManagerImpl.java:370)
at org.springframework.transaction.jta.WebSphereUowTransactionManager.execute(WebSphereUowTransactionManager.java:252)
at com.lombardisoftware.utility.spring.ProgrammaticTransactionSupport.executeInNewTransaction(ProgrammaticTransactionSupport.java:431)
at com.lombardisoftware.utility.spring.ProgrammaticTransactionSupport.execute(ProgrammaticTransactionSupport.java:294)
at com.lombardisoftware.server.core.TXCommand.executeInDeadlockRetryLoop(TXCommand.java:83)
at com.lombardisoftware.server.core.TXCommand.execute(TXCommand.java:72)
at com.lombardisoftware.bpd.runtime.engine.quartz.AbstractBpdTask.execute(AbstractBpdTask.java:119)
at com.lombardisoftware.bpd.runtime.engine.quartz.AbstractBpdTask.execute(AbstractBpdTask.java:71)
at com.lombardisoftware.server.scheduler.Engine.execute(Engine.java:847)
...
Caused by: com.ibm.wsspi.sib.core.exception.SILimitExceededException: CWSIK0025E: The destination LombardiEventEmitterInputQueue on messaging engine AdvProcessServerProdDepEnv.SupCluster.000-MONITOR.rsdcpprobpmdm01Cell01.Bus is not available because the high limit for the number of messages for this destination has already been reached.
at com.ibm.ws.sib.comms.common.CommsByteBuffer.parseSingleException(CommsByteBuffer.java:1842)
... 49 more
Part II - Resolving common event manager symptoms
A - Event manager is not processing any work
If the event manager does not process any work as mentioned previously
in the symptoms section, this situation might be caused by the event
manager not being active or being active, but blocked for some reason.
The following sections, A.1 and A.2, describe how to analyze and resolve these situations.
To monitor and diagnose the status of the event manager, the following pieces of information are helpful:
1. DB table LSW_EM_INSTANCE
2. DB table LSW_EM_TASK
3. DB table LSW_BLACKOUT_CALENDAR
4. Process Admin Console section Event Manager -> Monitor as GUI to 1. and 2.
5. Process Admin Console section Event Manager -> Blackout Periods as GUI to 3.
6. BPM server log file SystemOut.log and FFDCs
7. Event manager configuration file 80EventManager.xml and the global BPM server configuration file TeamWorksConfiguration.running.xml, which contains all of the parameters at run time. The following technote shows where to find these files and how they relate: http://www.ibm.com/support/docview.wss?uid=swg21439614
A.1 - Event manager not active
The easiest way to check the status of the event manager is by using
the Process Admin Console, which shows the event manager status and
lists all event manager jobs. It also lists the expiration time stamp,
which is regularly updated every 15s by an internal heartbeat thread and
set to the current time + 60s. Keep in mind that this time stamp is
created based on the current time used on the database system that is
hosting the BPM tables.
The Process Admin Console panel named 'Event Manager -> Monitor' is a
graphical user interface for the contents of the database tables:
- BPM DB table LSW_EM_INSTANCE containing the event manager status and heartbeat timestamp
- BPM DB table LSW_EM_TASK containing all event manager jobs
The following screen shot shows the event manager status in the Process Admin Console:
Status:
- Green - running
- Red - stopped, not running
Each cluster member will have an event manager. If you have a
multi-cluster topology, it is only present in the AppTarget servers. In
this example, it is a single node cluster and, therefore, only one event
manager instance is listed. In a clustered environment, there is an
entry for every cluster member as a new row in the table. To process
work, the event manager needs to be in the running state (green) and the
time stamp listed in the Connect expiration field needs to be ahead of
the current time.
A successful event manager start up is reflected in the server's SystemOut.log
file with messages that show the start of the heartbeat thread and
acquisition of the synchronous queues. You can grep the log for messages
of the wle_scheduler group as shown here:
wle_scheduler I CWLLG0561I: Heartbeat thread starting...
wle_scheduler I CWLLG0615I: Heartbeat resumed.
wle_scheduler I CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_1.
wle_scheduler I CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_1.
wle_scheduler I CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_2.
wle_scheduler I CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_2.
wle_scheduler I CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_3.
wle_scheduler I CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_3.
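If you prefer a script over grep, the start-up check described above can be sketched as follows. This is only a minimal illustration, not an IBM tool: the function name summarize_startup and the returned dictionary layout are made up for this example, and only the message IDs shown in this article are evaluated.

```python
import re

# Message IDs taken from the log excerpts in this article.
HEARTBEAT_STARTING = "CWLLG0561I"
HEARTBEAT_PAUSED = "CWLLG0570I"
ACQUIRED_QUEUE = re.compile(r"CWLLG0581I: Acquired synchronous queue (\S+?)\.?$")


def summarize_startup(lines):
    """Summarize event manager start-up messages of the wle_scheduler group.

    Hypothetical helper: it only looks at the message IDs listed above.
    """
    summary = {"heartbeat_started": False, "paused_seen": False, "acquired_queues": []}
    for line in lines:
        if "wle_scheduler" not in line:
            continue  # ignore messages from other log groups
        if HEARTBEAT_STARTING in line:
            summary["heartbeat_started"] = True
        if HEARTBEAT_PAUSED in line:
            summary["paused_seen"] = True
        match = ACQUIRED_QUEUE.search(line.strip())
        if match:
            summary["acquired_queues"].append(match.group(1))
    return summary


sample = [
    "wle_scheduler I CWLLG0561I: Heartbeat thread starting...",
    "wle_scheduler I CWLLG0615I: Heartbeat resumed.",
    "wle_scheduler I CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_1.",
    "wle_scheduler I CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_1.",
]
result = summarize_startup(sample)
print(result)
```

A heartbeat that started without any acquired synchronous queue would match the "started as paused" pattern described later in this article.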
In the LSW_EM_INSTANCE table, the STATUS column shows a value of '1' for an active event manager and '2' if it has been paused.
With the Process Admin Console, the event manager can be paused and
resumed by clicking the applicable button. Make sure to click Refresh to update the panel so that the current status is read from the LSW_EM_INSTANCE table.
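The two checks described so far (the STATUS value and the connect expiration time stamp) can be combined into one small decision function. The following is a sketch under assumptions: the function name and the returned labels are invented for this example, and the values are expected to be read from the LSW_EM_INSTANCE table by whatever database client you use. The current time should come from the database system, because the expiration time stamp is based on the database clock.

```python
from datetime import datetime, timedelta

# STATUS values of LSW_EM_INSTANCE as described in this article.
STATUS_ACTIVE = 1
STATUS_PAUSED = 2


def classify_instance(status, connect_expiration, db_now):
    """Classify one event manager instance (hypothetical helper).

    connect_expiration and db_now are datetime values; db_now should be
    the current time of the database system.
    """
    if connect_expiration <= db_now:
        # The heartbeat did not renew the time stamp in time.
        return "heartbeat expired"
    if status == STATUS_PAUSED:
        return "paused"
    if status == STATUS_ACTIVE:
        return "running"
    return "unknown"


now = datetime(2014, 12, 10, 14, 0, 0)
print(classify_instance(STATUS_ACTIVE, now + timedelta(seconds=60), now))
print(classify_instance(STATUS_PAUSED, now + timedelta(seconds=45), now))
print(classify_instance(STATUS_ACTIVE, now - timedelta(minutes=5), now))
```

The expiration check comes first on purpose: an outdated time stamp means that the STATUS value can no longer be trusted.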
Reasons why the event manager might be inactive:
a) The event manager configuration is corrupted by an incorrect attribute in the custom property file (for example: 100Custom.xml)
If you tried to overwrite event manager parameters by using a 100Custom.xml
file and accidentally used 'replace' as the value of the merge attribute instead of
'mergeChildren' as shown below, the required parameters from 80EventManager.xml are not honored:
Incorrect definition sample:
<... merge="replace">
<start-paused merge="replace">true</start-paused>
Correct definition sample:
<... merge="mergeChildren">
<start-paused merge="replace">true</start-paused>
With the incorrect definition sample, the Process Admin Console shows a NullPointerException when you click
Event Manager > Monitor as shown in the following screen shot:
To solve that problem, correct the
100Custom.xml file and restart the server. After the restart, make sure that the
TeamWorksConfiguration.running.xml file contains the complete
event manager section and the event manager is shown as active in the Process Admin Console.
Note: There is a list of sample
configuration files to adapt the IBM Business Process Manager
configuration, including some samples for the event manager. You can
access these files
here.
b) The event manager might have been paused manually by using the
Process Admin Console. In this case, you can resume its activity as
mentioned previously. Even when it is paused, the connect expiration
time stamp is renewed every 15s (default).
c) The event manager is configured to be started as "paused"
The 80EventManager.xml BPM server configuration file contains a parameter called <start-paused>, which is set to false by default. If it is set to true by overwriting the parameter with a 100Custom.xml
file, then the heartbeat thread that sets the event manager 'connect
expiration' is active, but the event manager will not process any work.
To check for that situation, look at your TeamWorksConfiguration.running.xml BPM server configuration file. Search that file for the <start-paused> string in the event manager section. If that parameter is set to true, this setting explains why the event manager did not become active after server start up.
In case the event manager is configured to be started as "paused," the SystemOut.log
file will only contain the following messages during start up. They
show that the heartbeat thread started and continuously updates the
connect expiration time stamp, but the event manager did not acquire the
synchronous queues.
wle_scheduler I CWLLG0570I: Heartbeat paused.
wle_scheduler I CWLLG0561I: Heartbeat thread starting...
wle_scheduler I CWLLG0615I: Heartbeat resumed.
To resume the event manager, use the Process Admin Console as shown
previously. A successful resume action will result in the following
messages in the SystemOut.log file:
wle_scheduler I CWLLG0615I: Heartbeat resumed.
wle_scheduler I CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_1.
wle_scheduler I CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_1.
wle_scheduler I CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_2.
wle_scheduler I CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_2.
wle_scheduler I CWLLG0597I: Trying to acquire synchronous queue SYNC_QUEUE_3.
wle_scheduler I CWLLG0581I: Acquired synchronous queue SYNC_QUEUE_3.
Keep in mind that the <start-paused> parameter in the event manager configuration needs to be set back to false. Otherwise, the event manager will still be inactive after the next server restart.
d) Event manager is not enabled in the configuration.
The 80EventManager.xml BPM server configuration file contains a parameter called enabled, which is set to true, by default.
To check for that situation, look at your TeamWorksConfiguration.running.xml BPM server configuration file. Search in that file for the enabled parameter in the event manager section. If that parameter is set to false, then the event manager will not be active after the server start up. In contrast to being started as "paused," it will show none of the previous messages in the SystemOut.log file and the connect expiration time stamp will not be updated!
You cannot resume the event manager through the Process Admin Console in such a case. Instead, you need to change your configuration to set the enabled parameter back to true and restart your server.
e) Blackout period is active
Administrators establish blackout periods to specify times when events cannot be scheduled. For example, you might schedule a blackout period due to a holiday or for regular system maintenance windows. The event manager takes blackout periods into account when scheduling and queuing events, event subscriptions, and undercover agents (UCAs). The following screenshot shows whether and which blackout periods are configured. This data is persisted in the LSW_BLACKOUT_CALENDAR DB table.
If a blackout period is active, the event manager monitor in the Process Admin Console lists a scheduled job named "End blackout period", where the scheduled time column shows when the blackout period ends. Event manager jobs created during the blackout period show a job status of "Blacked out". The following screenshot shows that scenario:
The SystemOut.log file does not show any applicable messages when the blackout period is entered.
f) Exceptions during the event manager start up
If the event manager is not running after start up, but was not configured as paused or disabled, the SystemOut.log file might show a couple of exceptions.
There could be various reasons why the event manager failed during startup or resume. Gather the documents as mentioned in the event manager mustgather technote.
The following section shows a few examples:
- Event manager configuration is broken
This problem is caused by an incomplete fix pack installation where
required post-installation steps to upgrade the profile were not
executed.
The SystemOut.log file will have exceptions that have the following signatures.
CWLLG0144E: Exception in init(): schedule cannot be started. com.lombardisoftware.core.TeamWorksException: Message: SCHEDULER_CONFIG_BROKEN Arguments: loader-acquire-sync-queue-query: com.lombardisoftware.core.config.eventmanager.SchedulerConfig checkAndReplace Message: SCHEDULER_CONFIG_REPLACEMENT_PARAMETER_NOT_FOUND Arguments: %executing% loader-acquire-tasks-query UPDATE LSW_EM_TASK SET TASK_STATUS = %acquired%, TASK_OWNER = ? WHERE TASK_ID IN (%task-ids%)
To fix this problem, review the documented post-installation (interim fix/fix pack) steps and rerun the missing steps.
- Event manager start up problem due to a problem in the BPM embedded
document store (applies to IBM Business Process Manager V8.5 and later)
Important note: If the embedded BPM document store
cannot be started due to configuration or authorization problems, the
event manager will also not start!
The SystemOut.log
file will not show any of the event manager-related start up messages
as shown previously, but you will see, for example, the following
exception, which is related to the embedded document store:
CWTDS1100E: An error occurred while validating or creating the default configuration for the IBM BPM document store.
com.ibm.bpm.embeddedecm.exception.UserMissesWritePermissionException:
CWTDS0022E: The configuration was changed in a way that the technical
user 'deadmin' of the IBM BPM document store fails to change the object
'Domain'.
Explanation: The technical user defined in the BPM role type 'EmbeddedECMTechnicalUser' is not permitted to perform changes on an object.
Action: Revert the recent configuration changes. Ensure that the user defined by the BPM role type 'EmbeddedECMTechnicalUser' has access to the object. Verify this using the admin task 'getDocumentStoreStatus'.
at com.ibm.bpm.embeddedecm.internal.DomainConfiguration$2.run(DomainConfiguration.java:264)
at com.ibm.bpm.embeddedecm.internal.DomainConfiguration$2.run(DomainConfiguration.java:207)
at java.security.AccessController.doPrivileged(AccessController.java:362)
To fix that problem, the configuration error with the document store must be resolved as shown here:
http://www.ibm.com/support/docview.wss?uid=swg21673250
A.2 - Event manager is active but it is not processing any jobs
When the event manager is active (the Process Admin Console shows it as
active and the connect expiration is not outdated) but is not processing any
tasks, this could be caused by:
- Event manager configuration: Check the 80EventManager.xml configuration file and the global BPM server configuration file TeamWorksConfiguration.running.xml, which
contains all of the parameters at run time. The following technote
shows where to find these files and how they relate: http://www.ibm.com/support/docview.wss?uid=swg21439614
- Event manager blocked due to orphaned transactions in Microsoft SQL Server holding locks on its tables:
If you use Microsoft SQL Server as the process server database,
the cause could be so-called 'orphaned transactions' in the DB
system. The following technote shows how to resolve such a problem:
http://www.ibm.com/support/docview.wss?uid=swg21633692
- System time or time zone of the BPM system and the remote DB system that hosts
the BPM DB is out of sync: To fix that, make sure that the system
time on the BPM node and the DB node are in sync. It is a best practice to
have both use the same Network Time Protocol (NTP) server.
B - Event manager shows jobs with a scheduled date of 2099
If the execution of an event manager job fails, it is retried a couple of times as defined by the re-execute-limit configuration parameter (default = 5) in the 80EventManager.xml file. The behavior in such a case has gone through a fundamental change with APAR JR47860:
- Pre-JR47860 behavior: When the re-execute-limit is reached, the corresponding event manager job is discarded! There is no way to re-execute this job.
- Post-JR47860 behavior: When the re-execute-limit is reached, the event manager job is rescheduled for 2099.
The interim fix for the APAR also provides a new administrative command called BPMReplayOnHoldEMTasks, which was introduced to resubmit such a failed job. Check the APAR description for more details or see the product documentation in the IBM Knowledge Center.
Important note: Before resubmitting an event manager job, it is important to eliminate the root cause! Otherwise, you might run into the same problem again. To find the root cause, check your SystemOut.log file for message CWLLG0197W. This message indicates that the event manager tried to execute a task 5 times, but the task failed each time. Note the thread ID and walk back in the thread history within the SystemOut.log file, which will most probably tell you which exception caused the execution of this event manager task to fail.
Example for an event manager task to execute a UCA:
1. Search the SystemOut.log file for CWLLG0197W, which shows the following line. Note the thread ID 00011779.
[2/4/14 5:54:18:395 GMT] 00011779 wle_ucaexcept E CWLLG0197W: Task Notify BPD 202738 of notification failed 5 times. The task will not be re-executed.
The previous messages for thread 00011779 will show this error message:
[2/4/14 5:54:18:337 GMT] 00011779 wle_ucaexcept E CWLLG0181E: An exception occurred during execution of task 4,425,203. Error: PreparedStatementCallback; SQL [update LSW_BPD_INSTANCE_DATA
set DATA = ? where BPD_INSTANCE_ID = ?]; Error for batch element #1:
DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-968,
DRIVER=3.61.65; nested exception is com.ibm.db2.jcc.am.SqlTransactionRollbackException: Error for batch element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-968, DRIVER=3.61.65
com.lombardisoftware.core.TeamWorksException: PreparedStatementCallback; SQL [update LSW_BPD_INSTANCE_DATA
set DATA = ? where BPD_INSTANCE_ID = ?]; Error for batch
element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-968,
DRIVER=3.61.65; nested exception is com.ibm.db2.jcc.am.SqlTransactionRollbackException: Error for batch element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-968, DRIVER=3.61.65
at com.lombardisoftware.core.TeamWorksException.asTeamWorksException(TeamWorksException.java:130 ...
In this special case, the execution of the event manager task failed due to an SQL exception with sqlcode -968, which means that the database filesystem is out of space.
2. Fix the problem that caused the exception. In the previous example, resolve the out-of-space condition in the database filesystem.
3. Resubmit the applicable event manager task by using the BPMReplayOnHoldEMTasks command.
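The "walk back in the thread history" part of step 1 can be scripted. The following sketch assumes that each log entry starts with a bracketed time stamp followed by an eight-digit hexadecimal thread ID, as in the excerpts above. The function name is made up for this example, and the second sample line (thread 0000abcd) is invented to show that entries of other threads are ignored.

```python
import re

# "[time stamp] threadId rest-of-message", as in the SystemOut.log excerpts above.
LINE = re.compile(r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<thread>[0-9a-fA-F]{8})\s+(?P<rest>.*)$")


def find_failed_task_history(lines, marker="CWLLG0197W"):
    """Map each thread that logged the marker to its earlier log entries.

    Hypothetical helper: continuation lines such as stack traces are
    skipped in this sketch.
    """
    seen_by_thread = {}
    history = {}
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        thread = match.group("thread")
        if marker in match.group("rest"):
            # Copy everything this thread logged before the marker line.
            history[thread] = list(seen_by_thread.get(thread, []))
        seen_by_thread.setdefault(thread, []).append(line)
    return history


sample = [
    "[2/4/14 5:54:18:337 GMT] 00011779 wle_ucaexcept E CWLLG0181E: An exception occurred during execution of task 4,425,203.",
    "[2/4/14 5:54:18:350 GMT] 0000abcd wle_scheduler I CWLLG0615I: Heartbeat resumed.",
    "[2/4/14 5:54:18:395 GMT] 00011779 wle_ucaexcept E CWLLG0197W: Task Notify BPD 202738 of notification failed 5 times. The task will not be re-executed.",
]
history = find_failed_task_history(sample)
for thread, entries in history.items():
    print(thread, len(entries))
```

For large log files, you might want to keep only the last few entries per thread instead of the full history.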
C, D, E, F - Event manager is active, but throughput problems exist
Throughput problems might be caused by a wide range of reasons. In
terms of the event manager, the potential throughput is limited by the
capacity of its queues.
For a comprehensive summary of all event manager-related configuration parameters including the different queues, check this
product documentation in the IBM Knowledge Center.
To analyze and fix this problem, you need to understand the involved
configuration parameters and how to monitor and adapt them.
a) Find out the event manager queue capacities
The event manager maintains a number of internal queues. The capacity
of each queue is limited by a configuration parameter that is specified
in the 80EventManager.xml configuration
file and limits the number of jobs that can be in the execution state
simultaneously. The following table shows the different queues, the
applicable configuration parameter, and the default capacity (as of IBM
Business Process Manager 8.5.5):
Event Manager Queue | Configuration Parameter in 80EventManager.xml | Default capacity
Async Queue (UCA) | async-queue-capacity | 10
Sync Queue (UCA) | sync-queue-capacity | 10
BPD Async Queue (BPD notifications, system lane tasks, timer execution) | bpd-queue-capacity | 40
System Queue | system-queue-capacity | 10
The default values could have been overwritten by using a 100Custom.xml file. To find out which values are currently being used, look in the TeamWorksConfiguration.running.xml file.
b) Determine the event manager queue usage and adapt the event manager queue sizes
To monitor the number of executing jobs on each event manager queue,
use the Process Admin Console event manager monitor and count the number
of rows for each 'Job Queue' with job status 'Executing'. Alternatively
you could use this SQL statement:
SELECT COUNT(*) as EXECUTION_COUNT,
case QUEUE_ID
when '-100' then 'UCA Async Queue'
when '-101' then 'BPD Async Queue'
when '-102' then 'EM System Queue'
else 'UCA Sync Queue' END as QUEUE
from LSW_EM_TASK where TASK_STATUS = 3 group by QUEUE_ID WITH UR;
If the number of executing event manager tasks for a queue has reached
the capacity limit and there are more tasks on that queue waiting to be
executed (time to be scheduled has already passed), then there might be a
performance problem or the queue capacity is too low for the workload
and needs to be increased.
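The saturation rule from the previous paragraph can be expressed in a few lines. This is a sketch with made-up names: the three dictionaries are assumed to be filled by you, for example from the SQL statement above, from a similar count of overdue scheduled tasks, and from the TeamWorksConfiguration.running.xml file. The sample numbers mirror the Symptom C example with bpd-queue-capacity set to 5; the count of three overdue tasks is invented.

```python
def saturated_queues(executing, capacity, overdue_waiting):
    """Return the queues that are at capacity while overdue tasks are waiting.

    All three arguments map a queue name to a number (hypothetical layout).
    """
    return sorted(
        name
        for name, limit in capacity.items()
        if executing.get(name, 0) >= limit and overdue_waiting.get(name, 0) > 0
    )


capacity = {"UCA Async Queue": 10, "BPD Async Queue": 5, "EM System Queue": 10}
executing = {"UCA Async Queue": 2, "BPD Async Queue": 5}
overdue_waiting = {"BPD Async Queue": 3}
print(saturated_queues(executing, capacity, overdue_waiting))
```

A queue reported here is only a candidate: whether the fix is a larger capacity or faster tasks still needs the analysis described next.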
The BPD async queue is of special interest because its capacity is
shared between the execution of system lane tasks, timer executions, and
BPD notifications. If the complete capacity is already occupied by
currently executing, long-running system lane tasks, no other job can be
executed on that queue. The screen shot shown previously for Symptom C
is an example from a system with bpd-queue-capacity
set to 5 and the complete capacity is occupied by five executing system
tasks. To eliminate a problem related to long running system tasks:
- Find out why the system lane tasks have such a long execution time and
try to fix that. There might be various reasons like back-end response
time, excessive JVM garbage collection, CPU and memory constraints,
network delays, and so on.
- If the system lane tasks are expected to be long-running, think about
splitting them into smaller pieces or increase the capacity of the BPD
async queue as shown in the next paragraph.
c) Increase the event manager queue capacities
To increase the event manager queue sizes, specify the applicable parameter as shown in the previous table in a
100Custom.xml file. For example:
<... merge="mergeChildren">
<... merge="mergeChildren">
<bpd-queue-capacity merge="replace">40</bpd-queue-capacity>
<async-queue-capacity merge="replace">10</async-queue-capacity>
<sync-queue-capacity merge="replace">10</sync-queue-capacity>
<system-queue-capacity merge="replace">10</system-queue-capacity>
Important: When increasing the capacity of event
manager queues, keep in mind that, besides additional threads in
your JVM, additional JDBC connections are also needed. Thus, the
connection pool of the JDBC data source (jdbc/TeamworksDB) also needs to be increased.
As a general rule, increase the number of database connections by two
times the value by which you increased the queue capacity. Apart from
database connections, more JVM heap is also needed.
If there is a mismatch between the queue capacities and the number of
available connections for the data source, IBM Business Process Manager
tries to scale down the queue size. That issue will be indicated in the
log by the warning messages shown under Symptom F.
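As a worked example of the general rule above (two additional database connections for each added queue slot), consider the following sketch. The function name and the current pool size of 200 are assumptions for illustration only; use the real maximum connections value of your jdbc/TeamworksDB data source.

```python
def suggested_pool_size(current_pool_size, old_capacities, new_capacities):
    """Apply the rule of thumb: two extra connections per added queue slot.

    Hypothetical helper; both dictionaries must use the same parameter names.
    """
    added_slots = sum(
        max(new_capacities[name] - old_capacities[name], 0)
        for name in new_capacities
    )
    return current_pool_size + 2 * added_slots


defaults = {
    "async-queue-capacity": 10,
    "sync-queue-capacity": 10,
    "bpd-queue-capacity": 40,
    "system-queue-capacity": 10,
}
# Assumed change: raise only the BPD async queue from 40 to 60.
tuned = {**defaults, "bpd-queue-capacity": 60}
print(suggested_pool_size(200, defaults, tuned))
```

With these sample values, 20 added queue slots lead to 40 additional connections. Treat the result as a starting point and verify it against the warning messages and your database limits.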
d) Understand the event manager queue capacity and the related thread pool size
The event manager configuration also shows a parameter named max-thread-pool-size. By default, the value for the max-thread-pool-size
parameter is the sum of the individual queue capacities (70). It is
important to understand that its size does not limit the overall number
of event manager tasks that can be executed simultaneously. So even if
you set max-thread-pool-size to 5 and bpd-queue-capacity
to 10, you will be able to execute 10 system lane tasks simultaneously.
It is possible because the threadpool is defined as 'growable', which
means it temporarily allows the number of threads to exceed the defined
limit, but such a thread would be discarded directly after it finished
and not be returned to the pool. Therefore, these threads are a bit more
expensive.
Starting in IBM Business Process Manager 8.5.5.0, the event manager no
longer uses its own internal thread pool. Instead, it uses a WebSphere
Application Server work manager thread pool. This function is configured
by these two event manager parameters:
-
-
<was-work-manager>wm/BPMEventManagerWorkManager</was-work-manager>
When you use the WebSphere Application Server work manager thread pool,
the maximum pool size is configured in the WebSphere Application Server
Administrative Console as shown in the following screen shot:
In a default configuration, the work manager thread pool for the event
manager is defined with a maximum of 70 threads, but also as 'growable'.
When sizing the work manager thread pool for the event manager, also
make sure that its size is at least equal to the sum of queue
capacities.
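The sizing statements above can be checked with simple arithmetic. The sketch below uses the default capacities from the table in section a). The function names are invented for this example, and the growable flag reflects the behavior described in this article rather than any API.

```python
# Default queue capacities as of IBM Business Process Manager 8.5.5 (see the table above).
DEFAULT_CAPACITIES = {
    "async-queue-capacity": 10,
    "sync-queue-capacity": 10,
    "bpd-queue-capacity": 40,
    "system-queue-capacity": 10,
}


def total_capacity(capacities):
    """Sum of all event manager queue capacities."""
    return sum(capacities.values())


def can_run_all_queues_at_capacity(max_threads, capacities, growable):
    """True if the thread pool does not cap the queues (hypothetical helper).

    A growable pool may temporarily exceed max_threads; a pool that is not
    growable limits the number of simultaneously executing jobs.
    """
    return growable or max_threads >= total_capacity(capacities)


print(total_capacity(DEFAULT_CAPACITIES))
print(can_run_all_queues_at_capacity(5, DEFAULT_CAPACITIES, growable=True))
print(can_run_all_queues_at_capacity(5, DEFAULT_CAPACITIES, growable=False))
```

Even with a growable pool, sizing the pool to at least the sum of the queue capacities avoids the more expensive temporary threads.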
In case you modified the thread pool properties and cleared the
checkbox for "Growable", then the maximum number of threads implicitly
also limits the number of event manager jobs that can be executed
simultaneously! See this screen shot.
One of the advantages of using a WebSphere Application Server work
manager thread pool for the event manager is that you can use the Tivoli
Performance Viewer, which is available from the WebSphere Application
Server Administrative Console, to monitor the thread pool activity. See
this screen shot:
G - Event manager tasks fail when the LombardiEventEmitterInputQueue reaches the maximum threshold
Check the current queue depth and the threshold. You can use the
service integration bus browser, which is integrated into the WebSphere
Application Server Administrative Console, to easily check the queue
depth and the high message threshold.
Make sure that there is a message consumer active to read from that
queue, which would typically be the Business Monitor infrastructure. If
it is not started, start the message consumer and the queue depth should
decrease.
If the Business Monitor environment is started and consuming messages,
but the queue depth is still at the limit, then perform tuning actions
for the Business Monitor server or increase the high message threshold
for the involved queues.
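The checks described above can be written down as a small decision helper. This Python sketch is only a reading aid. The function name and its inputs are hypothetical, and you would take the values for queue depth, threshold, and consumer state manually from the service integration bus browser.

```python
def suggest_action(queue_depth, high_message_threshold, consumer_active):
    """Map the observed queue state to the action described in the text."""
    if not consumer_active:
        # Nothing reads from the queue, so messages can only accumulate.
        return "start the message consumer"
    if queue_depth >= high_message_threshold:
        # The consumer is running but cannot keep up with the producers.
        return "tune Business Monitor or increase the high message threshold"
    return "no action needed"


# Illustrative values only; read the real ones from the administrative console.
print(suggest_action(queue_depth=50000, high_message_threshold=50000,
                     consumer_active=True))
```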
If the Business Monitor environment no longer exists, but you
did not revert the BPM server configuration, messages are still
generated and put on the queue, but never consumed.
To solve this, you need to:
-
Pause the event manager so that no new messages are created.
-
Manually delete the messages on the queue destination as shown below.
-
Disable the event emission on the IBM Business Process Manager server according to the instructions below.
-
Restart your IBM Business Process Manager server.
To delete the messages from the queue destination using the WebSphere
Application Server Administrative Console, complete these steps:
-
In the SIBus section, navigate to the queue point for the LombardiEventEmitterInputQueue.
-
Select the Runtime tab.
-
Click Messages, which displays the Delete All option to delete all of the messages on that queue.
The event emission for Business Monitor has been explicitly enabled by
the following entry. To disable it, set the value for parameter
'enabled' to false as shown here:
<monitor-event-emission>
  <enabled>false</enabled>
  <j2c-authentication-alias>MonitorBusAuth</j2c-authentication-alias>
</monitor-event-emission>
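After editing the configuration, you might want to double-check the value before restarting the server. The following Python sketch parses a configuration fragment and reports whether event emission is enabled. It is a minimal illustration. The function name is hypothetical, and the sample fragment assumes that the section is named monitor-event-emission and contains the 'enabled' parameter described above. Adapt it to the structure of your own configuration file.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the configuration section (illustrative fragment only).
SAMPLE_CONFIG = """
<monitor-event-emission>
  <enabled>false</enabled>
</monitor-event-emission>
"""


def emission_enabled(xml_text):
    """Return True or False for the 'enabled' value, or None if it is absent."""
    root = ET.fromstring(xml_text.strip())
    if root.tag == "monitor-event-emission":
        section = root
    else:
        # Look for the section anywhere below the root element.
        section = root.find(".//monitor-event-emission")
    if section is None:
        return None
    value = section.findtext("enabled")
    if value is None:
        return None
    return value.strip().lower() == "true"


print(emission_enabled(SAMPLE_CONFIG))
```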
Part III - Known APARs related to the event manager
Issue, error, or problem | Addressed in APAR | Fix included in
Duplicate execution of event manager task under high load when using Oracle DB | JR49359 | 8.0.1.3, 8.5.5
Change handling of failed event manager tasks and introduction of admin command to replay these failed tasks | JR47860 | 8.0.1.3, 8.5.5
Posting message to event manager only starts TIP snapshot | JR45615 and JR45616 | 7.5.1.2, 8.0.1.1
Blackout calendar not respected for timer events | JR45899 | 8.0.1.2
Task processing threadpool initialized with wrong user, error message CWLLG0326E or CWLLG0179E | JR46484 | 8.0.1.2, 7.5.1.2, 8.5.5
IllegalStateException when starting/stopping teamworks.ear | JR47360 | 8.0.1.2, 8.5.0.1, 8.5.5
Delayed communication between BPD and Service engine | JR47915 | 8.0.1.2, 8.5.0.1, 8.5.5
DB2 error "bad SQL grammar" with DB2 9.5 after upgrading to 8.0.1.2 or installation of JR46470 | JR48878 | 8.0.1.3
com.lombardisoftware.core.TeamWorksException: Numeric Overflow on event manager task | JR49172 | 8.5.5, 8.0.1.3
Double execution of event manager tasks in heavily loaded environments with Oracle DB | JR46470 | 8.0.1.2
UCA message corrupted when larger than 1000 Bytes and using multibyte characters | JR47265 | 8.0.1.2, 8.5.0.1, 8.5.5
UCA input/output parameters corrupted when containing unicode characters | JR46993 | 8.0.1.2, 8.5.0.1, 8.5.5
Cleanup of duplicate UCA entries created before JR41966 had been applied | JR47574 | 8.0.1.2, 8.5.0.1, 7.5.1.2, 8.5.5
Time based UCAs disappear due to incomplete event manager task | JR50384 | 8.0.1.3
Scheduling a time elapsed UCA task causes exception when Oracle DB is used | JR46249 | 8.0.1.2
Time elapsed UCA not executed when schedule contains 'FIRST', 'LAST' or multiple weekdays selected | JR46122 | 7.5.1.2, 8.0.1.2
Time elapsed UCAs executed at wrong time when DB and process server in different timezone | JR43099 | 7.5.1.1
Time elapsed UCAs fired multiple times | JR41966 | 7.5.1.1
CWLLG0181E: Error: [ssage:com.lombardisoftware.server.scheduler.TaskDeath: Task killed by stopping scheduler at server stop or DB failover | JR49523 | 8.0.1.3, 8.5.5
Exception when using BPMReplayOnHoldEMTasks command and DB2 on z/OS is used | JR50490 | 8.0.1.3
The Process Admin event manager monitor page is very slow | JR48052 | 7.5.1.2, 8.0.1.3, 8.5.5