Thursday, April 10, 2014

WebSphere Application Server Performance Tuning Recommendations

Perform Proper Load Testing

  • Properly load testing your application is the most critical thing you can do to ensure a rock solid runtime in production.
  • Replicating your production environment isn’t always necessary; most of the time a single representative machine from the environment gives you the same bang for your buck.
    • Calculate expected load across the cluster and divide down to single machine load
    • Drive load and perform the usual tuning loop to resolve the parameter set you need to tweak and tune.
    • Look at the load on the database system, network, etc. and extrapolate whether they will support the full system's load. If not, or if there are questions, test.
  • Performance testing needs to be representative of patterns that your application will actually be executing
  • Proper performance testing keeps track of and records key system level metrics as well as throughput metrics for reference later when changes to hardware or application are needed.
  • Always over stress your system.  Push the hardware and software to the max and find the breaking points. 
  • Only once you have done real world performance testing can you accurately size the complete set of hardware required to execute your application to meet your demand.
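The "divide down to single machine load" step is simple arithmetic, and a small sketch makes the numbers concrete. Everything here is an assumption for illustration: the class name, the 1200 requests/second cluster peak, the 8-node cluster, and the 1.5x over-stress factor are hypothetical values, not recommendations.

```java
// Sketch only: all figures below are hypothetical, not measurements.
final class SingleNodeLoad {
    /** Target requests/sec for one machine, given the cluster-wide peak and node count. */
    static double perNodeLoad(double clusterPeakRps, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        return clusterPeakRps / nodes;
    }

    /** Load to drive when deliberately over-stressing the machine by the given factor. */
    static double stressTarget(double perNodeRps, double factor) {
        return perNodeRps * factor;
    }

    public static void main(String[] args) {
        double perNode = perNodeLoad(1200.0, 8);      // expected single-machine load
        double stress = stressTarget(perNode, 1.5);   // push past it to find the breaking point
        System.out.println("per-node target: " + perNode + " req/s");
        System.out.println("over-stress target: " + stress + " req/s");
    }
}
```

The per-node figure is what the load driver should aim at for the tuning loop; the over-stress figure is only a starting point for finding where the system breaks.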

Correctly Tune The JVM

  • Correctly tuning the JVM in most cases will get you nearly 80% of the possible max performance of your application  
  • The big area to focus on for JVM tuning is heap size
    • Monitor verbose:gc output and target a GC no more than once every 10 seconds, with a max GC pause of one second or less.
    • Getting this right requires incremental testing with the expected customer load running on the system
    • Only after you have met the above GC targets should you start to experiment with different garbage collection policies
  • Beyond the Heap Size settings most other parameters are to extract out max possible performance OR ensure that the JVM cooperates nicely on the system it is running on with other JVMs 
  • The Garbage Collection and Memory Visualizer (GCMV) is an excellent tool for diagnosing GC issues or refining JVM performance tuning.
    • Provided as a downloadable plug-in within the IBM Support Assistant
Garbage Collection Memory Visualizer (GCMV)
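The two GC targets above (no more than one collection every 10 seconds, pauses of a second or less) can be checked mechanically once the collection start times and pause durations have been pulled out of the verbose:gc log. The sketch below assumes that extraction has already happened; the class and method names are hypothetical, and it does not parse any real log format.

```java
// Sketch only: inputs are assumed to be extracted from verbose:gc output elsewhere.
final class GcTargetCheck {
    static final double MIN_INTERVAL_SECONDS = 10.0; // GC no more than once every 10 s
    static final double MAX_PAUSE_SECONDS = 1.0;     // max GC pause of 1 s or less

    /**
     * startSeconds[i] is when collection i began (seconds since JVM start, ascending);
     * pauseSeconds[i] is how long that collection paused the application.
     */
    static boolean meetsTargets(double[] startSeconds, double[] pauseSeconds) {
        if (startSeconds.length != pauseSeconds.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        for (int i = 0; i < startSeconds.length; i++) {
            if (pauseSeconds[i] > MAX_PAUSE_SECONDS) {
                return false; // a single long pause misses the target
            }
            if (i > 0 && startSeconds[i] - startSeconds[i - 1] < MIN_INTERVAL_SECONDS) {
                return false; // collections are arriving too close together
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] starts = {5.0, 18.0, 31.5};
        double[] pauses = {0.2, 0.4, 0.3};
        System.out.println("meets GC targets: " + meetsTargets(starts, pauses));
    }
}
```

Running a check like this after each heap size change keeps the incremental testing honest: only when it passes under expected load is it worth trying other GC policies.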

Ensure Uniform Configuration Across Like Servers

  • Uniform configuration of software parameters and even operating systems is a common stumbling block
  • Most often this shows up as a single machine or process that burns more CPU or memory, or garbage collects more frequently, than its peers
  • Easiest way to manage this is to have a “dump configuration” script that runs periodically
  • Store the script's results, and after each configuration change or application upgrade track the differences
  • Leverage the Visual Configuration Explorer (VCE) tool available within ISA
Visual Configuration Explorer (VCE)
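Tracking differences between stored configuration dumps comes down to comparing key/value snapshots. The following is a minimal sketch of that comparison, assuming each dump has already been flattened into a map; the class name and the setting names (jvm.maxHeapMB, webContainer.maxThreads) are hypothetical placeholders, not real WAS property names.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch only: compares two flattened configuration snapshots.
final class ConfigDrift {
    /** Returns every setting whose value differs, formatted as "baseline -> other". */
    static Map<String, String> diff(Map<String, String> baseline, Map<String, String> other) {
        Set<String> keys = new TreeSet<>(baseline.keySet());
        keys.addAll(other.keySet()); // include settings present on only one side
        Map<String, String> drift = new TreeMap<>();
        for (String key : keys) {
            String a = baseline.get(key);
            String b = other.get(key);
            if (!Objects.equals(a, b)) {
                drift.put(key, a + " -> " + b);
            }
        }
        return drift;
    }

    public static void main(String[] args) {
        Map<String, String> server1 = new TreeMap<>();
        server1.put("jvm.maxHeapMB", "2048");
        server1.put("webContainer.maxThreads", "50");
        Map<String, String> server2 = new TreeMap<>();
        server2.put("jvm.maxHeapMB", "1024"); // the odd one out
        server2.put("webContainer.maxThreads", "50");
        System.out.println(diff(server1, server2));
    }
}
```

The same comparison works across time (yesterday's dump versus today's) and across like servers in a cluster.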

Create Cells To Group Like Applications

  • Create Cells and Clusters of application servers with an express purpose that groups them in some manner
  • Large cells (400, 500, even 1000 members), while supported, for the most part don’t make sense
  • Group applications that need to replicate data to each other or talk to each other via RMI, etc and create cells and clusters around those commonalities. 
  • Keeping cell size smaller leads to more efficient resource utilization due to less network traffic for configuration changes, DRS, HAManager, etc.
    • For example, core groups should be limited to no more than 40 to 50 instances
  • Smaller cells and logical grouping make migration forward to newer versions of products easier and more compartmentalized.
  
Tune JDBC Data Sources

  • Correct database connection pool tuning can yield significant gains in performance
  • This pool is highly contended in heavily multithreaded applications, so ensuring sufficient available connections are in the pool leads to superior performance.
  • Monitor PMI metrics via TPV or other tools to watch for threads waiting on connections to the database as well as their wait time.
    • If threads are waiting increase the number of pooled connections in conjunction with your DBA OR decrease the number of active threads in the system
    • In some cases, a one-to-one mapping between DB connections and threads may be ideal
  • Frequently database deadlocks or bottlenecks first manifest themselves as a large number of threads from your thread pool waiting for connections
  • Always use the latest database driver for the database you are running, as the performance optimizations between versions are significant
  • Tune the Prepared Statement Cache Size for each JDBC data source
    • Can also be monitored via PMI/TPV to determine ideal value

Correctly Tune Thread Pools

  • Thread pools and their corresponding threads control all execution on the hardware threads.
  • Understand which thread pools your application uses and size all of them appropriately based on utilization you see in tuning exercises
    • Thread dumps, PMI metrics, etc will give you this data 
    • Thread and Monitor Dump Analyzer (TMDA) and Tivoli Performance Viewer (TPV) will help in viewing this data.
  • Think of the thread pool as a queuing mechanism to throttle how many active requests you will have running at any one time in your application.
    • Apply the funnel based approach to sizing these pools
      • Example: IHS (1000) -> WAS (50) -> WAS DB connection pool (30) ->
      • Thread numbers above vary based on application characteristics
    • Since you can throttle active threads you can control concurrency through your codebase
  • Thread pools need to be sized with the total number of hardware processor cores in mind
    • If sharing a hardware system with other WAS instances thread pools have to be tuned with that in mind.
    • You will more than likely need to cut back on the number of active threads to ensure good performance for all applications, because the OS context switches between every thread in the system
    • Restricting the max number of threads an application can have can sometimes be used to prevent rogue applications from impacting others.
  • Default sizes for WAS thread pools on v6.1 and above are actually a little too high for best performance
    • Two to one ratio (threads to cores) typically yields the best performance but this varies drastically between applications and access patterns
TPV & TMDA tool snapshots
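The funnel approach and the two-to-one starting ratio can both be expressed in a few lines. The sketch below uses a plain java.util.concurrent pool as a stand-in for a WAS thread pool (which is configured in the server, not in code), so treat it purely as an illustration; the class name, the queue capacity of 100, and the helper methods are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only: a stand-in for server-side pool settings, not WAS configuration code.
final class FunnelSizing {
    /** True when each tier admits no more concurrent work than the tier in front of it. */
    static boolean isFunnel(int... tierSizes) {
        for (int i = 1; i < tierSizes.length; i++) {
            if (tierSizes[i] > tierSizes[i - 1]) {
                return false;
            }
        }
        return true;
    }

    /** Starting pool size from a threads-to-cores ratio; tune from here under load. */
    static int startingPoolSize(int cores, int threadsPerCore) {
        return Math.max(1, cores * threadsPerCore);
    }

    public static void main(String[] args) {
        // Tier sizes from the example above: IHS -> WAS -> DB connection pool.
        System.out.println("funnel ok: " + isFunnel(1000, 50, 30));

        int cores = Runtime.getRuntime().availableProcessors();
        int size = startingPoolSize(cores, 2); // two-to-one ratio as a first guess
        // A fixed-size pool with a bounded queue throttles how many requests run at once.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                size, size, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100));
        System.out.println("starting pool size: " + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

The ratio is only a first guess; as noted above, the right number varies drastically between applications and has to come out of the tuning exercise.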

Minimize HTTP Session Content

  • High performance data replication for application availability depends on correctly sized session data
    • Keep it under 1MB in all cases if possible
  • Store only information critical to that user's specific interaction with the server
  • If composite data is required build it progressively as the interaction occurs
  • Configure Session Replication in WAS to meet your needs
    • Use different configuration options (async vs. sync) to give you the availability your application needs without compromising response time.
    • Select the replication topology that works best for you (DB, M2M, M2M Server)
    • Keep replication domains small and/or partition where possible
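One way to keep an eye on session size during development is to serialize the objects you intend to put in the session and measure the result. The sketch below uses standard Java serialization as an approximation of what replication would have to ship; that equivalence is an assumption, and the class name, method names, and the 1 MB budget constant are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Sketch only: approximates session attribute size via standard Java serialization.
final class SessionSizeProbe {
    static final int ONE_MEGABYTE = 1024 * 1024;

    /** Number of bytes the value occupies once serialized. */
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** True when the serialized value fits within the given byte budget. */
    static boolean withinBudget(Serializable value, int maxBytes) {
        return serializedSize(value) <= maxBytes;
    }

    public static void main(String[] args) {
        String smallAttribute = "cart-123";
        byte[] largeAttribute = new byte[2 * ONE_MEGABYTE];
        System.out.println("small fits: " + withinBudget(smallAttribute, ONE_MEGABYTE));
        System.out.println("large fits: " + withinBudget(largeAttribute, ONE_MEGABYTE));
    }
}
```

A check like this fits naturally into unit tests for the classes stored in the session, so oversized attributes are caught before they reach a replicated environment.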


Understand and Tune Infrastructure (databases & other interactive server systems)

  • WebSphere Application Server and the system it runs on are typically only one part of the datacenter infrastructure, and they rely heavily on other areas performing properly. Think of your infrastructure as a plumbing system. Optimal drain performance only occurs when no pipes are clogged.
  • On the WAS system itself you need to be very aware of
    • What other WAS instances (JVMs) are doing and their CPU / IO profiles
    • How much memory other WAS instances (or other OSes in a virtualized case) are using
    • Network utilization of other applications coexisting on the same hardware
  • In the supporting infrastructure
    • Varying network latency can drastically affect split cell topologies, cross site data replication and database query latency
      • Ensure network infrastructure is repeatable and robust
      • Don’t take bandwidth or latency for granted before going into production. Always test, as labs vary
    • Firewalls can cause issues with data transfer latencies between systems
  • On the database system
    • Ensure that proper indexes are in place and tuning is done for the application's request patterns
    • Ensure that the database supports the number of connected clients your WAS runtime will have
    • Understand the CPU load and impacts of other applications (batch, OLTP, etc all competing with your applications)
  • On other application server systems or interactive server systems
    • Ensure connected applications can handle the load the WAS system will place on them
    • Verify that developers have coded specific handling mechanisms for when connected applications go down (You need to avoid storm drain scenarios)

Keep Application Logging to a Minimum

  • Nothing other than error cases should ever be written to SystemOut.log
  • If using logging, build your log messages only when needed
  • Good
    • if (loggingEnabled) { errorMsg = "This is a bad error" + " " + failingObject.printError(); System.out.println(errorMsg); }
  • Bad
    • errorMsg = "This is a bad error" + " " + failingObject.printError();
      if (loggingEnabled) { System.out.println(errorMsg); }
  • Keep error and log messages to the point and easy to debug
  • If using Apache Commons, Log4J, or other frameworks ensure performance on your system is as expected
  • Ensure if you must log information for audit purposes or other reasons that you are writing to a fast disk
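The same "build the message only when needed" idea can be sketched with the standard java.util.logging API, which offers both an explicit level guard and a lazily evaluated message supplier. The class name, the describeFailure helper, and the call counter are hypothetical and exist only to show when the expensive message construction actually runs.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: shows two ways to avoid building a log message that will be discarded.
final class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    /** Counts how often the (pretend) expensive message was actually built. */
    static int expensiveCalls = 0;

    static String describeFailure() {
        expensiveCalls++;
        return "This is a bad error " + System.nanoTime();
    }

    /** Explicit guard: the message is built only if FINE is enabled. */
    static void logGuarded() {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describeFailure());
        }
    }

    /** Lazy supplier: the logger invokes the supplier only if FINE is enabled. */
    static void logLazy() {
        LOG.log(Level.FINE, GuardedLogging::describeFailure);
    }

    public static void main(String[] args) {
        logGuarded();
        logLazy();
        // With the default configuration (root level INFO), FINE is disabled,
        // so neither call should have built the message.
        System.out.println("messages built: " + expensiveCalls);
    }
}
```

Either form keeps string concatenation and object inspection off the hot path when the level is disabled, which is the point of the Good example above.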

Properly Tune the Operating System

  • The operating system is consistently overlooked for functional tuning as well as performance tuning.
  • Understand the hardware infrastructure backing your OS. Processor counts, speed, shared/unshared, etc
  • ulimit values need to be set correctly. The main player here is the number of open file handles (ulimit -n). Other limits, such as process size and memory, may need to be set based on the application
  • Make sure NICs are set to full duplex and correct speeds
  • Large pages need to be enabled to take advantage of the -Xlp JDK parameters
  • Check RAS settings on the OS and, if they are enabled by default, tune them down
  • Configure TCP/IP timeouts correctly for your application's needs
  • Depending on the load being placed on the system look into advanced tuning techniques such as pinning WAS processes via RSET or TASKSET as well as pinning IRQ interrupts
WAS Throughput with processor pinning