Search This Blog

2021-09-01

Apache Tomcat: Connection Pool Manager - Self Recover After DB Restart

Product: Apache Tomcat
Version: 6.x - 9.x
Ref: https://commons.apache.org/proper/commons-dbcp/configuration.html

Apache Tomcat 6 and earlier bundled with DBCP 1.3 connection pool manager for many years, which is readily to use by J2EE application as JNDI/DBCP connection pool manager. Tomcat 7.x onward start using its own lighter org.apache.tomcat.jdbc.pool connection pool manager (DBCP2 JAR is still bundled), but continue to use the same configuration parameters as DBCP.

This post going to discuss about non-DBCP connection pool manager, which is org.apache.tomcat.jdbc.pool.

One of the big challenge for many J2EE developers who use this connection pool manager assumes that their application will always get a good working DB connection from the connection pool manager when the DB is bounced.  In reality, they see that their Tomcat application will need to restart after DB maintenance (restart).  Many developers or administrators assume that, by default, the connection pool will auto close and open new DB connection when DB restarted, but it is not.  Some think that Tomcat will close them if let it idle for some time, but it won't close the broken connections.

This post is going to discuss several configurations that can configure connection pool manager to close existing broken DB connections and create new DB connection if DBA/anyone bounce the DB.

Among Apache Tomcat version in last 20+ years, it has connection pool parameters that can be configured without bouncing Tomcat (whenever DB bounced, e.g. Windows OS Update/patch).  Let's start with the simplified state diagram that I drawn by lookup Apache 9's connection pool manager.  It is more involving to create the UML state machine diagram which will be best, as I don't have any tool handy.  I tried to add as many Tomcat parameter as much into the diagram, yet to keep its size small, so it certainly won't contain the entire lifecycle of a connections in the pool


Note: Above diagram doesn't show states involving opening new connections from DB, which could experience long login, invalid password, account locked, remote DB down. On failure, it will not auto login to DB, if you are interested to know

Following are the parameters related to testing and dropping DB connections in the pool

1. testOnBorrow - Immediately re-connect all DB connections. Application will not see broken DB connections, and DB bounce is most likely transparent to the application. Default off

2. testWhileIdle +  minEvictableIdleTimeMillis - When connections in the pool meet long idle condition as defined by minEvictableIdleTimeMillis , it will drop all  (Tomcat 7+) existing broken long idle DB connections. Default off

3. testOnReturn - Continue to borrow broken DB connection to application, but immediately drop it if it is a broken DB connection. Application end will complain DB connection broken in its log, and not able to perform any DB operation. Application will not retry automatically. Default off

Following table illustrate more detail about its behavior:

# Method Application Will See Broken DB Conn Error? How Quickly Close Close One-by-One Close in Chunk CPU Usage
1 testOnBorrow No, if DB is up Immediate, and open new DB connections Yes No High
2 testWhileIdle Yes Controlled by:
timeBetweenEvictionRunsMillis (5 sec)
minEvictableIdleTimeMillis (60 sec)
maxIdle (same as maxActive, i.e. 100)
minIdle (same as initialSize, i.e. 10)
No Yes. Number of connection will be closed equals to value of  numTestsPerEvictionRun Low
3 testOnReturn Yes Immediate Yes No Medium

Note: for testWhileIdle setting, parameter numTestsPerEvictionRun is not used by Tomcat 7 onward (non-DBCP), so it will always closed all broken idle connections

For any of the above self recovered connection pool configuration, parameter "validationQuery" must specify a SELECT statement that is quick, and depends on DB type.  For example:

1. Oracle - select 1 from dual
2. MS SQL Server - select 1

I have encountered DB that could be slow to execute above SQL, and causing no available connections to borrow to the application, and stalled.  So developer/administrator must test running above SQL in  series + 1000 integration or more to measure the time taken as precaution.

If no SELECT statement is specified, DBCP won't auto close existing broken DB connections, and open a new one.

testOnBorrow and testWhileIdle are the most commonly configuration as there are generally 2 groups of people who favor high or low CPU resource.

For those who configure testWhileIdle, there are additional configuration to adjust it to the taste of the administrator to reduce CPU usage:

1. minIdle - default 10, which follows initialSize
2. maxIdle - default 100, which follows maxActive
3. numTestsPerEvictionRun - used by Tomcat 6 or older. Tomcat 7+ doesn't has this setting
4. minEvictableIdleTimeMillis - default 60 sec. The criteria for the DB connections in the connection pool to be considered "idle too long" and will be dropped, regardless they are broken or valid.  If connection not broken, then the connection could remains in the pool by keeping as many as 100 (minIdle)
5. timeBetweenEvictionRunsMillis - default 5 sec.  Eviction frequency (cycle or interval) that will drop broken DB connections in the connection pool

Tomcat 6 and older uses DBCP 1.3, which has additional numTestsPerEvictionRun to slowly drops long idle connections (to reduces CPU usage), so following are additional considerations for various versions:

1. How many broken connections in the pool that would like to close each time: Tomcat 7 will close all broken idle connections. numTestsPerEvictionRun (default 3 by DBCP 1.3 doc) is not used by Tomcat 7+, while Tomcat 6 or older will close 3 connections during each eviction cycle.
1.1. If application opened 1,000 DB connections when DB bounced, then there will be 1,000 broken DB connections in the connection pool.  This Tomcat 6 parameter means each round it will close 3 connections (default), and will take ~ 1,000 / 3 = 333 rounds to close all broken connections in the pool. For Tomcat 7+, it will close all 1000 broken connections (not closed 9990 and keeps 10 per minIdle)
1.2. A lower value will use less CPU as it close less broken connection each eviction round of eviction cycle
1.3. A higher value will use more CPU, so Tomcat 7+ will always use more CPU to close all broken connections
1.4. Broken connections get close one-by-one, even though it is in a batch as this is done by 1 Java eviction thread

2. How long a connection will be considered as "long idle" in order to close them: minEvictableIdleTimeMillis, default is 60 sec
2.1. Broken connection must be sitting in the connection pool without borrow to the application, and no user should use the application, or schedule any job, incoming web service call, background process, that will borrow those broken connection
2.2. If the broken connection not used for more or equals to 30 min, then it will be consider idle, and will pick up (per numTestsPerEvictionRun limit for Tomcat 6 or older, which is 3 idle connections only) by eviction thread to close the broken + idle connection
2.3. Even if a connection is idle more than 30 min (default), if in between the eviction cycle, the broken idle connection borrowed to application (due to batch process, background, or user trigger), then it will needs to wait for 30 min to be considered "idle." Sometimes, it will never idle as the broken connections keep borrow to the application due to user activity
2.4. A lower value will make broken connection to meet "idle" status sooner, but will use higher CPU
2.5. If a connection is idle for 29 min, borrowed to application for 1 sec (error out due to broken DB connection), idle for 29 min, borrow to app for 1 sec, and repeat 24 hr, then that specific connection will not consider "idle." This could be due to schedule batch process that fired every 15 min, or multiple that scheduled every 30 min (overlapping, so effectively < 30 min)

3. How frequent the eviction cycle will run and start closing idle (broken in this case) connections: timeBetweenEvictionRunsMillis, default 5 sec
3.1. A lower value will use more CPU, but will close broken connections that meet "long idle" status sooner (limit to 3 connections each round - numTestsPerEvictionRun).  If numTestsPerEvictionRun is 3, the CPU usage will be lower, but if numTestsPerEvictionRun is very high, and this cycle interval is high, then CPU usage will be very high
3.2. A higher cycle interval will close broken idle connections slower, even after they are in "idle" mode
3.3. If eviction cycle frequency (timeBetweenEvictionRunsMillis) is every 1 min (not applicable for Tomcat 7+) with default to close 3 connections (numTestsPerEvictionRun), then for 1000 broken DB connections, it will take 1000/3 * 1 min = 333 min (5 hr 33 sec) to close all
3.4. If eviction cycle is every 1 min with numTestsPerEvictionRun=100, to close 1000 broken DB connection, it will take 1000/100 * 1min = 10 min
3.5. If eviction cycle is every 5 sec with numTestsPerEvictionRun=10, to close 1000 broken DB connection, it will take 1000/10 * 5 sec = 500 sec (8 min 20 sec)
3.6. If eviction cycle is every 1 sec with numTestsPerEvictionRun=20, to close 1000 broken DB connection, it will take 1000/20 * 1 sec = 50 sec

Last warning is that even Tomcat DBCP2 or non-DBCP connection pool manager can self recovered by closing all the broken DB connections (either immediate, or with delay), this doesn't mean the J2EE application functionality will self recovered without human intervention. For example application functionality that won't self recovered:

1. Schedule job that failed to run earlier will not run again
2. Actively running schedule job will not self recovered after it abort when DB bounced
3. Background messaging/JMS might not self brought up
5. Incoming web services that failed due to loosing of DB connection will need to trigger again from remote server
6. End user might needs to login again and lost the information they entered into the screen

No comments: