记一次线上应用数据库连接池满的处理

http://it.nfsq.com.cn/index.php/2016/04/06/online_problem_handle_dubbo/

首先看到dubbo-claim应用突然大面积报错,基本反馈是冻结预算出问题了,看了看冻结预算的代码,发现写的非常复杂,果断放弃看代码来排查问题。
Could not open jdbc connection for transaction: Pool empty. Unable to fetch a connection in 10 seconds, none available[size:20,busy:20; idle:0, lastwait:10000],
报错中能获得的信息是数据库的连接池满了。因为是突然报错,所以排查的主要两个方向:

  1. 调用量增大导致响应慢
  2. 数据库数据量增大导致响应慢

其他可能的小概率事件:

  1. 数据库连接未正常关闭
  2. 新上的功能比较耗时拖慢数据库
  3. 数据库有异常,或者机器网络等硬件问题(概率极小,不到最后不要想)
  4. 数据库连接是否配置小了?(20*4台一共80。假设平均处理时间在200ms,能够支撑400qps,足够了)

接下来就需要验证猜测,第一反应想看看jvm里的线程都干嘛呢,到底是谁在拖后腿,用 jstack 来打印出线程的活动状态,我是用jvisualVM来直接 dump 线程。
接下来就是看日志中的线程都干嘛呢,首先看状态为 BLOCK,和 WAITING 状态的线程,看他们是因为什么导致不执行。

"DubboServerHandler-10.3.0.87:20888-thread-152" - Thread t@229
   java.lang.Thread.State: BLOCKED
    at com.nfsq.xs.claim.biz.budget.BudgetService.freezeBudget(BudgetService.java:85)
    - waiting to lock <7153f861> (a com.nfsq.xs.claim.biz.budget.BudgetService) owned by "DubboServerHandler-10.3.0.87:20888-thread-176" t@253
    at com.nfsq.xs.claim.biz.budget.BudgetService$$FastClassBySpringCGLIB$$8dee689c.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)

接下来发现日志里一个非常重要信息线程152在等线程176的一个锁(7153f861),且其他很多线程也是在等待这把锁 。
接下来看线程176这家伙拿着这把锁干嘛呢

"DubboServerHandler-10.3.0.87:20888-thread-176" - Thread t@253
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
    at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
    at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
    - locked <776bc7e0> (a com.mysql.jdbc.util.ReadAheadInputStream)
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
    - locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
    - locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection)
    at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
    - locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection)
    at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56)
    at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70)
    at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:57)
    at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:267)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:141)
    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:105)
    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:81)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:101)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:95)
    at sun.reflect.GeneratedMethodAccessor215.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:355)
    at com.sun.proxy.$Proxy21.selectList(Unknown Source)
    at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:195)
    at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:124)
    at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:90)
    at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:40)
    at com.sun.proxy.$Proxy32.listActivityWalletDTO(Unknown Source)
    at com.nfsq.xs.claim.service.budget.ActivityWalletOperationService.listActivityWalletDTO(ActivityWalletOperationService.java:363)
    at com.nfsq.xs.claim.biz.budget.BudgetService.signContract(BudgetService.java:610)
    at com.nfsq.xs.claim.biz.budget.BudgetService.freezeBudget(BudgetService.java:95)
    - locked <7153f861> (a com.nfsq.xs.claim.biz.budget.BudgetService)

看到了线程176拿了锁后最后一步执行的操作

ActivityWalletOperationService.listActivityWalletDTO

去看了代码发现对应的 SQL:

select * from ACTIVITY_WALLET
 where 
    source_id = '240'
    and source = '1';

这么简单的 SQL ,去数据看了数据量28万条,一看索引竟然只有 id 索引,source_id 查询走的全表扫描。找少炳来加了索引后,应用报错全部都没了,持续观察了两天问题解决。
最后在这里感谢@张少炳,@赵统国 @凡小梦 @马瑾 提供的帮助。

相关推荐