记一次线上应用数据库连接池满的处理
http://it.nfsq.com.cn/index.php/2016/04/06/online_problem_handle_dubbo/
首先看到dubbo-claim应用突然大面积报错,基本反馈是冻结预算出问题了,看了看冻结预算的代码,发现写的非常复杂,果断放弃看代码来排查问题。
Could not open jdbc connection for transaction: Pool empty. Unable to fetch a connection in 10 seconds, none available[size:20,busy:20; idle:0, lastwait:10000],
报错中能获得的信息是数据库的连接池满了。因为是突然报错,所以排查的主要两个方向:
- 调用量增大导致响应慢
- 数据库数据量增大导致响应慢
其他可能的小概率事件:
- 数据库连接未正常关闭
- 新上的功能比较耗时拖慢数据库
- 数据库有异常,或者机器网络等硬件问题(概率极小,不到最后不要想)
- 数据库连接是否配置小了?(20*4台一共80。假设平均处理时间在200ms,能够支撑400qps,足够了)
接下来就需要验证猜测,第一反应想看看jvm里的线程都干嘛呢,到底是谁在拖后腿,用 jstack 来打印出线程的活动状态,我是用jvisualVM来直接 dump 线程。
接下来就是看日志中的线程都干嘛呢,首先看状态为 BLOCK,和 WAITING 状态的线程,看他们是因为什么导致不执行。
"DubboServerHandler-10.3.0.87:20888-thread-152" - Thread t@229 java.lang.Thread.State: BLOCKED at com.nfsq.xs.claim.biz.budget.BudgetService.freezeBudget(BudgetService.java:85) - waiting to lock <7153f861> (a com.nfsq.xs.claim.biz.budget.BudgetService) owned by "DubboServerHandler-10.3.0.87:20888-thread-176" t@253 at com.nfsq.xs.claim.biz.budget.BudgetService$$FastClassBySpringCGLIB$$8dee689c.invoke(<generated>) at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
接下来发现日志里一个非常重要信息线程152在等线程176的一个锁(7153f861),且其他很多线程也是在等待这把锁 。
接下来看线程176这家伙拿着这把锁干嘛呢
"DubboServerHandler-10.3.0.87:20888-thread-176" - Thread t@253 java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:170) at java.net.SocketInputStream.read(SocketInputStream.java:141) at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114) at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161) at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189) - locked <776bc7e0> (a com.mysql.jdbc.util.ReadAheadInputStream) at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719) - locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection) at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155) - locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection) at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379) - locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection) at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56) at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70) at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:57) at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:267) at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:141) at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:105) at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:81) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:101) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:95) at sun.reflect.GeneratedMethodAccessor215.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:355) at com.sun.proxy.$Proxy21.selectList(Unknown Source) at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:195) at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:124) at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:90) at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:40) at com.sun.proxy.$Proxy32.listActivityWalletDTO(Unknown Source) at com.nfsq.xs.claim.service.budget.ActivityWalletOperationService.listActivityWalletDTO(ActivityWalletOperationService.java:363) at com.nfsq.xs.claim.biz.budget.BudgetService.signContract(BudgetService.java:610) at com.nfsq.xs.claim.biz.budget.BudgetService.freezeBudget(BudgetService.java:95) - locked <7153f861> (a com.nfsq.xs.claim.biz.budget.BudgetService)
看到了线程176拿了锁后最后一步执行的操作
ActivityWalletOperationService.listActivityWalletDTO
去看了代码发现对应的 SQL:
select * from ACTIVITY_WALLET where source_id = '240' and source = '1';
这么简单的 SQL ,去数据看了数据量28万条,一看索引竟然只有 id 索引,source_id 查询走的全表扫描。找少炳来加了索引后,应用报错全部都没了,持续观察了两天问题解决。
最后在这里感谢@张少炳,@赵统国 @凡小梦 @马瑾 提供的帮助。