A case study: resolving an Nginx recv() failed error
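Once again, colleague L pointed a socket connection at the wrong port.
After the office move, colleague D said the ports were a bit of a mess: one dev machine hosted three games with no consistent port ranges. So a port range was assigned to each game, and L, who is responsible for one of the games, started changing its ports.
After the ports were changed, the game could no longer be entered.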
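Symptoms:
The front end reported an error as soon as it connected
The back end received the request and processed it
I added a log statement at the output point, and the output data was correct
The nginx error log showed: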
recv() failed (104: Connection reset by peer) while reading response header from upstream
At the time I did not look at the php-fpm log. I really should have checked the php-fpm error log first; I only remembered later and took a look:
Nov 29 12:19:04.040453 [NOTICE] [pool www] child 29815 started
Nov 29 12:20:32.604920 [WARNING] [pool www] child 29785 exited on signal 11 (SIGSEGV) after 246.354264 seconds from start
Nov 29 12:20:32.605383 [NOTICE] [pool www] child 29817 started
Nov 29 12:20:35.668969 [WARNING] [pool www] child 29815 exited on signal 11 (SIGSEGV) after 91.628524 seconds from start
Nov 29 12:20:35.669312 [NOTICE] [pool www] child 29825 started
Nov 29 12:21:45.068408 [WARNING] [pool www] child 29825 exited on signal 11 (SIGSEGV) after 69.399102 seconds from start
Nov 29 12:21:45.068786 [NOTICE] [pool www] child 29836 started
Nov 29 13:00:56.132529 [WARNING] [pool www] child 29817 exited on signal 11 (SIGSEGV) after 2423.527159 seconds from start
Nov 29 13:00:56.132676 [NOTICE] [pool www] child 30046 started
Nov 29 13:01:54.055862 [WARNING] [pool www] child 29836 exited on signal 11 (SIGSEGV) after 2408.987081 seconds from start
Nov 29 13:01:54.056373 [NOTICE] [pool www] child 30060 started
Nov 29 13:02:02.142301 [WARNING] [pool www] child 30046 exited on signal 11 (SIGSEGV) after 66.009642 seconds from start
Nov 29 13:02:02.142795 [NOTICE] [pool www] child 30061 started
Nov 29 13:02:02.925481 [WARNING] [pool www] child 29652 exited on signal 11 (SIGSEGV) after 2750.991891 seconds from start
Nov 29 13:02:02.926067 [NOTICE] [pool www] child 30062 started
Nov 29 13:03:20.960769 [WARNING] [pool www] child 30062 exited on signal 11 (SIGSEGV) after 78.034715 seconds from start
Nov 29 13:03:20.961215 [NOTICE] [pool www] child 30091 started
The PHP worker processes were repeatedly crashing and being respawned.
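To see at a glance how often workers are crashing, the log can be summarized with a couple of small shell functions. This is only a sketch: the function names are made up for illustration, and it assumes the php-fpm log line format shown above ("child <pid> exited on signal 11 (SIGSEGV) after <seconds> seconds from start"); other php-fpm versions may log differently.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of php-fpm).
# Both read a php-fpm log on stdin and assume the line format shown above.

# Print the number of worker exits caused by SIGSEGV.
count_segv() {
    grep -c 'exited on signal 11 (SIGSEGV)'
}

# Print "<pid> <lifetime-in-seconds>" for each worker that died with SIGSEGV.
segv_lifetimes() {
    awk '/exited on signal 11 \(SIGSEGV\)/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "child") pid = $(i + 1)    # word after "child" is the pid
            if ($i == "after") secs = $(i + 1)   # word after "after" is the lifetime
        }
        print pid, secs
    }'
}
```

Usage would look like `count_segv < php-fpm.log` or `segv_lifetimes < php-fpm.log`, where the log path depends on your own setup.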
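Solution:
Change the php-fpm configuration so that only one worker process is started
kill -USR2 php-fpm_master_pid to restart php-fpm
strace -p only_php_worker_pid
This showed that the PHP worker process died right after calling connect on a certain port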
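The steps above can be sketched as a few shell functions. This is a rough sketch under assumptions, not the exact commands that were run: the function names are hypothetical, it assumes `pgrep` and `strace` are installed, and the `-e trace=network` filter is an optional addition of mine to cut the output down to socket-related calls (the original session used plain `strace -p`). How to limit the pool to one worker depends on the php-fpm version; in ini-style configs this is typically `pm = static` with `pm.max_children = 1`.

```shell
#!/bin/sh
# Hypothetical helpers; each takes a pid as its first argument.

# Ask the php-fpm master to reload its config and restart its workers.
reload_fpm() {
    kill -USR2 "$1"
}

# Print the pid of the first child process of the given master pid.
first_worker_pid() {
    pgrep -P "$1" | head -n 1
}

# Attach strace to a worker and show only network-related syscalls
# (socket, connect, send, recv, ...). Usually needs root or the same user.
trace_worker_network() {
    strace -f -e trace=network -p "$1"
}
```

With the master pid in hand, one would call `reload_fpm <master_pid>`, then `trace_worker_network "$(first_worker_pid <master_pid>)"`, and watch for the last `connect` before the worker exits.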
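I asked L what that port was for, and it suddenly dawned on him: oh......
The cause: the return value of redis pconnect was not checked before lpush was called directly, which made the PHP worker process dump core.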