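Troubleshooting a WebSocket Problem in Mina
Our project's client needed a rewrite, and after some discussion we decided that the client and server would communicate over the WebSocket protocol. The backend's network layer is built on Mina, so adding WebSocket parsing on top of the existing stack was enough for a seamless migration. Conveniently, someone on the Apache discussion group had shared WebSocket filter code for Mina. I downloaded it, added it to the project, deployed it to the server, and everything worked.
Today a front-end colleague reported that requesting a certain endpoint made the client throw an error immediately and drop the connection, with the message: One or more reserved bits are on, reserved1 = 0, reserved2 = 1, reserved3 = 1. My first reaction was: had the data been sent to the client without being framed? I checked the logs and confirmed that the data was wrapped in the WebSocket frame format before being sent.
Searching online for the error message mostly turned up answers blaming antivirus software, the operating system, or the browser; I found no credible cause or fix. With no other option, I asked the front-end colleague to capture packets with Wireshark and compared them against the server-side data. After some back and forth, we confirmed the cause: the endpoint returned a large amount of data, and the client was not receiving it intact.
The problem only appeared with large payloads, so could the WebSocket client's buffer or a single transmission have a size limit? More searching turned up no such limit, which made it very likely that the server was not framing the data correctly. Since the framing was done by unofficial third-party code, it was worth reading through the encoder: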
 1 /*
 2  * To change this template, choose Tools | Templates
 3  * and open the template in the editor.
 4  */
 5 package com.shephertz.appwarp.websocket.binary;
 6
 7 import org.apache.mina.core.buffer.IoBuffer;
 8 import org.apache.mina.core.session.IoSession;
 9 import org.apache.mina.filter.codec.ProtocolEncoderAdapter;
10 import org.apache.mina.filter.codec.ProtocolEncoderOutput;
11
12 /**
13  * Encodes incoming buffers in a manner that makes the receiving client type transparent to the
14  * encoders further up in the filter chain. If the receiving client is a native client then
15  * the buffer contents are simply passed through. If the receiving client is a websocket, it will encode
16  * the buffer contents in to WebSocket DataFrame before passing it along the filter chain.
17  *
18  * Note: you must wrap the IoBuffer you want to send around a WebSocketCodecPacket instance.
19  *
20  * @author DHRUV CHOPRA
21  */
22 public class WebSocketEncoder extends ProtocolEncoderAdapter{
23
24     @Override
25     public void encode(IoSession session, Object message, ProtocolEncoderOutput out) throws Exception {
26         boolean isHandshakeResponse = message instanceof WebSocketHandShakeResponse;
27         boolean isDataFramePacket = message instanceof WebSocketCodecPacket;
28         boolean isRemoteWebSocket = session.containsAttribute(WebSocketUtils.SessionAttribute) && (true==(Boolean)session.getAttribute(WebSocketUtils.SessionAttribute));
29         IoBuffer resultBuffer;
30         if(isHandshakeResponse){
31             WebSocketHandShakeResponse response = (WebSocketHandShakeResponse)message;
32             resultBuffer = WebSocketEncoder.buildWSResponseBuffer(response);
33         }
34         else if(isDataFramePacket){
35             WebSocketCodecPacket packet = (WebSocketCodecPacket)message;
36             resultBuffer = isRemoteWebSocket ? WebSocketEncoder.buildWSDataFrameBuffer(packet.getPacket()) : packet.getPacket();
37         }
38         else{
39             throw (new Exception("message not a websocket type"));
40         }
41
42         out.write(resultBuffer);
43     }
44
45     // Web Socket handshake response go as a plain string.
46     private static IoBuffer buildWSResponseBuffer(WebSocketHandShakeResponse response) {
47         IoBuffer buffer = IoBuffer.allocate(response.getResponse().getBytes().length, false);
48         buffer.setAutoExpand(true);
49         buffer.put(response.getResponse().getBytes());
50         buffer.flip();
51         return buffer;
52     }
53
54     // Encode the in buffer according to the Section 5.2. RFC 6455
55     private static IoBuffer buildWSDataFrameBuffer(IoBuffer buf) {
56
57         IoBuffer buffer = IoBuffer.allocate(buf.limit() + 2, false);
58         buffer.setAutoExpand(true);
59         buffer.put((byte) 0x82);
60         if(buffer.capacity() <= 125){
61             byte capacity = (byte) (buf.limit());
62             buffer.put(capacity);
63         }
64         else{
65             buffer.put((byte)126);
66             buffer.putShort((short)buf.limit());
67         }
68         buffer.put(buf);
69         buffer.flip();
70         return buffer;
71     }
72 }
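Lines 54 to 71 frame the data, and my hunch was that the handling in lines 60 to 67 breaks for large payloads. I opened the WebSocket RFC (RFC 6455) and looked at the frame structure it defines. According to the specification, for an unmasked frame such as the ones a server sends: when the payload is at most 125 bytes, only a 2-byte header is needed; when it is between 126 and 65535 (2^16 - 1) bytes, the header is 4 bytes; for anything longer the header is 10 bytes, 8 of which store the payload length.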
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+
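The three header sizes described above can be written down as a small helper. This is only a sketch for unmasked (server-to-client) frames; the class name WsHeaderSize and the method headerLength are made up for illustration and are not part of Mina or of the third-party filter.

```java
final class WsHeaderSize {
    /**
     * Header size in bytes for an unmasked frame, following the
     * payload-length rules of RFC 6455 section 5.2.
     * (Hypothetical helper, not part of any library.)
     */
    static int headerLength(long payloadLength) {
        if (payloadLength < 0) {
            throw new IllegalArgumentException("negative payload length");
        }
        if (payloadLength <= 125) {
            return 2;   // 1 byte FIN/RSV/opcode + 1 byte carrying the length itself
        }
        if (payloadLength <= 0xFFFF) {
            return 4;   // length byte is 126, followed by a 16-bit length
        }
        return 10;      // length byte is 127, followed by a 64-bit length
    }

    public static void main(String[] args) {
        long[] samples = {0, 125, 126, 65535, 65536};
        for (long n : samples) {
            System.out.println(n + " bytes -> header of " + headerLength(n) + " bytes");
        }
    }
}
```

A masked client-to-server frame would need 4 more bytes for the masking key, which this sketch deliberately leaves out.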
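The downloaded third-party code only handles payload len values up to 126. A large payload, which requires payload len == 127, is still encoded as if payload len == 126, so its length is truncated to 16 bits and the client clearly cannot receive the data intact. Presumably the client then reads leftover payload bytes as the header of a next frame, which would explain the reserved-bits error. Changing the function that builds the WebSocket data frame as follows fixed the problem: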
// Encode the in buffer according to the Section 5.2. RFC 6455
private static IoBuffer buildWSDataFrameBuffer(IoBuffer buf) {
    IoBuffer buffer = null;
    if (buf.limit() <= 125) {
        // Short payload: the length fits in the 7-bit payload len field.
        buffer = IoBuffer.allocate(buf.limit() + 2, false);
        buffer.put((byte) 0x82);
        buffer.put((byte)buf.limit());
    }
    else if (buf.limit() <= 0xFFFF) {
        // Medium payload: payload len is 126, followed by a 16-bit length.
        buffer = IoBuffer.allocate(buf.limit() + 4, false);
        buffer.put((byte) 0x82);
        buffer.put((byte)126);
        buffer.putShort((short)buf.limit());
    }
    else {
        // Large payload: payload len is 127, followed by a 64-bit length.
        buffer = IoBuffer.allocate(buf.limit() + 10, false);
        buffer.put((byte) 0x82);
        buffer.put((byte)127);
        buffer.putLong((long)buf.limit());
    }
    buffer.put(buf);
    buffer.flip();
    return buffer;
}
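There are two main changes: 1. payloads longer than 65535 bytes (2^16 - 1) are now handled; 2. the buffer is allocated at exactly the required size, which avoids triggering a reallocation.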
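One way to sanity-check framing logic like this without a running Mina server is a small round-trip test. The sketch below is an assumption-laden stand-in: it uses java.nio.ByteBuffer instead of Mina's IoBuffer, and the names FrameRoundTrip, encodeBinaryFrame, and decodePayloadLength are invented for illustration. It encodes payloads around the size boundaries and reads the length back the way a receiver would.

```java
import java.nio.ByteBuffer;

final class FrameRoundTrip {
    // Builds an unmasked, final binary frame (first byte 0x82) around the payload.
    static ByteBuffer encodeBinaryFrame(byte[] payload) {
        int len = payload.length;
        ByteBuffer frame;
        if (len <= 125) {
            frame = ByteBuffer.allocate(len + 2);
            frame.put((byte) 0x82).put((byte) len);
        } else if (len <= 0xFFFF) {
            frame = ByteBuffer.allocate(len + 4);
            frame.put((byte) 0x82).put((byte) 126).putShort((short) len);
        } else {
            frame = ByteBuffer.allocate(len + 10);
            frame.put((byte) 0x82).put((byte) 127).putLong(len);
        }
        frame.put(payload);
        frame.flip();
        return frame;
    }

    // Reads the announced payload length from an unmasked frame.
    static long decodePayloadLength(ByteBuffer frame) {
        ByteBuffer view = frame.duplicate(); // leave the caller's position untouched
        view.get();                          // skip the FIN/RSV/opcode byte
        int len7 = view.get() & 0x7F;        // low 7 bits of the second byte
        if (len7 <= 125) {
            return len7;
        }
        if (len7 == 126) {
            return view.getShort() & 0xFFFF; // unsigned 16-bit length
        }
        return view.getLong();               // 64-bit length
    }

    public static void main(String[] args) {
        for (int size : new int[] {0, 125, 126, 65535, 65536, 70000}) {
            long decoded = decodePayloadLength(encodeBinaryFrame(new byte[size]));
            System.out.println(size + " -> " + decoded);
        }
        // A 16-bit length field cannot hold 70000; the cast keeps only the low 16 bits.
        System.out.println("70000 truncated to 16 bits: " + ((short) 70000 & 0xFFFF));
    }
}
```

The last line illustrates the original failure mode: squeezing a 70000-byte length into a 16-bit field announces a much shorter frame than the one actually sent.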
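Summary
Going by the specification, an error like "One or more reserved bits are on" was, in this case, caused by the server not framing the data correctly. In addition, while searching GitHub for WebSocket parsing libraries, I found that the framing in the cjoo/WebSocket_mina library is wrong as well. The faulty code is at lines 70 to 72 of https://github.com/cjoo/WebSo... .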
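To see why a framing mistake can surface as a reserved-bits error, it helps to decode the first byte of a frame by hand. The sketch below uses a hypothetical FirstByteInspector class; the byte 0x30 (ASCII '0') is only an illustrative value that happens to match the reported flags, since the actual byte the client misread in this incident is not known.

```java
final class FirstByteInspector {
    // Bit layout of the first frame byte per RFC 6455 section 5.2:
    // FIN (0x80), RSV1 (0x40), RSV2 (0x20), RSV3 (0x10), opcode (low 4 bits).
    static boolean fin(int firstByte)  { return (firstByte & 0x80) != 0; }
    static boolean rsv1(int firstByte) { return (firstByte & 0x40) != 0; }
    static boolean rsv2(int firstByte) { return (firstByte & 0x20) != 0; }
    static boolean rsv3(int firstByte) { return (firstByte & 0x10) != 0; }
    static int opcode(int firstByte)   { return firstByte & 0x0F; }

    static String describe(int firstByte) {
        return "fin=" + fin(firstByte)
                + ", reserved1=" + (rsv1(firstByte) ? 1 : 0)
                + ", reserved2=" + (rsv2(firstByte) ? 1 : 0)
                + ", reserved3=" + (rsv3(firstByte) ? 1 : 0)
                + ", opcode=" + opcode(firstByte);
    }

    public static void main(String[] args) {
        // A well-formed header byte for a final binary frame.
        System.out.println(describe(0x82));
        // An ordinary payload byte (ASCII '0') misread as a header byte.
        System.out.println(describe('0'));
    }
}
```

When no extension has been negotiated, the specification requires all three reserved bits to be 0, so a client that lands on payload bytes where it expects a header will usually reject the connection.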