High-Performance MongoDB: Running Explain Plans from Application Code

hanyueqi

2019-07-01

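Explain plans

I previously published a post explaining MongoDB explain plans. With an explain plan we can assess how each query executes and see the plan MongoDB chooses for it. In the mongo shell, an explain plan is run like this, for example:

db.collectionName.find({}).explain("queryPlanner")

Explain has three verbosity modes: queryPlanner, executionStats, and allPlansExecution. The first does not actually run the command; it returns only the query planner's analysis. The response to the example above is therefore an analysis of the query db.collectionName.find({}).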

Running explain plans from a program

I use Java with mongodb-java-driver. Its API offers two ways to run an explain plan:

Option 1:

MongoClient mongoClient = new MongoClient(new ServerAddress(host, port));
mongoClient.getDB("xxx").getCollection("yyy").find(query).explain();

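This is the convenient way. With the driver version I used, it actually executes the command, that is, it runs in executionStats mode, so the response contains real execution data such as elapsed time and the number of documents scanned. If your program needs to assess a command before executing it, this is not the API you want.

Option 2:

The API offers no direct way to request queryPlanner mode. I spent some time searching and found that hardly anyone seems to need queryPlanner from code; at least I could not find similar questions or usage examples. After some back and forth, I found that the library has this API, mongoClient.getDB("xxx").command(BasicDBObject command), which lets a program pass in an arbitrary command. Finally, I found this description in the official documentation: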

explain

New in version 3.0.

The explain command provides information on the execution of the following commands: aggregate, count, distinct, group, find, findAndModify, delete, and update.

Although MongoDB provides the explain command, the preferred method for running explain is to use the db.collection.explain() and cursor.explain() helpers.

The explain command has the following syntax:

{
   explain: <command>,
   verbosity: <string>
}

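explain: <command>. The command to explain. Supported commands include aggregate, count, distinct, group, find, findAndModify, delete, and update.
verbosity: <string>. The mode: "queryPlanner", "executionStats", or "allPlansExecution" (the default).

Following the documentation for find, the command supports the fields below, which cover just about everything.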

{
   "find": <string>,
   "filter": <document>,
   "sort": <document>,
   "projection": <document>,
   "hint": <document or string>,
   "skip": <int>,
   "limit": <int>,
   "batchSize": <int>,
   "singleBatch": <bool>,
   "comment": <string>,
   "maxScan": <int>,   // Deprecated in MongoDB 4.0
   "maxTimeMS": <int>,
   "readConcern": <document>,
   "max": <document>,
   "min": <document>,
   "returnKey": <bool>,
   "showRecordId": <bool>,
   "tailable": <bool>,
   "oplogReplay": <bool>,
   "noCursorTimeout": <bool>,
   "awaitData": <bool>,
   "allowPartialResults": <bool>,
   "collation": <document>
}

From the documentation, running an explain plan in queryPlanner mode should look like this:

// Explain a query on one collection; queryCondition is the query filter.

MongoClient mongoClient = MongoUtil.getConnection(mongodb.getHost(), mongodb.getPort(), "", "", mongodb.getDb());
BasicDBObject command = new BasicDBObject();
BasicDBObject find = new BasicDBObject();
find.put("find", "collectionName"); // name of the collection to query
find.put("filter", queryCondition); // the query filter, a BasicDBObject
command.put("explain", find);
command.put("verbosity", "queryPlanner");
CommandResult explainResult = mongoClient.getDB(mongodb.getDb()).command(command);
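
For readers who want the same command document in Python, the following is a minimal sketch that uses only the standard library. The helper name build_explain_command and the VERBOSITY_MODES tuple are hypothetical names introduced here for illustration; they are not part of any driver. The sketch assumes the resulting document is later handed to a driver call that preserves key order, so the command name is deliberately placed first at both levels.

```python
from collections import OrderedDict

# The three verbosity modes named in the explain command documentation.
VERBOSITY_MODES = ("queryPlanner", "executionStats", "allPlansExecution")


def build_explain_command(collection, query_filter, verbosity="queryPlanner",
                          **find_options):
    """Build an explain command document that wraps a find command.

    Hypothetical helper: it only assembles the document, it sends nothing.
    """
    if verbosity not in VERBOSITY_MODES:
        raise ValueError("unsupported verbosity: %r" % (verbosity,))
    # "find" must be the first key of the inner command document.
    find = OrderedDict([("find", collection), ("filter", query_filter)])
    # Optional find fields (sort, limit, ...) follow the command name.
    for name in sorted(find_options):
        find[name] = find_options[name]
    # "explain" must be the first key of the outer command document.
    return OrderedDict([("explain", find), ("verbosity", verbosity)])


cmd = build_explain_command("collectionnamexxx", {"col1": "202060056"}, limit=10)
print(list(cmd))             # ['explain', 'verbosity']
print(list(cmd["explain"]))  # ['find', 'filter', 'limit']
```

Validating the verbosity string up front means a typo is caught before anything is sent to the server.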

A pitfall when running explain plans from Python

Using the pymongo library

import json
import pymongo

if __name__ == '__main__':
    client = pymongo.MongoClient(host='127.0.0.1', port=27017)
    # Select a database
    db = client.get_database(name='datanamexxx')

    # NOTE: on Python 2, a plain dict does not preserve key order
    command = {}
    explain = {}
    # The collection to query
    explain['find'] = "collectionnamexxx"
    # The query filter
    explain['filter'] = {"col1":"202060056"}
    verbosity = "executionStats"
    command['explain'] = explain
    command['verbosity'] = verbosity
    print(json.dumps(db.command(command=command)))

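The program above is flawed and does not achieve its goal (reporting how a single query executes). After reading the MongoDB documentation and experimenting, I confirmed that the cause was incorrect usage.
Cause of the error: a MongoDB command requires its fields to be ordered, because the first field is the command name, as in the find command shown earlier: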

{
   "find": <string>, # the command name
   "filter": <document>,
   "sort": <document>,
   "projection": <document>,
   "hint": <document or string>,
   "skip": <int>,
   "limit": <int>,
   "batchSize": <int>,
   "singleBatch": <bool>,
    ...

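When the mongo driver processes a command, the first thing that must be known is which command to run. However, a Python 2 dict or a Java HashMap makes no guarantee about key order (Python's built-in dict preserves insertion order only from version 3.7 on). We need something that records the order of the fields, and the caller must put the command name first. Let's look at the driver source; the idea is to wrap dict and add a list that records the key order: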

# Subclass of dict
class SON(dict):
    def __init__(self, data=None, **kwargs):
        # __keys is the list that records the key order
        self.__keys = []
        dict.__init__(self)
        self.update(data)
        self.update(kwargs)
    # omitted...
    # When printing, build the string in __keys order
    def __repr__(self):
        result = []
        for key in self.__keys:
            result.append("(%r, %r)" % (key, self[key]))
        return "SON([%s])" % ", ".join(result)

    # When setting an item, first record the key in order
    def __setitem__(self, key, value):
        if key not in self.__keys:
            self.__keys.append(key)
        dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        self.__keys.remove(key)
        dict.__delitem__(self, key)

    # omitted...
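
To make the ordering point concrete, here is a small sketch using the standard library's OrderedDict in place of SON. The function command_name is a hypothetical helper written for this illustration; it simply returns the first key of a document, which is the key that gets read as the command name.

```python
from collections import OrderedDict


def command_name(document):
    """Return the first key of a command document (hypothetical helper)."""
    return next(iter(document))


# Same fields, different order.
good = OrderedDict([("explain", {"find": "collectionnamexxx"}),
                    ("verbosity", "queryPlanner")])
bad = OrderedDict([("verbosity", "queryPlanner"),
                   ("explain", {"find": "collectionnamexxx"})])

print(command_name(good))        # explain
print(command_name(bad))         # verbosity
print(dict(good) == dict(bad))   # True: same contents as plain mappings
print(good == bad)               # False: OrderedDict equality is order-sensitive
```

The two documents hold identical fields, yet only the first one names explain as its command, which is why an order-preserving container is needed.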

Using pymongo correctly

import json
import pymongo
from bson.son import SON

if __name__ == '__main__':
    client = pymongo.MongoClient(host='127.0.0.1', port=27017)
    # Select a database
    db = client.get_database(name='datanamexxx')

    # Order matters: the command name must come first
    explainSon = SON([("find", 'collectionnamexxx'),
               ("filter", {"uid": "202060056"})])
    cmd = SON([("explain", explainSon),
               ("verbosity", "queryPlanner")])
    print(json.dumps(db.command(cmd)))
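
Once a queryPlanner result is available, a program can use it to assess a query before running it. The sketch below walks a winning plan and checks for a full collection scan. The helper names plan_stages and uses_collection_scan are hypothetical, the sample documents are hand-written for illustration rather than real server output, and the sketch assumes the result exposes queryPlanner.winningPlan with nested stage and inputStage fields; other server versions may shape the plan differently.

```python
def plan_stages(plan):
    """Collect stage names from a plan tree, outermost stage first."""
    stages = []
    if "stage" in plan:
        stages.append(plan["stage"])
    # A stage may have a single child (inputStage) or several (inputStages).
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    children.extend(plan.get("inputStages", []))
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def uses_collection_scan(explain_result):
    """Return True if the winning plan contains a COLLSCAN stage."""
    winning = explain_result.get("queryPlanner", {}).get("winningPlan", {})
    return "COLLSCAN" in plan_stages(winning)


# Hand-written samples shaped like queryPlanner output (assumed shape).
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "uid_1"}}}}
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(plan_stages(indexed["queryPlanner"]["winningPlan"]))  # ['FETCH', 'IXSCAN']
print(uses_collection_scan(indexed))    # False
print(uses_collection_scan(unindexed))  # True
```

A caller could refuse or log queries for which uses_collection_scan returns True, since queryPlanner mode reports the plan without executing the query.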
