Search Methods in Elasticsearch

腊八粥

2019-06-26

Setting up the search data

The most attractive feature of Elasticsearch is its fast and convenient search. First, create some test documents with the following commands:

curl -XPUT "http://localhost:9200/movies/movie/1" -H 'Content-Type: application/json' -d'
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972,
    "genres": ["Crime", "Drama"]
}'

curl -XPUT "http://localhost:9200/movies/movie/2" -H 'Content-Type: application/json' -d'
{
    "title": "Lawrence of Arabia",
    "director": "David Lean",
    "year": 1962,
    "genres": ["Adventure", "Biography", "Drama"]
}'

curl -XPUT "http://localhost:9200/movies/movie/3" -H 'Content-Type: application/json' -d'
{
    "title": "To Kill a Mockingbird",
    "director": "Robert Mulligan",
    "year": 1962,
    "genres": ["Crime", "Drama", "Mystery"]
}'

curl -XPUT "http://localhost:9200/movies/movie/4" -H 'Content-Type: application/json' -d'
{
    "title": "Apocalypse Now",
    "director": "Francis Ford Coppola",
    "year": 1979,
    "genres": ["Drama", "War"]
}'

curl -XPUT "http://localhost:9200/movies/movie/5" -H 'Content-Type: application/json' -d'
{
    "title": "Kill Bill: Vol. 1",
    "director": "Quentin Tarantino",
    "year": 2003,
    "genres": ["Action", "Crime", "Thriller"]
}'

curl -XPUT "http://localhost:9200/movies/movie/6" -H 'Content-Type: application/json' -d'
{
    "title": "The Assassination of Jesse James by the Coward Robert Ford",
    "director": "Andrew Dominik",
    "year": 2007,
    "genres": ["Biography", "Crime", "Drama"]
}'

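Note that Elasticsearch provides a general-purpose _bulk endpoint that can create multiple documents in a single request; for simplicity, the documents above are created with separate requests.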
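For comparison, the six requests above could be collapsed into one _bulk request. The bulk body is newline-delimited JSON (NDJSON): each document is preceded by an action line, and the body must end with a newline. The function name movies_bulk_body is hypothetical, and the sketch assumes the same index/type URLs as the commands above, so the action lines only need an _id.

```shell
# Hypothetical helper: prints an NDJSON body for the _bulk endpoint.
# Each document is an action line ({"index": ...}) followed by its source line.
# Index and type come from the URL (/movies/movie/_bulk), so only _id is set here.
movies_bulk_body() {
  printf '%s\n' '{"index":{"_id":"1"}}'
  printf '%s\n' '{"title":"The Godfather","director":"Francis Ford Coppola","year":1972,"genres":["Crime","Drama"]}'
  printf '%s\n' '{"index":{"_id":"2"}}'
  printf '%s\n' '{"title":"Lawrence of Arabia","director":"David Lean","year":1962,"genres":["Adventure","Biography","Drama"]}'
  printf '%s\n' '{"index":{"_id":"3"}}'
  printf '%s\n' '{"title":"To Kill a Mockingbird","director":"Robert Mulligan","year":1962,"genres":["Crime","Drama","Mystery"]}'
  printf '%s\n' '{"index":{"_id":"4"}}'
  printf '%s\n' '{"title":"Apocalypse Now","director":"Francis Ford Coppola","year":1979,"genres":["Drama","War"]}'
  printf '%s\n' '{"index":{"_id":"5"}}'
  printf '%s\n' '{"title":"Kill Bill: Vol. 1","director":"Quentin Tarantino","year":2003,"genres":["Action","Crime","Thriller"]}'
  printf '%s\n' '{"index":{"_id":"6"}}'
  printf '%s\n' '{"title":"The Assassination of Jesse James by the Coward Robert Ford","director":"Andrew Dominik","year":2007,"genres":["Biography","Crime","Drama"]}'
}
```

Usage: movies_bulk_body | curl -s -XPOST 'http://localhost:9200/movies/movie/_bulk' -H 'Content-Type: application/x-ndjson' --data-binary @-
Here --data-binary is used instead of -d because -d strips newlines from data read from a file or stdin, which would break the NDJSON format.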

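Searching in Elasticsearch is done mainly through the _search endpoint. The request path has the form <index>/<type>/_search, where both index and type are optional.
In other words, a search request can be sent in any of the following ways: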

http://localhost:9200/_search... - searches all indices and types
http://localhost:9200/movies/... - searches all types in the movies index
http://localhost:9200/movies/movie... - searches only documents of type movie in the movies index
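The three forms differ only in which optional path segments are present. A minimal sketch, assuming a hypothetical helper named search_url and a node listening on http://localhost:9200:

```shell
# Hypothetical helper: builds a _search URL; index ($1) and type ($2) are optional.
search_url() {
  base="http://localhost:9200"
  index="$1"
  type="$2"
  if [ -n "$index" ] && [ -n "$type" ]; then
    # Only documents of this type in this index.
    echo "$base/$index/$type/_search"
  elif [ -n "$index" ]; then
    # All types in this index.
    echo "$base/$index/_search"
  else
    # All indices and types.
    echo "$base/_search"
  fi
}
```

Usage: curl -XGET "$(search_url movies movie)"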

The response contains each document's metadata; the document's original data is stored in the _source field.

Retrieving a single document
We can also retrieve just a document's _source field directly:

curl -XGET 'http://localhost:9200/movies/movie/1/_source'

Result:

{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972,
    "genres": ["Crime", "Drama"]
}

Retrieving all documents
We can use the _search API to retrieve all documents:

curl -XGET 'http://localhost:9200/movies/movie/_search'

Result:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 6,
        "max_score": 1,
        "hits": [
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "5",
                "_score": 1,
                "_source": {
                    "title": "Kill Bill: Vol. 1",
                    "director": "Quentin Tarantino",
                    "year": 2003,
                    "genres": [
                        "Action",
                        "Crime",
                        "Thriller"
                    ]
                }
            },
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "title": "Lawrence of Arabia",
                    "director": "David Lean",
                    "year": 1962,
                    "genres": [
                        "Adventure",
                        "Biography",
                        "Drama"
                    ]
                }
            },
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "4",
                "_score": 1,
                "_source": {
                    "title": "Apocalypse Now",
                    "director": "Francis Ford Coppola",
                    "year": 1979,
                    "genres": [
                        "Drama",
                        "War"
                    ]
                }
            },
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "6",
                "_score": 1,
                "_source": {
                    "title": "The Assassination of Jesse James by the Coward Robert Ford",
                    "director": "Andrew Dominik",
                    "year": 2007,
                    "genres": [
                        "Biography",
                        "Crime",
                        "Drama"
                    ]
                }
            },
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "title": "The Godfather",
                    "director": "Francis Ford Coppola",
                    "year": 1972,
                    "genres": [
                        "Crime",
                        "Drama"
                    ]
                }
            },
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "title": "To Kill a Mockingbird",
                    "director": "Robert Mulligan",
                    "year": 1962,
                    "genres": [
                        "Crime",
                        "Drama",
                        "Mystery"
                    ]
                }
            }
        ]
    }
}

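As you can see, the hits object contains fields such as total and the hits array. The hits array holds the returned documents (all six here), and total gives the number of matching documents; by default only the first 10 results are returned. To fetch a particular range of documents, set the from and size parameters. from is a zero-based offset, so the following request skips the first hit and returns the next two: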

curl -XGET 'http://localhost:9200/movies/movie/_search?from=1&size=2'

The result is as follows:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 6,
        "max_score": 1,
        "hits": [
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "title": "Lawrence of Arabia",
                    "director": "David Lean",
                    "year": 1962,
                    "genres": [
                        "Adventure",
                        "Biography",
                        "Drama"
                    ]
                }
            },
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "4",
                "_score": 1,
                "_source": {
                    "title": "Apocalypse Now",
                    "director": "Francis Ford Coppola",
                    "year": 1979,
                    "genres": [
                        "Drama",
                        "War"
                    ]
                }
            }
        ]
    }
}
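Because from is a zero-based offset, page N with page size S starts at from = (N - 1) * S. The hypothetical helper page_params below does that arithmetic. Keep in mind that, by default, Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so deep paging this way has a limit.

```shell
# Hypothetical helper: turns a 1-based page number ($1) and a page size ($2)
# into the from/size query parameters accepted by _search.
page_params() {
  page="$1"
  size="$2"
  # from is a zero-based offset into the result list.
  from=$(( (page - 1) * size ))
  echo "from=$from&size=$size"
}
```

Usage: curl -XGET "http://localhost:9200/movies/movie/_search?$(page_params 2 2)" requests from=2&size=2, that is, the third and fourth hits.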

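Retrieving specific fields

Sometimes we only need a few fields of a document. In that case, use the _source parameter, with multiple field names separated by commas: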

curl -XGET 'http://localhost:9200/movies/movie/1?_source=title,director'

Result:

{
    "_index": "movies",
    "_type": "movie",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "director": "Francis Ford Coppola",
        "title": "The Godfather"
    }
}

Query string search
A query string search passes the query in the form q=field:value. For example, to find movies whose title field contains godfather:

curl -XGET 'http://localhost:9200/movies/movie/_search?q=title:godfather'

Result:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.25811607,
        "hits": [
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "1",
                "_score": 0.25811607,
                "_source": {
                    "title": "The Godfather",
                    "director": "Francis Ford Coppola",
                    "year": 1972,
                    "genres": [
                        "Crime",
                        "Drama"
                    ]
                }
            }
        ]
    }
}
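Since the q value is part of the URL, spaces, quotes and other special characters must be percent-encoded, for example when searching for the phrase "the godfather". In practice curl can do this itself with -G and --data-urlencode; the hypothetical urlencode function below only illustrates what that encoding produces, and it assumes ASCII input.

```shell
# Hypothetical helper: percent-encodes an ASCII string for a URL query parameter.
# Unreserved characters (letters, digits, . ~ _ -) are kept as they are.
urlencode() {
  rest="$1"
  out=""
  while [ -n "$rest" ]; do
    # Split off the first character.
    c="${rest%"${rest#?}"}"
    rest="${rest#?}"
    # printf with a leading "'" yields the character's code; %% prints a literal %.
    case "$c" in
      [A-Za-z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

With curl doing the encoding, the phrase search becomes: curl -G 'http://localhost:9200/movies/movie/_search' --data-urlencode 'q=title:"the godfather"'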

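DSL search
The query string search above is lightweight and only suits simple cases. Elasticsearch also provides a more powerful query DSL (Domain Specific Language) for complex searches such as full-text search. The query string search above can be rewritten as a DSL search: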

GET /movies/movie/_search
{
    "query" : {
        "match" : {
            "title" : "godfather"
        }
    }
}

The same request with curl:

curl -X GET "127.0.0.1:9200/movies/movie/_search" -H 'Content-Type: application/json' -d '{"query": {"match": {"title": "godfather"}}}'
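From Elasticsearch 6.0 onward, requests with a JSON body must carry a Content-Type: application/json header. When the body is generated by a script, it can be piped into curl. A minimal sketch, assuming a hypothetical helper match_query that prints a match query for one field and one value:

```shell
# Hypothetical helper: prints a match query body for one field ($1) and one value ($2).
# Values are inserted verbatim, so they must not contain double quotes or backslashes.
match_query() {
  printf '{"query": {"match": {"%s": "%s"}}}\n' "$1" "$2"
}
```

Usage: match_query title godfather | curl -s -XGET 'http://localhost:9200/movies/movie/_search' -H 'Content-Type: application/json' -d @-
(-d @- tells curl to read the request body from stdin.)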

The simplest query is a full-text search. For example, to search all fields for the keyword "godfather":

curl -XPOST "http://localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "query_string": {
            "query": "godfather"
        }
    }
}'

To search for the keyword "godfather" in the title field only:

curl -XPOST "http://localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "query_string": {
            "query": "godfather",
            "fields": ["title"]
        }
    }
}'
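Because fields takes a list, the same query can target several fields at once. A sketch with a hypothetical helper query_string_body: the first argument is the query text, any further arguments are field names, and fields is left out when none are given so that the default fields are searched.

```shell
# Hypothetical helper: prints a query_string body.
# $1 is the query text (required); any remaining arguments are field names.
# Arguments are inserted verbatim, so they must not contain double quotes.
query_string_body() {
  text="$1"
  shift
  fields=""
  for f in "$@"; do
    # Quote each field name and join the names with ", ".
    if [ -z "$fields" ]; then
      fields="\"$f\""
    else
      fields="$fields, \"$f\""
    fi
  done
  if [ -z "$fields" ]; then
    printf '{"query": {"query_string": {"query": "%s"}}}\n' "$text"
  else
    printf '{"query": {"query_string": {"query": "%s", "fields": [%s]}}}\n' "$text" "$fields"
  fi
}
```

Usage: query_string_body godfather title director | curl -s -XPOST 'http://localhost:9200/_search' -H 'Content-Type: application/json' -d @-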

Result:

{
    "took": 24,
    "timed_out": false,
    "_shards": {
        "total": 25,
        "successful": 25,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.25811607,
        "hits": [
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "1",
                "_score": 0.25811607,
                "_source": {
                    "title": "The Godfather",
                    "director": "Francis Ford Coppola",
                    "year": 1972,
                    "genres": [
                        "Crime",
                        "Drama"
                    ]
                }
            }
        ]
    }
}

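Checking whether a document exists
If all you want is to check whether a document exists, and you don't care about its content at all, use the HEAD method instead of GET. A HEAD request returns only the HTTP headers, with no response body: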

curl -I "http://localhost:9200/movies/movie/3"

If the document exists, Elasticsearch returns a 200 OK status:

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 255

If it does not exist, it returns 404 Not Found:

curl -I "http://localhost:9200/movies/movie/36"

HTTP/1.1 404 Not Found
content-type: application/json; charset=UTF-8
content-length: 60

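Of course, this only means that the document did not exist at the moment you checked; it may exist a few milliseconds later, because another process could create it in the meantime.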
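In a script, the status code is usually all that matters. A minimal sketch, assuming a hypothetical function doc_exists: curl's -I sends the HEAD request, -o /dev/null discards the headers, and -w '%{http_code}' prints only the status code (000 if the connection fails).

```shell
# Hypothetical helper: returns exit status 0 only if a HEAD request to the
# document URL ($1) answers HTTP 200. Any other code (404, or 000 when the
# connection fails) is treated as "does not exist".
doc_exists() {
  # "|| true" keeps a failed connection from aborting scripts run with set -e;
  # curl still prints 000 through -w in that case.
  code=$(curl -s -I -o /dev/null --max-time 5 -w '%{http_code}' "$1" || true)
  [ "$code" = "200" ]
}
```

Usage: if doc_exists "http://localhost:9200/movies/movie/3"; then echo "exists"; else echo "missing"; fi
As noted above, the answer is only valid at the moment of the request.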

