Elasticsearch Study Notes - 8. Queries

csmnjk

2019-06-28

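1. Concepts

Mapping
Describes how the data in each field is stored
Analysis
How full text is processed to make it searchable
Query DSL (domain-specific query language)
The powerful, flexible query language used by Elasticsearch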

2. Empty search

GET /_search

{
   "hits" : {
      "total" :       14,
      "hits" : [
        {
          "_index":   "us",
          "_type":    "tweet",
          "_id":      "7",
          "_score":   1,
          "_source": {
             "date":    "2014-09-17",
             "name":    "John Smith",
             "tweet":   "The Query DSL is really powerful and flexible",
             "user_id": 2
          }
       },
        ... 9 RESULTS REMOVED ...
      ],
      "max_score" :   1
   },
   "took" :           4,
   "_shards" : {
      "failed" :      0,
      "successful" :  10,
      "total" :       10
   },
   "timed_out" :      false
}

hits

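The most important part of the response is hits, which contains a total field giving the total number of matching documents and a hits array holding the first ten documents of the result.

max_score is the highest _score of any document that matches the query.

took

took tells us how many milliseconds the entire search request took to execute.

_shards

The _shards section tells us the total number of shards that took part in the query, and how many of them succeeded and how many failed. We would not normally expect shards to fail, but it can happen. If we suffered a catastrophic failure in which we lost both the primary and the replica copies of the same shard, no copy of that shard would be available to respond to search requests. In that case, Elasticsearch would report the shard as failed but continue to return results from the remaining shards.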
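The fields described above can be read out of a response with a few lines of Python. This is only a sketch: it assumes the response has already been parsed into a dict with the same shape as the sample above (in particular, hits.total as a plain integer), and the function name summarize_search_response is made up for illustration.

```python
def summarize_search_response(response):
    """Collect the fields discussed above from a parsed search response."""
    hits = response["hits"]
    shards = response["_shards"]
    return {
        "total": hits["total"],            # number of matching documents
        "returned": len(hits["hits"]),     # documents in this page
        "max_score": hits["max_score"],
        "took_ms": response["took"],
        "failed_shards": shards["failed"],
        # Results may be incomplete if a shard failed or the request timed out.
        "partial": shards["failed"] > 0 or response["timed_out"],
    }


# A trimmed copy of the sample response, with a single hit kept.
sample = {
    "hits": {
        "total": 14,
        "hits": [{"_index": "us", "_type": "tweet", "_id": "7", "_score": 1}],
        "max_score": 1,
    },
    "took": 4,
    "_shards": {"failed": 0, "successful": 10, "total": 10},
    "timed_out": False,
}

summary = summarize_search_response(sample)
print(summary)
```

Checking failed_shards and timed_out before trusting total is a cheap way to notice partial results.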

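3. Multiple indices, multiple types

/_search

Search all types in all indices

/gb/_search

Search all types in the gb index

/gb,us/_search

Search all documents in the gb and us indices

/g*,u*/_search

Search all types in any index whose name begins with g or u

/gb/user/_search

Search the user type in the gb index

/gb,us/user,tweet/_search

Search the user and tweet types in the gb and us indices

/_all/user,tweet/_search

Search the user and tweet types in all indices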
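The path patterns above follow one rule: an optional comma-separated index list, then an optional comma-separated type list, then _search. A minimal Python sketch of that rule follows; build_search_path is a hypothetical helper, not part of any client library.

```python
def build_search_path(indices=None, types=None):
    """Build a search path from optional lists of index and type names."""
    indices = list(indices or [])
    types = list(types or [])
    # Types can only follow an index segment, so fall back to _all.
    if types and not indices:
        indices = ["_all"]
    segments = [",".join(names) for names in (indices, types) if names]
    return "/" + "/".join(segments + ["_search"])


print(build_search_path())
print(build_search_path(["gb", "us"], ["user", "tweet"]))
print(build_search_path(types=["user", "tweet"]))
```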

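4. Pagination

Just as SQL uses the LIMIT keyword to return a single page of results, Elasticsearch accepts the from and size parameters:

size

The number of results to return. Defaults to 10

from

The number of initial results to skip. Defaults to 0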

GET /_search?size=5&from=5
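Turning a page number into from and size is simple arithmetic: from = (page - 1) * size. A small Python sketch, assuming pages are numbered from 1 (page_params is an illustrative name):

```python
from urllib.parse import urlencode


def page_params(page, per_page=10):
    """Return the size/from parameters for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"size": per_page, "from": (page - 1) * per_page}


# Second page with five results per page.
query_string = urlencode(page_params(page=2, per_page=5))
print("GET /_search?" + query_string)
```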

5. Request body search

Empty query

GET /_search
{} 

GET /index_2014*/type1,type2/_search
{}

GET /_search
{
  "from": 30,
  "size": 10
}

Query expressions

GET /_search
{
    "query": YOUR_QUERY_HERE
}

For example, you can use a match query to find tweets whose tweet field contains elasticsearch:

GET /_search
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    }
}

Combining queries

{
    "bool": {
        "must": { "match":   { "email": "business opportunity" }},
        "should": [
            { "match":       { "starred": true }},
            { "bool": {
                "must":      { "match": { "folder": "inbox" }},
                "must_not":  { "match": { "spam": true }}
            }}
        ],
        "minimum_should_match": 1
    }
}

The most important queries

match_all query

The match_all query simply matches all documents. It is the default query when no query is specified:

{ "match_all": {}}

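match query

High-level full-text queries are typically used to run full-text searches on full-text fields, such as the body of an email.
They understand how the queried field is analyzed, and they apply each field's analyzer (or search_analyzer) to the query string before executing.

In other words, the query string is tokenized before the search runs.

{ "match": { "tweet": "About Search" }}

The operator parameter of match. With "and", a document must contain every term the query string is analyzed into (here centos, 升, and 级):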

GET website/_search
{
  "query": {
    "match": {
        "title":{
          "query":"centos升级",
          "operator":"and"
        }
    }
  }
}
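The effect of operator can be imitated in a few lines of Python. This is a toy model, not Elasticsearch's implementation: it assumes a lowercase whitespace tokenizer in place of a real analyzer and ignores scoring entirely.

```python
def simple_tokens(text):
    """Stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def match(field_text, query, operator="or"):
    """Return True if the field matches the query under the given operator."""
    field_terms = set(simple_tokens(field_text))
    found = [term in field_terms for term in simple_tokens(query)]
    # "and" requires every query term; "or" (the default) requires any one.
    return all(found) if operator == "and" else any(found)


print(match("About cats", "about search"))                   # any term
print(match("About cats", "about search", operator="and"))   # every term
```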

multi_match query

The multi_match query runs the same match query on multiple fields:

{
    "multi_match": {
        "query":    "full text search",
        "fields":   [ "title", "body" ]
    }
}

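match_phrase query (phrase query)

The match_phrase query analyzes the query text into terms (the analyzer can be customized). A document is returned only if it satisfies both of the following conditions:

All of the terms appear in the field
The terms appear in the field in the same order

(1) Create the index and insert data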

PUT test

PUT test/hello/1
{ "content":"World Hello"}

PUT test/hello/2
{ "content":"Hello World"}

PUT test/hello/3
{ "content":"I just said hello world"}

(2) Use match_phrase to search for "hello world"

GET test/_search
{
  "query": {
    "match_phrase": {
      "content": "hello world"
    }
  }
}

The last two documents above match and are returned; the word order in document 1 differs from the query, so it does not match.
{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test",
        "_type": "hello",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "content": "Hello World"
        }
      },
      {
        "_index": "test",
        "_type": "hello",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "content": "I just said hello world"
        }
      }
    ]
  }
}
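The result above can be reproduced with a small Python model of phrase matching. It is a sketch under loud assumptions: a lowercase whitespace tokenizer stands in for the analyzer, and the terms must be adjacent as well as in order, which is my reading of the default behavior (slop of 0).

```python
def tokens(text):
    """Stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def phrase_match(field_text, phrase):
    """Return True if the phrase's terms occur consecutively, in order."""
    field = tokens(field_text)
    wanted = tokens(phrase)
    n = len(wanted)
    if n == 0:
        return False
    return any(field[i:i + n] == wanted for i in range(len(field) - n + 1))


# The three documents indexed in step (1), keyed by _id.
docs = {
    1: "World Hello",
    2: "Hello World",
    3: "I just said hello world",
}

matching_ids = [doc_id for doc_id, content in docs.items()
                if phrase_match(content, "hello world")]
print(matching_ids)
```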

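range query

The range query finds numbers or dates that fall within a specified range:

gt greater than
gte greater than or equal to
lt less than
lte less than or equal to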

{
    "range": {
        "age": {
            "gte":  20,
            "lt":   30
        }
    }
}
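The four operators map directly onto ordinary comparisons. The following Python sketch checks a single value against a set of bounds; in_range is an illustrative name, and the code only models the comparison logic, not how Elasticsearch evaluates ranges.

```python
import operator

# Each range operator and the comparison it stands for.
_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 20, "lt": 30}."""
    return all(_OPS[name](value, limit) for name, limit in bounds.items())


bounds = {"gte": 20, "lt": 30}
print([age for age in (19, 20, 29, 30) if in_range(age, bounds)])
```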

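term query

The term query is used for exact-value matching. The exact values can be numbers, dates, Booleans, or not_analyzed strings:
The term query does not analyze the input text, so it searches for exactly the value given.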

{ "term": { "age":    26           }}
{ "term": { "date":   "2014-09-01" }}
{ "term": { "public": true         }}
{ "term": { "tag":    "full_text"  }}

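terms query

The terms query is the same as the term query, except that it lets you specify multiple values to match. A document matches if the field contains any one of the specified values: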

{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}

exists and missing queries

The exists and missing queries find documents in which the specified field has a value (exists) or has no value (missing).
They are similar in nature to IS NULL (missing) and IS NOT NULL (exists) in SQL:

{
    "exists":   {
        "field":    "title"
    }
}
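What counts as "has a value" can be modeled roughly as follows. The rules encoded here are an assumption based on my understanding of the documented behavior: null, an empty array, and an array holding only nulls count as missing, while an empty string still counts as a value.

```python
def field_exists(doc, field):
    """Rough model of exists: True if the field holds a non-null value."""
    value = doc.get(field)
    if isinstance(value, list):
        # An array exists only if at least one element is non-null.
        return any(item is not None for item in value)
    return value is not None


def field_missing(doc, field):
    """The complement of field_exists, mirroring the missing query."""
    return not field_exists(doc, field)


print(field_exists({"title": "Query DSL"}, "title"))
print(field_missing({"title": None}, "title"))
```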

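Compound queries

Use the bool query to combine multiple queries to fit your needs. It accepts the following parameters:

must
Documents must match these clauses to be included.
must_not
Documents must not match these clauses to be included.
should
If a document matches any of these clauses, its _score increases; otherwise, they have no effect. They are mainly used to refine the relevance score of each document.
filter
Clauses that must match, but that run in non-scoring, filtering mode. These clauses do not contribute to the score; they only include or exclude documents based on the filter criteria.

The following query finds documents whose title field matches how to make millions
and that are not marked as spam.
Documents that are marked starred or are dated 2014 or later rank higher than those that are not.
Documents that satisfy both rank higher still: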

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }},
            { "range": { "date": { "gte": "2014-01-01" }}}
        ]
    }
}

Adding a filter clause

If we don't want the document's date to affect the score, we can rewrite the previous example with a filter clause:

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "range": { "date": { "gte": "2014-01-01" }} 
        }
    }
}
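When queries like this are assembled in application code, it can help to build the bool body from its four clause types and leave out the empty ones. The Python sketch below does that with plain dicts; bool_query is a hypothetical helper, not an Elasticsearch client function.

```python
import json


def bool_query(must=None, must_not=None, should=None, filters=None):
    """Assemble a bool query body, omitting clause types that are empty."""
    clauses = {
        "must": must,
        "must_not": must_not,
        "should": should,
        "filter": filters,
    }
    return {"bool": {name: value for name, value in clauses.items() if value}}


# Rebuild the filter example: the date range no longer affects the score.
query = bool_query(
    must={"match": {"title": "how to make millions"}},
    must_not={"match": {"tag": "spam"}},
    should=[{"match": {"tag": "starred"}}],
    filters={"range": {"date": {"gte": "2014-01-01"}}},
)
print(json.dumps({"query": query}, indent=2))
```

Moving a clause between should and filters is then a one-argument change, which makes it easy to compare how the ranking behaves with and without the date influencing the score.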

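Validating queries

The validate-query API reports whether a query is valid without executing it. In the request below, the field name and the query type are swapped, so the response reports the query as invalid: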

GET /gb/tweet/_validate/query
{
   "query": {
      "tweet" : {
         "match" : "really powerful"
      }
   }
}

{
  "valid" :         false,
  "_shards" : {
    "total" :       1,
    "successful" :  1,
    "failed" :      0
  }
}

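Understanding queries

Adding the explain parameter returns a description of how the query was interpreted: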

GET /cars/transactions/_validate/query?explain
{
  "query": {
    "match": {
      "make": "toyota"
    }
  }  
}

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "cars",
      "valid": true,
      "explanation": "+make:toyota #*:*"
    }
  ]
}
