Elasticsearch Study Notes, Advanced Series (15): Search Suggestions in Practice

Prepare the data

PUT _bulk
{"index": {"_index": "test_index", "_id": "1"}}
{"test_field": "hello world"}
{"index": {"_index": "test_index", "_id": "2"}}
{"test_field": "hello win"}
{"index": {"_index": "test_index", "_id": "3"}}
{"test_field": "hello dog"}
{"index": {"_index": "test_index", "_id": "4"}}
{"test_field": "hello cat"}

Search suggestions: match_phrase_prefix

match_phrase_prefix works like match_phrase; the only difference is that the last term is treated as a prefix. All of the prefix work happens at search time.

Take the query hello w as an example.

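hello is matched as an ordinary term, while w is treated as a prefix: Elasticsearch scans the term dictionary of the inverted index for terms that start with w (by default at most 50 of them, controlled by max_expansions) and keeps the documents that contain both hello and a term starting with w.
Finally, it checks positions in those documents: within the allowed slop (0 by default), hello and the w-prefixed term must line up with the positions of the terms in the query.
The search request: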

GET /test_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "test_field": "hello w"
    }
  }
}

Response:

{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.5133061,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.5133061,
        "_source" : {
          "test_field" : "hello world"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.5133061,
        "_source" : {
          "test_field" : "hello win"
        }
      }
    ]
  }
}
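
The matching steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how Elasticsearch is implemented: tokenize is a rough stand-in for the standard analyzer, match_phrase_prefix here is a hypothetical helper that handles slop 0 only, and scoring is ignored. The max_expansions parameter mirrors the query option of the same name.

```python
import re

DOCS = {
    "1": "hello world",
    "2": "hello win",
    "3": "hello dog",
    "4": "hello cat",
}

def tokenize(text):
    """Lowercase and split into words; a rough stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())

def match_phrase_prefix(docs, query, max_expansions=50):
    """Return ids of docs where the query terms appear consecutively (slop 0),
    treating the last query term as a prefix."""
    *head, prefix = tokenize(query)
    # Search time: expand the prefix against the term dictionary, capped at max_expansions.
    vocabulary = sorted({t for text in docs.values() for t in tokenize(text)})
    expansions = set([t for t in vocabulary if t.startswith(prefix)][:max_expansions])
    hits = []
    width = len(head) + 1
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        # Slide a window over the document and compare it with the query terms.
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            if window[:-1] == head and window[-1] in expansions:
                hits.append(doc_id)
                break
    return hits

print(match_phrase_prefix(DOCS, "hello w"))  # ['1', '2']
```

The sketch returns the same two documents as the response above. Note that the prefix expansion runs on every query, which is the cost that the index-time approach avoids.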

Search suggestions: ngram

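The big difference between n-gram suggestions and prefix matching is that n-grams are an index-time technique: each word is split into grams when the document is indexed. With the edge_ngram filter used here (min_gram 1, max_gram 20), world is indexed as w, wo, wor, worl, world. The search result is the same as with match_phrase_prefix, but the prefix work has already been done at index time.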

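Take the query hello w as an example.

The query goes through the standard search_analyzer, so it yields just the two terms hello and w. Because w was itself written into the inverted index as a gram of world and win, it is looked up as an ordinary term; no prefix scan is needed at search time.
match_phrase then checks positions as usual: all grams of a word share that word's position, so w directly follows hello in documents 1 and 2.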

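Create the index (if test_index from the previous section still exists, delete it first with DELETE /test_index):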

PUT test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": { 
            "type":     "edge_ngram",
            "min_gram": 1,
            "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
            "type":      "custom",
            "tokenizer": "standard",
            "filter": [
                "lowercase",
                "autocomplete_filter" 
            ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "test_field": {
          "type":     "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
      }
    }
  }
}
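
To see what the autocomplete analyzer above writes into the index, here is a minimal Python sketch of the edge n-gram expansion. edge_ngrams is a hypothetical helper written for illustration, not an Elasticsearch API; it only mimics the min_gram and max_gram settings of the filter.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Leading prefixes of token, with lengths from min_gram up to max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]

print(edge_ngrams("world"))  # ['w', 'wo', 'wor', 'worl', 'world']
print(edge_ngrams("win"))    # ['w', 'wi', 'win']
```

Every gram becomes a term in the inverted index, so the index grows with word length; that is the price paid for cheaper queries.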

Index the data

PUT _bulk
{"index": {"_index": "test_index", "_id": "1"}}
{"test_field": "hello world"}
{"index": {"_index": "test_index", "_id": "2"}}
{"test_field": "hello win"}
{"index": {"_index": "test_index", "_id": "3"}}
{"test_field": "hello dog"}
{"index": {"_index": "test_index", "_id": "4"}}
{"test_field": "hello cat"}

Query

GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "test_field": "hello w"
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.1620307,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1620307,
        "_source" : {
          "test_field" : "hello world"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1620307,
        "_source" : {
          "test_field" : "hello win"
        }
      }
    ]
  }
}
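
Why a plain match_phrase is enough here can be sketched with a toy positional index in Python. This is an illustration under simplifying assumptions, not Elasticsearch internals: build_index and match_phrase are hypothetical helpers, words stands in for the standard tokenizer plus lowercase filter, slop is fixed at 0, and scoring is ignored.

```python
import re
from collections import defaultdict

DOCS = {"1": "hello world", "2": "hello win", "3": "hello dog", "4": "hello cat"}

def words(text):
    """Lowercase and split into words; a rough stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())

def build_index(docs, max_gram=20):
    """Index time: store every leading prefix of each word at that word's position."""
    index = defaultdict(lambda: defaultdict(set))  # term -> doc id -> positions
    for doc_id, text in docs.items():
        for pos, word in enumerate(words(text)):
            for n in range(1, min(len(word), max_gram) + 1):
                index[word[:n]][doc_id].add(pos)
    return index

def match_phrase(index, query):
    """Search time: plain term lookups, then a check for consecutive positions."""
    terms = words(query)  # the query itself is not n-grammed
    if not terms or any(t not in index for t in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        # A hit needs term i at position start + i for some start position.
        if any(all(doc_id in index[t] and s + i in index[t][doc_id]
                   for i, t in enumerate(terms)) for s in starts):
            hits.append(doc_id)
    return sorted(hits)

index = build_index(DOCS)
print(match_phrase(index, "hello w"))  # ['1', '2']
```

One side effect of indexing grams is visible in the sketch: a query such as h w also matches documents 1 and 2, because h is stored as a gram of hello.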
