Elasticsearch Study Notes, Advanced Part (15): Search Suggestions in Practice
Prepare the data
PUT _bulk
{"index": {"_index": "test_index", "_id": "1"}}
{"test_field": "hello world"}
{"index": {"_index": "test_index", "_id": "2"}}
{"test_field": "hello win"}
{"index": {"_index": "test_index", "_id": "3"}}
{"test_field": "hello dog"}
{"index": {"_index": "test_index", "_id": "4"}}
{"test_field": "hello cat"}
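Search suggestions: match_phrase_prefix
match_phrase_prefix works on the same principle as match_phrase; the only difference is that the last term is used as a prefix when searching. The prefix expansion happens at search time.
Take the search hello w as an example.
hello is matched as an ordinary term, as in a match query, while w is treated as a prefix: Elasticsearch scans the terms in the inverted index for those beginning with w (by default at most 50 of them, controlled by max_expansions) and then keeps the documents that contain both hello and a term beginning with w.
Finally, for those documents it checks, within the configured slop (0 by default), whether the positions of hello and the w-prefixed term in the document line up with the positions of hello and w in the query.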
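The steps described here can be modelled in a few lines of plain Python. This is only a simplified sketch, not how Lucene implements the query: the regex tokenizer is an assumed stand-in for the standard analyzer, slop is fixed at 0, and the names `tokenize` and `match_phrase_prefix` are hypothetical helpers introduced for illustration.

```python
import re

# Sample documents, the same four used in this article.
DOCS = {
    "1": "hello world",
    "2": "hello win",
    "3": "hello dog",
    "4": "hello cat",
}


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def match_phrase_prefix(docs, query, max_expansions=50):
    """Return ids of docs that contain the query as a phrase (slop 0),
    with the last query term treated as a prefix."""
    terms = tokenize(query)
    if not terms:
        return []
    *head, prefix = terms

    # Term dictionary: every distinct term in the toy index, sorted.
    dictionary = sorted({t for text in docs.values() for t in tokenize(text)})
    # Search-time step: expand the prefix into at most max_expansions real terms.
    expansions = set([t for t in dictionary if t.startswith(prefix)][:max_expansions])

    hits = []
    n = len(terms)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        # Slide a window over the document and compare positions one by one.
        for start in range(len(tokens) - n + 1):
            window = tokens[start:start + n]
            if window[:-1] == head and window[-1] in expansions:
                hits.append(doc_id)
                break
    return hits


print(match_phrase_prefix(DOCS, "hello w"))
```

The prefix scan over `dictionary` is the part that costs time on a real index: a short prefix such as w can expand to a very large number of terms, which is why the number of expansions is capped.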
The search request is as follows:
GET /test_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "test_field": "hello w"
    }
  }
}
Output:
{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.5133061,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.5133061,
        "_source" : {
          "test_field" : "hello world"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.5133061,
        "_source" : {
          "test_field" : "hello win"
        }
      }
    ]
  }
}
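Search suggestions: ngram
The big difference between using ngram for search suggestions and prefix matching is that ngram works at index time: each term is split into grams when the document is indexed. With the edge_ngram filter used below (min_gram 1, max_gram 20), world is indexed as w, wo, wor, worl and world. From the user's point of view the search behaves like match_phrase_prefix, but no prefix expansion is needed at search time.
Take the search hello w as an example again.
hello is matched as an ordinary term, and so is w: because w was written into the inverted index as a gram of world and of win, it is looked up directly instead of scanning the index for terms beginning with w. The candidates are the documents that contain both hello and the gram w.
Finally, within the configured slop, the positions are checked as for match_phrase: each gram keeps the position of the word it came from, so hello followed by w matches hello world and hello win.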
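To make the index-time splitting concrete, here is a minimal sketch of what an edge n-gram filter produces for one token. It assumes the min_gram 1 and max_gram 20 settings used in this section's index definition; `edge_ngrams` is a hypothetical helper, not an Elasticsearch API.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of token whose lengths lie between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("world"))  # ['w', 'wo', 'wor', 'worl', 'world']
print(edge_ngrams("win"))    # ['w', 'wi', 'win']
```

Every gram starts at the first character of the word, which is what makes this variant suitable for search-as-you-type: whatever prefix the user has typed so far already exists in the index as a term.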
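Create the index (if test_index from the previous section still exists, delete it first with DELETE /test_index, because an index that already exists cannot be created again):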
PUT test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "test_field": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
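The mapping uses one analyzer for indexing (autocomplete) and another for searching (standard). The following sketch models that split in plain Python under some loud assumptions: the regex tokenizer only approximates the standard tokenizer, every gram is assumed to keep the position of its source word, and `index_analyzer`, `build_index` and `match_phrase` are hypothetical helpers rather than Elasticsearch calls.

```python
import re
from collections import defaultdict

# Sample documents, the same four used in this article.
DOCS = {
    "1": "hello world",
    "2": "hello win",
    "3": "hello dog",
    "4": "hello cat",
}


def standard_tokens(text):
    """Approximation of the standard tokenizer plus the lowercase filter."""
    return re.findall(r"\w+", text.lower())


def index_analyzer(text, min_gram=1, max_gram=20):
    """Model of the 'autocomplete' analyzer: (term, position) pairs,
    where each edge n-gram inherits the position of its source word."""
    pairs = []
    for pos, tok in enumerate(standard_tokens(text)):
        for n in range(min_gram, min(len(tok), max_gram) + 1):
            pairs.append((tok[:n], pos))
    return pairs


def build_index(docs):
    """Toy inverted index: term -> doc id -> set of positions."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for term, pos in index_analyzer(text):
            index[term][doc_id].add(pos)
    return index


def match_phrase(index, query):
    """Phrase match with slop 0. The query goes through the plain
    search analyzer, so its terms are NOT split into grams."""
    terms = standard_tokens(query)
    if not terms or any(t not in index for t in terms):
        return []
    # Documents that contain every query term.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        starts = index[terms[0]][doc_id]
        # The i-th query term must sit exactly i positions after the first.
        if any(all(s + i in index[t][doc_id] for i, t in enumerate(terms))
               for s in starts):
            hits.append(doc_id)
    return hits


index = build_index(DOCS)
print(match_phrase(index, "hello w"))
```

Keeping the search side on the standard analyzer matters: if the query were also split into grams, hello would turn into h, he, hel and so on, and the query would match far more documents than intended.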
Insert the data
PUT _bulk
{"index": {"_index": "test_index", "_id": "1"}}
{"test_field": "hello world"}
{"index": {"_index": "test_index", "_id": "2"}}
{"test_field": "hello win"}
{"index": {"_index": "test_index", "_id": "3"}}
{"test_field": "hello dog"}
{"index": {"_index": "test_index", "_id": "4"}}
{"test_field": "hello cat"}
Query
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "test_field": "hello w"
    }
  }
}
Output:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.1620307,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1620307,
        "_source" : {
          "test_field" : "hello world"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1620307,
        "_source" : {
          "test_field" : "hello win"
        }
      }
    ]
  }
}