Result Grouping and Field Collapsing in Solr

lhc0

2014-06-03

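Introduction

Field collapsing and result grouping are two ways of thinking about the same Solr feature.

Field collapsing collapses a set of results that share the same field value down to one or a few entries. For example, most search engines such as Google collapse results so that only one or two entries per site are shown, along with a link to see more results from that site. Collapsing can also be used to suppress duplicate documents.

Result grouping groups documents by a common field value and returns the top groups, with the top documents in each group. One example is a search at Best Buy for a common term such as DVD, which shows the top 3 results for each category ("TV & Video", "Movies", "Computers", and so on).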

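Quick Start

If you have not done so already, download Solr and set it up by following the getting-started guide [solr入门.doc].

Now enable result grouping and send a query. As a first attempt, group by manufacturer name (the manu_exact field).

Note: at present you can only group on single-valued fields!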

...&q=solr+memory&group=true&group.field=manu_exact

http://192.168.2.89:8080/solr/collection1/select?q=*%3A*&wt=xml&indent=true&group=true&group.field=KR_UID
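A request like the ones above can also be assembled in code. The following is a minimal sketch using only the Java standard library; the class name GroupingUrlBuilder and the localhost base URL are assumptions made for illustration, and only the parameter names (q, group, group.field) come from the text.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ.
class GroupingUrlBuilder {

    // Builds baseUrl?k1=v1&k2=v2... from alternating key/value strings.
    // Keys may repeat, which is useful for several group.query parameters.
    static String build(String baseUrl, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        StringBuilder sb = new StringBuilder(baseUrl);
        for (int i = 0; i < keyValues.length; i += 2) {
            sb.append(i == 0 ? '?' : '&')
              .append(encode(keyValues[i]))
              .append('=')
              .append(encode(keyValues[i + 1]));
        }
        return sb.toString();
    }

    // Form-encodes one component; a space becomes '+', ':' becomes %3A.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Assumed base URL; adjust host, port and core name to your setup.
        String url = build("http://localhost:8080/solr/collection1/select",
                "q", "solr memory",
                "group", "true",
                "group.field", "manu_exact");
        System.out.println(url);
    }
}
```

Encoding each component separately keeps characters such as ":" or spaces in the query from breaking the URL.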

The grouped response looks like this:

[...]
"grouped":{
  "manu_exact":{
    "matches":6,
    "groups":[{
        "groupValue":"Apache Software Foundation",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"SOLR1000",
              "name":"Solr, the Enterprise Search Server"}]
        }},
      {
        "groupValue":"Corsair Microsystems Inc.",
        "doclist":{"numFound":2,"start":0,"docs":[
            {
              "id":"VS1GB400C3",
              "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"}]
        }},
      {
        "groupValue":"A-DATA Technology Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"VDBDB1A16",
              "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"}]
        }},
      {
        "groupValue":"Canon Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"0579B002",
              "name":"Canon PIXMA MP500 All-In-One Photo Printer"}]
        }},
      {
        "groupValue":"ASUS Computer Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"EN7800GTX/2DHTV/256M",
              "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
        }}]}}

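The response shows 6 total matches for our query. For each unique value of group.field, a doclist containing the top-scoring document is returned. The doclist also reports the total number of matches in that group as "numFound". The groups themselves are sorted by the score of the top document in each group.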
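The behavior just described (one doclist per unique field value, numFound counting every match in the group, groups ordered by their best document) can be imitated in memory. This is only a sketch of the semantics, not Solr's implementation; the class GroupingSketch, the document ids and the scores are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only; this is not how Solr groups internally.
class GroupingSketch {

    static class Doc {
        final String id;
        final String manu;   // the field we group on
        final double score;  // invented relevance score
        Doc(String id, String manu, double score) {
            this.id = id;
            this.manu = manu;
            this.score = score;
        }
    }

    static class Group {
        final String groupValue;
        int numFound = 0;                          // all matches in this group
        final List<Doc> docs = new ArrayList<>();  // top docs, capped at limit
        Group(String groupValue) { this.groupValue = groupValue; }
    }

    // limit plays the role of group.limit.
    static List<Group> group(List<Doc> matches, int limit) {
        List<Doc> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
        // Insertion order follows descending score, so each group is created
        // when its best document is seen: groups end up sorted by top score.
        Map<String, Group> byValue = new LinkedHashMap<>();
        for (Doc d : sorted) {
            Group g = byValue.computeIfAbsent(d.manu, Group::new);
            g.numFound++;
            if (g.docs.size() < limit) {
                g.docs.add(d);
            }
        }
        return new ArrayList<>(byValue.values());
    }

    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("c2", "Corsair", 0.4),
                new Doc("a1", "Apache", 1.5),
                new Doc("d1", "A-DATA", 0.7),
                new Doc("c1", "Corsair", 0.9));
    }

    public static void main(String[] args) {
        for (Group g : group(sampleDocs(), 1)) {
            System.out.println(g.groupValue + " numFound=" + g.numFound
                    + " top=" + g.docs.get(0).id);
        }
    }
}
```

With a limit of 1, the Corsair group still reports numFound=2 even though only its best document is listed, which mirrors the response shown earlier.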

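We can also find the top-scoring documents that match arbitrary additional queries with the group.query command (much like facet.query). For example, we can use it to find the top 3 documents in different price ranges: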

...&q=memory&group=true&group.query=price:[0 TO 99.99]&group.query=price:[100 TO *]&group.limit=3

[...]
  "grouped":{
    "price:[0 TO 99.99]":{
      "matches":5,
      "doclist":{"numFound":1,"start":0,"docs":[
          {
            "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
            "price":74.99}]
      }},
    "price:[100 TO *]":{
      "matches":5,
      "doclist":{"numFound":3,"start":0,"docs":[
          {
            "name":"Canon PIXMA MP500 All-In-One Photo Printer",
            "price":179.99},
          {
            "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
            "price":185.0},
          {
            "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
            "price":479.95}]
      }}
[...]

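The response above shows that 5 documents matched the main query "memory". Of those, 1 has a price below $100 and 3 have a price of $100 or more. The two counts add up to 4 rather than 5 because one document has no price and therefore matches neither group.query.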
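The arithmetic can be checked with a small in-memory sketch. The four prices are taken from the response above and the fifth document is modeled as null because it has no price; the class name PriceRangeCounts is an assumption for illustration, and the range test only imitates an inclusive range query such as price:[0 TO 99.99].

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of why the group.query counts do not sum to "matches".
class PriceRangeCounts {

    // Counts prices inside the inclusive range [lo, hi]; a missing price
    // (null) can never fall inside any range.
    static int countInRange(List<Double> prices, double lo, double hi) {
        int count = 0;
        for (Double p : prices) {
            if (p != null && p >= lo && p <= hi) {
                count++;
            }
        }
        return count;
    }

    // Prices of the 5 documents matching q=memory; null = no price field.
    static List<Double> samplePrices() {
        return Arrays.asList(74.99, 179.99, 185.0, 479.95, null);
    }

    public static void main(String[] args) {
        List<Double> prices = samplePrices();
        int under100 = countInRange(prices, 0, 99.99);
        int from100 = countInRange(prices, 100, Double.POSITIVE_INFINITY);
        System.out.println("matches=" + prices.size()
                + " under100=" + under100 + " from100=" + from100);
    }
}
```

Each group.query is evaluated independently against the documents that matched the main query, so a document can appear in several groups or in none.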

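We can optionally use the results of a grouping command as the "main result" by adding group.main=true. Although this result format carries less information, it can be easier for existing Solr clients to parse.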

...&q=solr+memory&group=true&group.field=manu_exact&group.main=true

  "response":{"numFound":6,"start":0,"docs":[
      {
        "id":"SOLR1000",
        "name":"Solr, the Enterprise Search Server",
        "manu":"Apache Software Foundation"},
      {
        "id":"VS1GB400C3",
        "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
        "manu":"Corsair Microsystems Inc."},
      {
        "id":"VDBDB1A16",
        "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
        "manu":"A-DATA Technology Inc."},
      {
        "id":"0579B002",
        "name":"Canon PIXMA MP500 All-In-One Photo Printer",
        "manu":"Canon Inc."},
      {
        "id":"EN7800GTX/2DHTV/256M",
        "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
        "manu":"ASUS Computer Inc."}]
  }
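Conceptually, the flat list is the per-group document lists concatenated in group order. The sketch below illustrates that idea with the ids and manufacturers from the responses above; the class MainResultSketch is hypothetical and does not reflect how Solr builds the response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the flat "main result" format.
class MainResultSketch {

    // Concatenates the document ids of every group, preserving group order.
    static List<String> flatten(Map<String, List<String>> docIdsByGroup) {
        List<String> flat = new ArrayList<>();
        for (List<String> ids : docIdsByGroup.values()) {
            flat.addAll(ids);
        }
        return flat;
    }

    // Top document per manufacturer, as in the grouped response (group.limit=1).
    static Map<String, List<String>> sampleGroups() {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("Apache Software Foundation", Arrays.asList("SOLR1000"));
        groups.put("Corsair Microsystems Inc.", Arrays.asList("VS1GB400C3"));
        groups.put("A-DATA Technology Inc.", Arrays.asList("VDBDB1A16"));
        groups.put("Canon Inc.", Arrays.asList("0579B002"));
        groups.put("ASUS Computer Inc.", Arrays.asList("EN7800GTX/2DHTV/256M"));
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(flatten(sampleGroups()));
    }
}
```

Note that the flat list holds 5 documents while numFound in the response is 6: numFound still counts all matches, not the number of groups.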

Request Parameters

Parameter	Value	Description
group	true/false	If true, turns on result grouping.
group.field	[fieldname]	Group based on the unique values of a field. The field must currently be single-valued and must be either indexed or be another field type that has a value source and works in a function query, such as ExternalFileField. Note: for Solr 3.x versions the field must be a string-like field such as StrField or TextField, otherwise an HTTP status 400 is returned.
group.func	[function query]	Group based on the unique values of a function query. Solr 4.0: this parameter is only supported on 4.0.
group.query	[query]	Return a single group of documents that also match the given query.
rows	[number]	The number of groups to return (used for paging). Defaults to 10.
start	[number]	The offset into the list of groups.
group.limit	[number]	The number of documents to return for each group. Defaults to 1.
group.offset	[number]	The offset into the document list of each group.
sort	[sortspec]	How to sort the groups relative to each other. For example, sort=popularity desc will cause the groups to be sorted according to the highest popularity doc in each group. Defaults to "score desc".
group.sort	[sortspec]	How to sort documents within a single group. Defaults to the same value as the sort parameter.
group.format	grouped/simple	If simple, the grouped documents are presented in a single flat list. The start and rows parameters then refer to numbers of documents instead of numbers of groups.
group.main	true/false	If true, the result of the last field grouping command is used as the main result list in the response, using group.format=simple.
group.ngroups	true/false	If true, includes the number of groups that have matched the query. Default is false. Solr 4.1. WARNING: If this parameter is set to true in a sharded environment, all the documents that belong to the same group have to be located in the same shard, otherwise the count will be incorrect. If you are using SolrCloud, consider using "custom hashing".
group.truncate	true/false	If true, facet counts are based on the most relevant document of each group matching the query. The same applies to StatsComponent. Default is false. Supported from Solr 3.4 and up.
group.facet	true/false	Whether to compute grouped facets for the field facets specified in facet.field parameters. Grouped facets are computed based on the first specified group. Just like normal field faceting, fields shouldn't be tokenized (otherwise counts are computed for each token). Grouped faceting supports single and multivalued fields. Default is false. Solr 4.0. WARNING: If this parameter is set to true in a sharded environment, all the documents that belong to the same group have to be located in the same shard, otherwise the count will be incorrect. If you are using SolrCloud, consider using "custom hashing".
group.cache.percent	[0-100]	If > 0, enables the grouping cache. Grouping actually executes two searches, and this option caches the second one. A value of 0 disables grouping caching. Default is 0. Tests have shown that this cache only improves search time with boolean queries, wildcard queries and fuzzy queries. For simple queries such as a term query or a match-all query, this cache has a negative impact on performance.

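Notes:

1. Any number of grouping commands (group.field, group.func, group.query) may be specified in a single request.

2. As of Solr 3.5, grouping also supports distributed search. Currently group.truncate and group.func are the only parameters not supported in distributed searches.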
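The paging parameters work on two levels: start and rows page through the groups, while group.offset and group.limit page through the documents inside each group. The following sketch derives all four values from 1-based page numbers; the class GroupPaging and its argument names are assumptions for illustration, and only the parameter names come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the parameter names come from Solr.
class GroupPaging {

    // groupPage and docPage are 1-based page numbers.
    static Map<String, String> pagingParams(int groupPage, int groupsPerPage,
                                            int docPage, int docsPerGroup) {
        Map<String, String> params = new LinkedHashMap<>();
        // Paging over groups.
        params.put("start", String.valueOf((groupPage - 1) * groupsPerPage));
        params.put("rows", String.valueOf(groupsPerPage));
        // Paging over documents inside every group.
        params.put("group.offset", String.valueOf((docPage - 1) * docsPerGroup));
        params.put("group.limit", String.valueOf(docsPerGroup));
        return params;
    }

    public static void main(String[] args) {
        // Second page of groups (10 per page), first 3 documents of each group.
        System.out.println(pagingParams(2, 10, 1, 3));
    }
}
```

Keep in mind that with group.format=simple (or group.main=true) start and rows count documents rather than groups, so this mapping applies to the default grouped format.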

Known Limitations

1. Grouping on multi-valued fields is not supported.

SolrJ Usage Examples

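import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.GroupResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.GroupParams;

// Fragment from inside a method. QUERY_CONTENT, QUERY_ROWS, GROUP, GROUP_FIELD,
// GROUP_LIMIT and logger are defined elsewhere in the class.
SolrServer server = this.getSolrServer();
SolrQuery param = new SolrQuery();
param.setQuery(QUERY_CONTENT);
param.setRows(QUERY_ROWS);
param.setParam(GroupParams.GROUP, GROUP);
param.setParam(GroupParams.GROUP_FIELD, GROUP_FIELD);
param.setParam(GroupParams.GROUP_LIMIT, GROUP_LIMIT);
QueryResponse response = null;
try {
    response = server.query(param);
} catch (SolrServerException e) {
    logger.error(e.getMessage(), e);
}
// Map each group value to the number of documents found in that group.
Map<String, Integer> info = new HashMap<String, Integer>();
GroupResponse groupResponse = (response == null) ? null : response.getGroupResponse();
if (groupResponse != null) {
    List<GroupCommand> groupList = groupResponse.getValues();
    for (GroupCommand groupCommand : groupList) {
        List<Group> groups = groupCommand.getValues();
        for (Group group : groups) {
            info.put(group.getGroupValue(), (int) group.getResult().getNumFound());
        }
    }
}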

Example 2

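import java.util.List;
import org.apache.commons.collections.CollectionUtils;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.GroupResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.GroupParams;

// Fragment from inside a method. searchSource (a SolrServer), page and pageSize
// are defined elsewhere.
SolrQuery solrQuery = new SolrQuery("*:*");
solrQuery.addFilterQuery("display:1");
solrQuery.addFilterQuery("activityBeginTime:[* TO NOW]");
solrQuery.addFilterQuery("activityEndTime:[NOW TO *]");
solrQuery.setParam(GroupParams.GROUP, true);
// One group is returned per group.query value.
solrQuery.setParam(GroupParams.GROUP_QUERY, "id:1", "id:2");
solrQuery.setParam(GroupParams.GROUP_LIMIT, String.valueOf(pageSize));
solrQuery.setParam(GroupParams.GROUP_OFFSET, String.valueOf(pageSize * (page - 1)));
// setParam replaces any earlier value, so this overrides the page-size limit above.
solrQuery.setParam(GroupParams.GROUP_LIMIT, "1");
// group.sort takes a single sort spec; multiple clauses are comma-separated.
solrQuery.setParam(GroupParams.GROUP_SORT, "id desc,sort asc");
solrQuery.setRows(0);
QueryResponse qr = searchSource.query(solrQuery, SolrRequest.METHOD.POST);
GroupResponse groupResponse = qr.getGroupResponse();
List<GroupCommand> list = groupResponse.getValues();
for (GroupCommand gc : list) {
    List<Group> gs = gc.getValues();
    if (CollectionUtils.isNotEmpty(gs)) {
        for (Group g : gs) {
            SolrDocumentList sds = g.getResult();
            if (CollectionUtils.isNotEmpty(sds)) {
                for (SolrDocument doc : sds) {
                    String id = doc.getFieldValue("id").toString();
                }
            }
        }
    }
}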

http://localhost:8080/solr/collection1/select?q=*%3A*&wt=xml&indent=true&group=true&group.field=KV_TITLE&group.query=KR_UID:001&group.limit=5
