Result Grouping and Field Collapsing in Solr
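Introduction
Field collapsing and result grouping are two ways of thinking about the same Solr feature.
Field collapsing collapses a set of results that share the same field value down to a few entries. For example, most search engines, such as Google, collapse the results from one site to only one or two entries, along with a link to click to see more results from that site. Field collapsing can also be used to suppress duplicate documents.
Result grouping groups documents by a common field value and returns the top groups, together with the top documents in each group. One example is a search at Best Buy for a common term such as DVD, which shows the top 3 results for each category ("TVs & Video", "Movies", "Computers", etc.).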
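The idea of "top documents per group" can be modelled in a few lines of plain Java. This is only an in-memory sketch under stated assumptions, not how Solr implements grouping: the `Doc` class, its `score` values, and the `group` method are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of result grouping; NOT Solr's implementation.
class GroupingSketch {

    // Hypothetical document: an id, the value of the grouping field, and a relevance score.
    static final class Doc {
        final String id;
        final String groupValue;
        final double score;

        Doc(String id, String groupValue, double score) {
            this.id = id;
            this.groupValue = groupValue;
            this.score = score;
        }
    }

    // Returns groupValue -> top `limit` docs, with groups ordered by their best doc's score.
    static Map<String, List<Doc>> group(List<Doc> hits, int limit) {
        // Sort all hits by descending score so each group fills up best-first.
        List<Doc> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());

        // LinkedHashMap keeps insertion order, so groups appear in order of their top doc.
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc d : sorted) {
            List<Doc> docs = groups.computeIfAbsent(d.groupValue, k -> new ArrayList<>());
            if (docs.size() < limit) {
                docs.add(d);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Doc> hits = new ArrayList<>();
        hits.add(new Doc("a1", "movies", 0.4));
        hits.add(new Doc("a2", "computers", 0.9));
        hits.add(new Doc("a3", "movies", 0.7));
        hits.add(new Doc("a4", "computers", 0.2));

        // One document per group, like a limit of 1.
        for (Map.Entry<String, List<Doc>> e : group(hits, 1).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().get(0).id);
        }
    }
}
```

With the sample data, the "computers" group comes first because its best document (a2) outscores the best "movies" document (a3).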
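Quick Start
If you have not set up Solr yet, download the Solr distribution first and follow "solr入门.doc" (the Solr getting-started document) to complete the installation.
Now enable result grouping and send a query. As a first try, we group by manufacturer name (the manu_exact field).
Currently you can only group on single-valued fields!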
...&q=solr+memory&group=true&group.field=manu_exact
The grouped response looks like this:
[...]
"grouped":{
"manu_exact":{
"matches":6,
"groups":[{
"groupValue":"Apache Software Foundation",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"SOLR1000",
"name":"Solr, the Enterprise Search Server"}]
}},
{
"groupValue":"Corsair Microsystems Inc.",
"doclist":{"numFound":2,"start":0,"docs":[
{
"id":"VS1GB400C3",
"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"}]
}},
{
"groupValue":"A-DATA Technology Inc.",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"VDBDB1A16",
"name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"}]
}},
{
"groupValue":"Canon Inc.",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"0579B002",
"name":"Canon PIXMA MP500 All-In-One Photo Printer"}]
}},
{
"groupValue":"ASUS Computer Inc.",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"EN7800GTX/2DHTV/256M",
"name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
}}]}}
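The response indicates that 6 documents match the query. For each unique value of group.field, a doclist containing the top-scoring document is returned. The doclist also reports the total number of matches in that group as "numFound". The groups themselves are sorted by the score of the top document in each group.
We can also find the top-scoring documents that match arbitrary queries with the group.query command (much like facet.query). For example, we can use it to find the top 3 documents in different price ranges: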
...&q=memory&group=true&group.query=price:[0 TO 99.99]&group.query=price:[100 TO *]&group.limit=3
[...]
"grouped":{
"price:[0 TO 99.99]":{
"matches":5,
"doclist":{"numFound":1,"start":0,"docs":[
{
"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
"price":74.99}]
}},
"price:[100 TO *]":{
"matches":5,
"doclist":{"numFound":3,"start":0,"docs":[
{
"name":"Canon PIXMA MP500 All-In-One Photo Printer",
"price":179.99},
{
"name":"CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
"price":185.0},
{
"name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
"price":479.95}]
}}
[...]
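From the response above, the query "memory" matched 5 documents in total. Of those, 1 has a price below $100 and 3 have a price of $100 or more. The two counts do not add up to 5 because one document has no price and therefore matches neither group.query.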
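The arithmetic behind these counts can be reproduced with a small in-memory sketch. The prices 74.99, 179.99, 185.0 and 479.95 come from the response above; modelling the fifth document with a `null` price is an assumption made for illustration, and `countInRange` is a hypothetical helper, not Solr code.

```java
import java.util.Arrays;
import java.util.List;

// In-memory illustration of why the two group.query buckets cover only 4 of the 5 matches.
class PriceBucketSketch {

    // Counts prices in the inclusive range [min, max]; a null price (missing field) never matches.
    static int countInRange(List<Double> prices, double min, double max) {
        int count = 0;
        for (Double p : prices) {
            if (p != null && p >= min && p <= max) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Four prices taken from the response above; null stands for the document without a price.
        List<Double> prices = Arrays.<Double>asList(74.99, 179.99, 185.0, 479.95, null);

        int low = countInRange(prices, 0, 99.99);                        // price:[0 TO 99.99]
        int high = countInRange(prices, 100, Double.POSITIVE_INFINITY);  // price:[100 TO *]

        // prints "matches=5 low=1 high=3"
        System.out.println("matches=" + prices.size() + " low=" + low + " high=" + high);
    }
}
```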
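We can also have a grouping command produce the "main result" list by adding the parameter group.main=true. Although this format does not carry as much information, it can be easier for existing Solr clients to parse.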
...&q=solr+memory&group=true&group.field=manu_exact&group.main=true
"response":{"numFound":6,"start":0,"docs":[
{
"id":"SOLR1000",
"name":"Solr, the Enterprise Search Server",
"manu":"Apache Software Foundation"},
{
"id":"VS1GB400C3",
"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
"manu":"Corsair Microsystems Inc."},
{
"id":"VDBDB1A16",
"name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
"manu":"A-DATA Technology Inc."},
{
"id":"0579B002",
"name":"Canon PIXMA MP500 All-In-One Photo Printer",
"manu":"Canon Inc."},
{
"id":"EN7800GTX/2DHTV/256M",
"name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
"manu":"ASUS Computer Inc."}]
}
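One way to picture this flat list is as the concatenation of each group's returned documents, in group order. The sketch below is an in-memory illustration only; `flatten` is a hypothetical helper, and the group values and ids are copied from the grouped response shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory picture of a flat result list built from groups; NOT Solr code.
class FlattenGroupsSketch {

    // Concatenates each group's document ids, keeping the order of the groups.
    static List<String> flatten(Map<String, List<String>> grouped) {
        List<String> flat = new ArrayList<>();
        for (List<String> docs : grouped.values()) {
            flat.addAll(docs);
        }
        return flat;
    }

    public static void main(String[] args) {
        // One returned document per group, as in the grouped response shown earlier.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("Apache Software Foundation", Arrays.asList("SOLR1000"));
        grouped.put("Corsair Microsystems Inc.", Arrays.asList("VS1GB400C3"));
        grouped.put("A-DATA Technology Inc.", Arrays.asList("VDBDB1A16"));
        grouped.put("Canon Inc.", Arrays.asList("0579B002"));
        grouped.put("ASUS Computer Inc.", Arrays.asList("EN7800GTX/2DHTV/256M"));

        System.out.println(flatten(grouped));
    }
}
```

Note that the flat list holds 5 documents while "numFound" in the response above is 6: the Corsair group matched 2 documents but returned only 1.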
Request Parameters
Parameter | Value | Description |
group | true/false | If true, turns on result grouping. |
group.field | [fieldname] | Group based on the unique values of a field. The field must currently be single-valued and must be either indexed or of another field type that has a value source and works in a function query, such as ExternalFileField. Note: for Solr 3.x versions the field must be a string-like field such as StrField or TextField, otherwise an HTTP status 400 is returned. |
group.func | [function query] | Group based on the unique values of a function query. This parameter is only supported on Solr 4.0. |
group.query | [query] | Return a single group of documents that also match the given query. |
rows | [number] | Used for paging: the number of groups to return. Defaults to 10. |
start | [number] | The offset into the list of groups. |
group.limit | [number] | The number of documents to return for each group. Defaults to 1. |
group.offset | [number] | The offset into the document list of each group. |
sort | [sortspec] | How to sort the groups relative to each other. For example, sort=popularity desc will cause the groups to be sorted according to the highest popularity doc in each group. Defaults to "score desc". |
group.sort | [sortspec] | How to sort documents within a single group. Defaults to the same value as the sort parameter. |
group.format | grouped/simple | If simple, the grouped documents are presented in a single flat list. The start and rows parameters refer to numbers of documents instead of numbers of groups. |
group.main | true/false | If true, the result of the last field grouping command is used as the main result list in the response, using group.format=simple. |
group.ngroups | true/false | If true, includes the number of groups that have matched the query. Default is false. (Solr 4.1) |
group.truncate | true/false | If true, facet counts are based on the most relevant document of each group matching the query. Same applies for StatsComponent. Default is false. Supported from Solr 3.4 and up. |
group.facet | true/false | Whether to compute grouped facets for the field facets specified in facet.field parameters. Grouped facets are computed based on the first specified group. Just like normal field faceting, fields shouldn't be tokenized (otherwise counts are computed for each token). Grouped faceting supports single and multivalued fields. Default is false. (Solr 4.0) |
group.cache.percent | [0-100] | If > 0, enables the grouping cache. Grouping is actually executed as two searches; this option caches the second search. A value of 0 disables grouping caching. Default is 0. Tests have shown that this cache only improves search time with boolean queries, wildcard queries and fuzzy queries. For simple queries like a term query or a match-all query, this cache has a negative impact on performance. |
Notes:
1. Any number of group commands (group.field, group.func, group.query) may be specified in a single request.
2. Since Solr 3.5, grouping also supports distributed search. Currently group.truncate and group.func are the only parameters that are not supported for distributed searches.
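Because several group commands may appear in one request, a parameter name such as group.query can occur more than once in the query string. The following is a minimal sketch of building such a query string with URL encoding, using only the Java standard library. The class `GroupQueryStringSketch` and its methods are hypothetical names; the parameter names and values are the ones used in this article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles a URL-encoded query string; repeated names are allowed.
class GroupQueryStringSketch {

    private final List<String> pairs = new ArrayList<>();

    // Appends one name=value pair; calling it twice with the same name keeps both pairs.
    GroupQueryStringSketch add(String name, String value) {
        try {
            pairs.add(URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
        return this;
    }

    // Joins all pairs with '&' in the order they were added.
    String build() {
        return String.join("&", pairs);
    }

    public static void main(String[] args) {
        String queryString = new GroupQueryStringSketch()
                .add("q", "memory")
                .add("group", "true")
                .add("group.field", "manu_exact")
                .add("group.query", "price:[0 TO 99.99]")
                .add("group.query", "price:[100 TO *]")
                .add("group.limit", "3")
                .build();
        System.out.println(queryString);
    }
}
```

The resulting string would be appended after the `?` of the select URL of your own Solr instance, which is deliberately left out here.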
Known Limitations
1. Grouping on multi-valued fields is not supported.
SolrJ Usage Example
import java.util.*;
import org.apache.solr.client.solrj.*;
import org.apache.solr.client.solrj.response.*;
import org.apache.solr.common.params.GroupParams;

SolrServer server = this.getSolrServer();
SolrQuery param = new SolrQuery();
param.setQuery(QUERY_CONTENT);
param.setRows(QUERY_ROWS);
// Turn on grouping and choose the grouping field and per-group limit.
param.setParam(GroupParams.GROUP, GROUP);
param.setParam(GroupParams.GROUP_FIELD, GROUP_FIELD);
param.setParam(GroupParams.GROUP_LIMIT, GROUP_LIMIT);

QueryResponse response = null;
try {
    response = server.query(param);
} catch (SolrServerException e) {
    logger.error(e.getMessage(), e);
}

// Collect group value -> number of matching documents in that group.
Map<String, Integer> info = new HashMap<String, Integer>();
// response stays null if the query failed, so guard before reading it.
GroupResponse groupResponse = (response != null) ? response.getGroupResponse() : null;
if (groupResponse != null) {
    List<GroupCommand> groupList = groupResponse.getValues();
    for (GroupCommand groupCommand : groupList) {
        List<Group> groups = groupCommand.getValues();
        for (Group group : groups) {
            info.put(group.getGroupValue(), (int) group.getResult().getNumFound());
        }
    }
}
Example 2
// Needs SolrJ (SolrQuery, SolrRequest, GroupParams, the response classes,
// SolrDocument, SolrDocumentList) and Apache Commons CollectionUtils on the classpath.
SolrQuery solrQuery = new SolrQuery("*:*");
solrQuery.addFilterQuery("display:1");
solrQuery.addFilterQuery("activityBeginTime:[* TO NOW]");
solrQuery.addFilterQuery("activityEndTime:[NOW TO *]");
solrQuery.setParam(GroupParams.GROUP, true);
// One group per query; setParam accepts several values for the same parameter.
solrQuery.setParam(GroupParams.GROUP_QUERY, "id:1", "id:2");
// Page through the documents inside each group.
solrQuery.setParam(GroupParams.GROUP_LIMIT, pageSize + "");
solrQuery.setParam(GroupParams.GROUP_OFFSET, pageSize * (page - 1) + "");
// A single sort spec; "sort" is assumed to be a field name in this schema.
solrQuery.setParam(GroupParams.GROUP_SORT, "id desc, sort asc");
solrQuery.setRows(0);

QueryResponse qr = searchSource.query(solrQuery, SolrRequest.METHOD.POST);
GroupResponse groupResponse = qr.getGroupResponse();
List<GroupCommand> list = groupResponse.getValues();

// Walk every command, every group, and every document in each group.
for (GroupCommand gc : list) {
    List<Group> gs = gc.getValues();
    if (CollectionUtils.isNotEmpty(gs)) {
        for (Group g : gs) {
            SolrDocumentList sds = g.getResult();
            if (CollectionUtils.isNotEmpty(sds)) {
                for (SolrDocument doc : sds) {
                    String id = doc.getFieldValue("id").toString();
                }
            }
        }
    }
}
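The paging arithmetic used for group.offset in Example 2, `pageSize * (page - 1)`, can be checked in isolation. The sketch below assumes 1-based page numbers, as that expression implies; `GroupPagingSketch` and `groupOffset` are hypothetical names and the code does not talk to Solr.

```java
// Stand-alone check of the paging arithmetic; no Solr classes involved.
class GroupPagingSketch {

    // Offset of the first document of a 1-based page within each group.
    static int groupOffset(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return pageSize * (page - 1);
    }

    public static void main(String[] args) {
        int pageSize = 10;
        for (int page = 1; page <= 3; page++) {
            // Page 1 starts at offset 0, page 2 at 10, page 3 at 20.
            System.out.println("page " + page
                    + ": group.offset=" + groupOffset(page, pageSize)
                    + "&group.limit=" + pageSize);
        }
    }
}
```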