Lucene 3.0 Study Notes (2): Searching with an Index (reposted)

renjinlong

2010-11-16

In the previous post we built the index; now it is time to put it to real use.

Here is a code sample that implements basic search:

Directory dir = FSDirectory.open(new File("index"));
IndexSearcher searcher = new IndexSearcher(dir, true);  // true = open the index read-only
Query q = new TermQuery(new Term("contents", "java"));
TopDocs hits = searcher.search(q, 10);                  // return the top 10 hits
searcher.close();

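The main steps of searching with an index:

1. Open the existing index and create an IndexSearcher object.

2. Specify the Field to search and the query string, and create a TermQuery.

3. Run the query with the IndexSearcher; the results are returned as a TopDocs object. The second argument of search specifies how many of the top-ranked hits (N) to return.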

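Main classes:

1. Term

Term is the basic unit of a query and corresponds to the Field used when indexing. You can think of it as a single map entry: the key is the field name in the index, and the value is the text to search for.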

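When the query string is a single word this works fine, but when it contains several words or a whole sentence, nothing is found. The main reason is that when the index was built, the content of the Field was tokenized (analyzed), whereas at query time the query string is not analyzed at all: it is treated as one single term, so naturally it matches nothing.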
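To make the failure concrete, here is a toy model in plain Java. It is not Lucene's API: the class ToyTermLookup and its methods are hypothetical, and its analyze method (lowercase, then split on non-alphanumeric characters) is only a rough stand-in for StandardAnalyzer. The index maps each token to the IDs of the documents containing it, and termLookup does an exact-match lookup the way a TermQuery does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy model only, NOT Lucene's API: shows why an unanalyzed multi-word
// term never matches an index built from analyzed (tokenized) text.
final class ToyTermLookup {

    // Hypothetical analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Inverted index for a single field: token -> sorted doc IDs.
    static Map<String, TreeSet<Integer>> buildIndex(List<String> docs) {
        Map<String, TreeSet<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : analyze(docs.get(docId))) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Exact-match lookup, like a TermQuery: the query text is used as-is.
    static List<Integer> termLookup(Map<String, TreeSet<Integer>> index, String text) {
        TreeSet<Integer> hits = index.get(text);
        if (hits == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(hits);
    }

    // Analyze the query first, then OR together the hits of every token.
    static List<Integer> analyzedLookup(Map<String, TreeSet<Integer>> index, String query) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (String token : analyze(query)) {
            hits.addAll(termLookup(index, token));
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Integer>> index =
                buildIndex(Arrays.asList("Java and the operating system", "Learning Java"));
        System.out.println(termLookup(index, "java"));                // [0, 1]
        System.out.println(termLookup(index, "java and system"));     // []
        System.out.println(analyzedLookup(index, "java and system")); // [0, 1]
    }
}
```

An exact lookup of the whole string "java and system" finds nothing because no such single token was ever indexed; analyzing the query first, as QueryParser does, breaks it into tokens that do exist. The same mismatch explains why an exact lookup for "Java" misses once the indexed text has been lowercased.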

To fix this in the example above, simply replace the new TermQuery line with the following code:

// Analyze the input query string (queries holds the user's query text)

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, "contents", analyzer);
Query query = parser.parse(queries);

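Just make sure the analyzer is the same one that was used when building the index. The second argument of the QueryParser constructor is the default field to search (used for any term that does not name a field explicitly); the query string itself is passed to parse().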

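Calling query.toString() shows the Terms the query string was turned into.

• When the query string is "java", query.toString() is contents:java (the field name comes before the colon, the search term after it).

• When the query string is "java and system", query.toString() is contents:java contents:system. The string has clearly been tokenized, and the connecting word "and" has been dropped (it is a stop word for StandardAnalyzer; QueryParser only treats the uppercase AND as a boolean operator).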
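The toString() output above can be imitated with another plain-Java toy (the class ToyQueryString is hypothetical, not Lucene code). Its stop-word set is assumed to match Lucene 3.0's default English list (StopAnalyzer.ENGLISH_STOP_WORDS_SET, which StandardAnalyzer uses by default); each surviving token becomes one field:term clause, printed space-separated like a BooleanQuery of optional clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only, NOT Lucene's QueryParser: imitates what QueryParser plus
// StandardAnalyzer print for a plain multi-word query on one default field.
final class ToyQueryString {

    // Assumed to match Lucene 3.0's default English stop-word list.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by",
            "for", "if", "in", "into", "is", "it", "no", "not", "of",
            "on", "or", "such", "that", "the", "their", "then", "there",
            "these", "they", "this", "to", "was", "will", "with"));

    // Hypothetical analyzer: lowercase, split on non-alphanumerics, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // One optional clause per remaining token, printed as "field:term", space-separated.
    static String toQueryString(String defaultField, String queryText) {
        StringBuilder sb = new StringBuilder();
        for (String token : analyze(queryText)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(defaultField).append(':').append(token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString("contents", "java"));            // contents:java
        System.out.println(toQueryString("contents", "java and system")); // contents:java contents:system
    }
}
```

The toy ignores query syntax (field prefixes, quotes, uppercase AND/OR), which the real QueryParser handles before analysis.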

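2. TopDocs

This class wraps the matching results that are returned:

• totalHits is the total number of matching documents;

• scoreDocs is an array of the matching hits, but each entry only records the Document ID (and its score). To get the actual content of a Document, you have to fetch the Document for that doc ID through the IndexSearcher.

Note: if you set the number of results to N in the search call, scoreDocs contains at most the top N documents, whereas totalHits still returns the total number of matches (like the total number of matching pages Google displays). So scoreDocs.length may differ from totalHits, and using totalHits as the array size when iterating over scoreDocs easily causes bugs (an ArrayIndexOutOfBoundsException).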
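A toy sketch of the pitfall in plain Java, with hypothetical stand-ins for TopDocs and ScoreDoc (not the Lucene classes): search counts every match in totalHits but keeps only the n best-scoring ones in scoreDocs, and a plain array plays the role of the stored fields that searcher.doc(id) would load.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: hypothetical stand-ins for Lucene's TopDocs/ScoreDoc,
// showing why scoreDocs.length and totalHits can differ.
final class ToyTopDocs {

    static final class ScoreDoc {
        final int doc;      // document ID
        final float score;  // relevance score

        ScoreDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    final int totalHits;         // number of ALL matching documents
    final ScoreDoc[] scoreDocs;  // at most n best-scoring hits

    ToyTopDocs(int totalHits, ScoreDoc[] scoreDocs) {
        this.totalHits = totalHits;
        this.scoreDocs = scoreDocs;
    }

    // Imitates search(query, n): count every match, keep only the top n by score.
    static ToyTopDocs search(float[] scorePerDoc, int n) {
        List<ScoreDoc> matches = new ArrayList<>();
        for (int doc = 0; doc < scorePerDoc.length; doc++) {
            if (scorePerDoc[doc] > 0) {  // in this toy, a score of 0 means "no match"
                matches.add(new ScoreDoc(doc, scorePerDoc[doc]));
            }
        }
        matches.sort(Comparator.comparingDouble((ScoreDoc sd) -> sd.score).reversed());
        int keep = Math.min(n, matches.size());
        return new ToyTopDocs(matches.size(), matches.subList(0, keep).toArray(new ScoreDoc[0]));
    }

    public static void main(String[] args) {
        // Stored field per doc ID, standing in for searcher.doc(id).get("filename").
        String[] filenames = {"a.txt", "b.txt", "c.txt", "d.txt", "e.txt", "f.txt"};
        float[] scores = {0.2f, 0f, 0.9f, 0.5f, 0.1f, 0.7f};

        ToyTopDocs hits = search(scores, 3);
        System.out.println("totalHits = " + hits.totalHits);               // totalHits = 5
        System.out.println("scoreDocs.length = " + hits.scoreDocs.length); // scoreDocs.length = 3

        // Correct: the loop bound is scoreDocs.length, never totalHits.
        for (ScoreDoc sd : hits.scoreDocs) {
            System.out.println(filenames[sd.doc] + " " + sd.score);
        }
        // Looping with i < hits.totalHits here would throw ArrayIndexOutOfBoundsException.
    }
}
```

With n = 3 and five matching documents, totalHits is 5 while scoreDocs holds only 3 entries, so only scoreDocs.length is a safe loop bound.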

The code that retrieves the actual content of the matching Documents is as follows:

// Print the search results; fields: path and modification time

private void printDocs(Searcher searcher, TopDocs docs)
{
    try
    {
        System.out.println("Found " + docs.totalHits + " files!");
        ScoreDoc[] sd = docs.scoreDocs;
        // Iterate up to scoreDocs.length, not totalHits
        for (int i = 0; i < sd.length; i++)
        {
            // Load the stored fields of the document with this doc ID
            Document doc = searcher.doc(sd[i].doc);
            System.out.println("Path:" + doc.get("path") + "; modified:" + doc.get("modified"));
        }
    }
    catch (IOException ex)
    {
        System.out.println(ex.getMessage());
    }
}

DEMO:

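import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        // Where the index files are stored
        String indexDir = "c:\\indexDir";
        // Where the TXT files to be indexed live
        String dataDir = "c:\\dataDir";
        // Create the Directory object
        Directory dir = new SimpleFSDirectory(new File(indexDir));
        // Create the IndexWriter. Arguments: the Directory; the analyzer; whether to create a new
        // index (true) or modify the existing one (false); and the maximum number of terms indexed
        // per field, e.g. new MaxFieldLength(2) indexes only the first 2 terms of each field.
        // IndexWriter.MaxFieldLength.LIMITED (10,000 terms) is the usual choice.
        IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        File[] files = new File(dataDir).listFiles();
        for (int i = 0; i < files.length; i++) {
            Document doc = new Document();
            // Create the Fields and add them to the document.
            // A Reader-based field is tokenized and indexed, but not stored.
            FileReader reader = new FileReader(files[i]);
            doc.add(new Field("contents", reader));
            doc.add(new Field("filename", files[i].getName(),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("indexDate", DateTools.dateToString(new Date(), DateTools.Resolution.DAY),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Add the document; the Reader is consumed here, so close it afterwards
            indexWriter.addDocument(doc);
            reader.close();
        }
        // Show how many documents the IndexWriter now holds
        System.out.println("numDocs: " + indexWriter.numDocs());
        indexWriter.optimize();
        indexWriter.close();
    }
}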
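The fourth IndexWriter argument, MaxFieldLength, limits how many terms of each field are indexed; it does not control how text is split into tokens. In Lucene 3.0, IndexWriter.MaxFieldLength.LIMITED caps this at 10,000 terms and UNLIMITED removes the cap. A toy illustration of the truncation in plain Java (hypothetical class ToyMaxFieldLength, whitespace tokenization assumed, not Lucene code):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only, NOT Lucene code: shows the effect of a per-field term limit
// like IndexWriter.MaxFieldLength, where only the first maxFieldLength tokens are indexed.
final class ToyMaxFieldLength {

    static Map<String, Set<Integer>> buildIndex(String[] docs, int maxFieldLength) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            // Assumed tokenizer: lowercase and split on whitespace.
            String[] tokens = docs[docId].trim().toLowerCase(Locale.ROOT).split("\\s+");
            // Tokens beyond the limit are silently dropped, not split into chunks.
            int limit = Math.min(tokens.length, maxFieldLength);
            for (int i = 0; i < limit; i++) {
                index.computeIfAbsent(tokens[i], k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    static boolean matches(Map<String, Set<Integer>> index, String term) {
        return index.containsKey(term);
    }

    public static void main(String[] args) {
        String[] docs = {"lucene is a search library written in java"};
        // With a limit of 2, only "lucene" and "is" are indexed.
        System.out.println(matches(buildIndex(docs, 2), "java"));                 // false
        System.out.println(matches(buildIndex(docs, Integer.MAX_VALUE), "java")); // true
    }
}
```

Words past the limit are simply ignored, so with LIMITED a term that only appears late in a very long file will not be found.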

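import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {

    public static void main(String[] args) throws IOException, ParseException {
        // Where the index files are stored
        String indexDir = "c:\\indexDir";
        Directory dir = new SimpleFSDirectory(new File(indexDir));
        // Create the IndexSearcher; unlike IndexWriter, it only needs the index directory
        IndexSearcher indexSearch = new IndexSearcher(dir);
        // Create the QueryParser. Arguments: the Lucene version, the default field to search, the analyzer
        QueryParser queryParser = new QueryParser(Version.LUCENE_30,
                "contents", new StandardAnalyzer(Version.LUCENE_30));
        // Build the Query object
        Query query = queryParser.parse("liliugen");
        // Search; TopDocs holds the scoreDocs[] array with the IDs of the matching documents
        TopDocs hits = indexSearch.search(query, 10);
        // hits.totalHits is the total number of matches
        System.out.println("Found " + hits.totalHits + " hits");
        // Loop over hits.scoreDocs, load each Document with indexSearch.doc, then read the field value
        for (int i = 0; i < hits.scoreDocs.length; i++) {
            ScoreDoc sdoc = hits.scoreDocs[i];
            Document doc = indexSearch.doc(sdoc.doc);
            System.out.println(doc.get("filename"));
        }
        indexSearch.close();
    }
}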

安科网
