Handling Chinese with Lucene and IKAnalyzer: Indexing and Search Examples
Versions: Lucene 3.0.2, IKAnalyzer 3.2.0
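The indexing program (Indexer.java) recursively walks a given folder and indexes every .txt file it finds.
When the IndexWriter is instantiated, new IKAnalyzer(false) is passed as its second argument.
In indexFile(), each indexed field and its input are added to the Document with new Field(...), using the nested types Field.Store and Field.Index where storage and indexing options are needed. This change in Lucene 3.* is worth noting.
One point specific to Chinese text: the second argument of the Field constructor for the file content is a Reader. A reader that relies on the default charset (for example, when the IDE environment is set to UTF-8) decodes the file with the wrong encoding and produces garbled Chinese, so the code uses an InputStreamReader with an explicit charset instead.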
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Indexer {
    private File baseDir = new File("E:\\");            // root folder to scan
    private File indexDir = new File("F:\\indexDir");   // where the index is written

    public void createIndex() {
        IndexWriter writer = null;
        try {
            // true = create a new index, overwriting any existing one
            writer = new IndexWriter(FSDirectory.open(indexDir),
                    new IKAnalyzer(false), true,
                    IndexWriter.MaxFieldLength.LIMITED);
            indexDirectory(writer, baseDir);
            writer.optimize(); // merge index segments
            System.out.println("Indexing finished");
        } catch (IOException e) {
            // CorruptIndexException and LockObtainFailedException are IOException subclasses
            e.printStackTrace();
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    // Recursively visit every sub-directory and index the files found.
    private void indexDirectory(IndexWriter writer, File dir) {
        if (!dir.exists() || !dir.isDirectory()) {
            return;
        }
        File[] files = dir.listFiles();
        if (files == null) { // listFiles() returns null when the directory cannot be read
            return;
        }
        for (File file : files) {
            if (file.isDirectory()) {
                indexDirectory(writer, file);
            } else {
                indexFile(writer, file);
            }
        }
    }

    private void indexFile(IndexWriter writer, File file) {
        if (file.isHidden() || !file.exists() || !file.canRead()) {
            return;
        }
        try {
            if (file.getCanonicalPath().endsWith(".txt")) {
                System.out.println("Indexing: " + file.getCanonicalPath());
                Document doc = new Document();
                // File content: decoded explicitly as GBK; tokenized and indexed, not stored
                doc.add(new Field("text",
                        new InputStreamReader(new FileInputStream(file), "GBK")));
                // File path: stored and analyzed
                doc.add(new Field("filename", file.getCanonicalPath(),
                        Field.Store.YES, Field.Index.ANALYZED));
                writer.addDocument(doc); // Lucene builds the index entries for doc here
            }
        } catch (IOException e) {
            // also covers FileNotFoundException and CorruptIndexException
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Indexer lucene = new Indexer();
        lucene.createIndex();
    }
}
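The encoding pitfall noted for the "text" field can be reproduced without Lucene. The following is a minimal sketch, not part of the original program: the class name EncodingDemo and the helper decode() are hypothetical, and an in-memory byte array stands in for a GBK-encoded .txt file. It shows that the same bytes read back correctly only when the InputStreamReader is given the charset the data was written in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demo class; assumes the JDK in use provides the GBK charset.
public class EncodingDemo {

    /** Reads all characters from the bytes, decoding them with the named charset. */
    public static String decode(byte[] data, String charsetName) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), charsetName)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four Chinese characters, written as Unicode escapes so the demo
        // does not depend on the encoding of this source file.
        String original = "\u4e2d\u6587\u7d22\u5f15";
        byte[] gbkBytes = original.getBytes(Charset.forName("GBK"));

        // Decoding with the matching charset restores the text.
        System.out.println(decode(gbkBytes, "GBK").equals(original));   // prints "true"
        // Decoding the same bytes as UTF-8 yields garbled text.
        System.out.println(decode(gbkBytes, "UTF-8").equals(original)); // prints "false"
    }
}
```

If the .txt files being indexed are saved in another encoding, the charset name passed to InputStreamReader in indexFile() would need to change accordingly; "GBK" is simply what the original program assumes.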