CRUD operations on documents in Lucene: designed for distributed full-text search

oklinsong

2012-08-30

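Lucene.Net is one of the stronger full-text search tools available for .NET. It is a port of the Java library, and the .NET version is in no way inferior in functionality to Java Lucene. Today I will mainly talk about the ways a Lucene index can be updated.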

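Approach 1: overwrite the whole index (the .cfs file). Advantage: simple, with no per-change interaction with the server. Disadvantages: generating the index files puts a heavy load on IO, and the Lucene data shown on the front end is out of sync with the database until the next rebuild.

Approach 2: update the index on demand (CRUD operations on individual documents). Advantage: the index stays in sync with the database. Disadvantages: more interaction with the server, and the security of the CRUD operations needs attention. Even so, this work is necessary.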
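For comparison, here is a minimal sketch of what approach 1 could look like in code. It is an illustration under assumptions, not the code from my service: the class name `FullRebuildSketch`, the `Rebuild` method and the `rows` parameter (a stand-in for rows read from the database) are all hypothetical. The key point is the `create` argument of the `IndexWriter` constructor: passing `true` discards the existing index.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneIO = Lucene.Net.Store;

public static class FullRebuildSketch
{
    // Rebuilds the whole index from scratch.
    // "rows" is a hypothetical stand-in for database rows:
    // key = primary key, value = searchable text.
    public static void Rebuild(LuceneIO.Directory directory,
                               IEnumerable<KeyValuePair<string, string>> rows)
    {
        // create = true: overwrite whatever index already exists in the directory.
        IndexWriter writer = new IndexWriter(
            directory,
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
            true,
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            foreach (KeyValuePair<string, string> row in rows)
            {
                Document doc = new Document();
                doc.Add(new Field("PrimaryKey", row.Key, Field.Store.YES, Field.Index.ANALYZED));
                doc.Add(new Field("Info", row.Value, Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
            // Merge segments once, at the end of the rebuild.
            writer.Optimize();
        }
        finally
        {
            // Always close the writer so the write lock is released.
            writer.Close();
        }
    }
}
```

Every call rewrites the entire index, which is where the IO cost of this approach comes from.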

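The rest of this post covers the second approach, updating the index on demand.

Appending a document (record): when a row is inserted into the database table, Lucene should perform the matching insert. This is an append. IndexWriter has an AddDocument method for it, and there is not much to say about it: just pass the values its signature asks for. Remember to call Optimize and Close on the IndexWriter when the operation is done (Optimize merges the index segments, so it can be slow on a large index).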

    [WebMethod]
    public int AppendLuceneDocument(string primaryKey, string id, string name, string info, string categoryName, string propertyName, string module, string passKey)
    {
        // dirInfo and directory are assumed to be fields declared elsewhere in the service class;
        // LuceneIO is presumably an alias for Lucene.Net.Store. passKey is not checked in this snippet.
        dirInfo = Directory.CreateDirectory(this.GetIndexPath(ConfigurationManager.AppSettings[module]));
        directory = LuceneIO.FSDirectory.Open(dirInfo);
        // create = false: append to the existing index instead of overwriting it.
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), false, IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            Document doc = new Document();
            doc.Add(new Field("PrimaryKey", primaryKey, Field.Store.YES, Field.Index.ANALYZED));
            // ID is stored only; it cannot be searched.
            doc.Add(new Field("ID", id, Field.Store.YES, Field.Index.NO));
            doc.Add(new Field("Name", name, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("Info", info, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("CategoryName", categoryName, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("PropertyName", propertyName, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            writer.Optimize();
        }
        finally
        {
            // Close even when an exception is thrown, so the index write lock is released.
            writer.Close();
        }
        // 1 signals success; failures propagate to the caller as exceptions.
        return 1;
    }

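Deleting a record (document): there are a few points to watch here.

1. The field used to identify the record to delete should be unique, otherwise the delete is meaningless. That field must also be indexed when it is stored in Lucene (ANALYZED in my code), so that it can be searched.

2. Prefer a Query over a Term as the delete condition. I ran many tests, and the Term condition never worked for me. A likely cause is that the field is ANALYZED: the analyzer may lowercase or split the key at index time, while a Term is matched against the index verbatim.

The delete code is as follows: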

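    [WebMethod]
    public int DeleteLuceneDocument(string primaryKey, string module, string passKey)
    {
        // dirInfo and directory are assumed to be fields declared elsewhere in the service class.
        // passKey is not checked in this snippet.
        dirInfo = Directory.CreateDirectory(this.GetIndexPath(ConfigurationManager.AppSettings[module]));
        directory = LuceneIO.FSDirectory.Open(dirInfo);
        // Use one analyzer for both writer and parser, the same one used at index time.
        StandardAnalyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        IndexWriter writer = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "PrimaryKey", analyzer);
            // Escape the key so characters such as ':' or '*' are not read as query syntax.
            Query query = parser.Parse(QueryParser.Escape(primaryKey));
            writer.DeleteDocuments(query);
            writer.Commit();
            writer.Optimize();
        }
        finally
        {
            // Close even when an exception is thrown, so the index write lock is released.
            writer.Close();
        }
        // 1 signals success; failures propagate to the caller as exceptions.
        return 1;
    }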
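If the Term failures really do come from analysis, one way around them is to index the key without analysis, so that the exact key string is stored as a single term. The sketch below shows that variant. It is an assumption on my part rather than the setup used above: the class `TermDeleteSketch`, its methods and the reduced field set are hypothetical, and it only works for documents whose key was indexed with Field.Index.NOT_ANALYZED.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneIO = Lucene.Net.Store;

public static class TermDeleteSketch
{
    private static IndexWriter OpenWriter(LuceneIO.Directory directory)
    {
        // This overload creates the index if it is missing and appends otherwise.
        return new IndexWriter(
            directory,
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
            IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public static void Add(LuceneIO.Directory directory, string primaryKey, string info)
    {
        IndexWriter writer = OpenWriter(directory);
        try
        {
            Document doc = new Document();
            // NOT_ANALYZED: the whole key is indexed as one term, unchanged.
            doc.Add(new Field("PrimaryKey", primaryKey, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("Info", info, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        finally
        {
            writer.Close();
        }
    }

    public static void Delete(LuceneIO.Directory directory, string primaryKey)
    {
        IndexWriter writer = OpenWriter(directory);
        try
        {
            // The Term text must equal the indexed key exactly (case-sensitive).
            writer.DeleteDocuments(new Term("PrimaryKey", primaryKey));
            writer.Commit();
        }
        finally
        {
            writer.Close();
        }
    }
}
```

With this layout a key such as "Item-42 A" is matched as a whole, so the delete cannot accidentally hit other documents that merely share one of its tokens.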

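An update is really just deleting the record and then appending a new one. The UpdateDocument method that IndexWriter provides felt, in my tests, more like it was copying the record, so I do not recommend it. Instead, perform the update by deleting and appending manually.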

    [WebMethod]
    public int UpdateLuceneDocument(string primaryKey, string id, string name, string info, string categoryName, string propertyName, string module, string passKey)
    {
        // dirInfo and directory are assumed to be fields declared elsewhere in the service class.
        // passKey is not checked in this snippet.
        dirInfo = Directory.CreateDirectory(this.GetIndexPath(ConfigurationManager.AppSettings[module]));
        directory = LuceneIO.FSDirectory.Open(dirInfo);
        // Use one analyzer for both writer and parser, the same one used at index time.
        StandardAnalyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        IndexWriter writer = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            Document doc = new Document();
            doc.Add(new Field("PrimaryKey", primaryKey, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("ID", id, Field.Store.YES, Field.Index.NO));
            doc.Add(new Field("Name", name, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("Info", info, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("CategoryName", categoryName, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("PropertyName", propertyName, Field.Store.YES, Field.Index.ANALYZED));

            // Step 1: delete the old version of the record.
            QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "PrimaryKey", analyzer);
            // Escape the key so characters such as ':' or '*' are not read as query syntax.
            Query query = parser.Parse(QueryParser.Escape(primaryKey));
            writer.DeleteDocuments(query);
            writer.Commit();

            // Step 2: append the new version.
            writer.AddDocument(doc);
            writer.Optimize();
        }
        finally
        {
            // Close even when an exception is thrown, so the index write lock is released.
            writer.Close();
        }
        // 1 signals success; failures propagate to the caller as exceptions.
        return 1;
    }
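For completeness, here is a sketch of the built-in route. IndexWriter.UpdateDocument(Term, Document) deletes every document containing the given term and then adds the new document. If the term matches nothing, only the add happens, which may be why it looked like a copy in my tests with an ANALYZED key. The sketch assumes the key is indexed with Field.Index.NOT_ANALYZED, which is not how the code above indexes it; the class `UpsertSketch`, the `Upsert` method and the reduced field set are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneIO = Lucene.Net.Store;

public static class UpsertSketch
{
    // Replaces the document with the given key, or adds it if no such document exists.
    public static void Upsert(LuceneIO.Directory directory, string primaryKey, string info)
    {
        // This overload creates the index if it is missing and appends otherwise.
        IndexWriter writer = new IndexWriter(
            directory,
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            Document doc = new Document();
            // NOT_ANALYZED: the key is indexed verbatim, so the Term below can match it.
            doc.Add(new Field("PrimaryKey", primaryKey, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("Info", info, Field.Store.YES, Field.Index.ANALYZED));

            // Delete-by-term followed by add, done by the writer in one call.
            writer.UpdateDocument(new Term("PrimaryKey", primaryKey), doc);
            writer.Commit();
        }
        finally
        {
            // Always close the writer so the write lock is released.
            writer.Close();
        }
    }
}
```

Calling `Upsert` twice with the same key should leave a single document for that key, whereas the same call against an ANALYZED key whose tokens differ from the raw string would leave two.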
