Overview of changes across major Lucene releases

3.5

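This release includes numerous optimizations, improvements, and bug fixes, including:

  • Substantially reduced (by 3-5x) the RAM required to hold the terms index when opening an IndexReader.
  • Added IndexSearcher.searchAfter, which returns results after a specified ScoreDoc (for example, the last document on the previous page), to support deep-paging use cases.
  • Added SearcherManager to manage sharing and reopening of IndexSearchers across multiple search threads. The underlying IndexReader instances are safely closed once they are no longer referenced.
  • Added SearcherLifetimeManager, which safely provides a consistent view of the index across multiple requests (e.g. paging/drill-down).
  • Renamed IndexWriter.optimize to forceMerge, to discourage use of this method, since it is costly and rarely necessary.
  • Added NGramPhraseQuery, which speeds up phrase queries by 30%-50% when n-gram analysis is used.
  • Added a new reopen API (IndexReader.openIfChanged), which returns null, rather than the old reader, if the index has not changed.
  • Vector highlighter improvements: support for more queries, such as wildcards, and boundary analysis for generating summaries.
  • Fixed several bugs.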
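The deep-paging idea behind searchAfter can be sketched in plain Java. This is only an illustration of the technique, not Lucene's implementation: SearchHit and SearchAfterSketch are hypothetical names, and the tie-break on document id is an assumption made so that the ordering is total. The point is that each page needs a queue of only n entries, because hits ranked on or before the cursor are skipped rather than collected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a scored hit; this is not Lucene's ScoreDoc class.
final class SearchHit {
    final int doc;
    final float score;
    SearchHit(int doc, float score) { this.doc = doc; this.score = score; }
}

final class SearchAfterSketch {
    // Rank order: higher score first, ties broken by lower doc id (an assumption).
    static final Comparator<SearchHit> RANK = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    // Returns up to n hits that rank strictly after 'after' (null means the first page).
    static List<SearchHit> searchAfter(Iterable<SearchHit> hits, SearchHit after, int n) {
        // The heap keeps the worst-ranked retained hit on top so it can be evicted cheaply.
        PriorityQueue<SearchHit> heap = new PriorityQueue<>(RANK.reversed());
        for (SearchHit h : hits) {
            if (after != null && RANK.compare(h, after) <= 0) {
                continue; // ranks on or before the cursor: already shown on an earlier page
            }
            heap.add(h);
            if (heap.size() > n) {
                heap.poll(); // drop the worst hit, keeping at most n
            }
        }
        List<SearchHit> page = new ArrayList<>(heap);
        page.sort(RANK);
        return page;
    }
}
```

To fetch the next page, pass the last hit of the previous page as the cursor.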

3.4

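This release includes numerous bug fixes, optimizations, and improvements. The main changes are:

  • Fixed a major bug (LUCENE-3418) whereby a Lucene index could easily become corrupted if the operating system or computer crashed or lost power.
  • Added a new faceting module (contrib/facet) for computing facet counts (both hierarchical and non-hierarchical) at search time (LUCENE-3079).
  • Added a new join module (contrib/join), which enables indexing and searching of nested (parent/child) documents using BlockJoinQuery/Collector (LUCENE-3171).
  • Documents can now be indexed with term frequencies but without positions (LUCENE-2048); previously, omitTermFreqAndPositions always omitted both.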
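  • The QueryParser module (contrib/queryparser) can now create NumericRangeQuery.
  • Added SynonymFilter to contrib/analyzers, which handles multi-word synonyms at index or query time and includes parsers for the wordnet and solr synonym formats (LUCENE-3233).
  • You can now control how documents that lack the sort field are ordered, using SortField.setMissingValue (LUCENE-3390).
  • Fixed an issue where term vectors could be silently deleted from the index after calling addIndexes (LUCENE-3402).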
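The effect of a missing-value setting on sort order can be illustrated with a small plain-Java sketch. It does not use Lucene: MissingValueSortSketch, its method, and the map of field values are all hypothetical, and the tie-break on document id is an assumption. It only shows the idea that documents without the field are sorted as if they held a chosen substitute value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class MissingValueSortSketch {
    // Sorts doc ids ascending by a numeric field. Docs absent from 'fieldValues'
    // are treated as if they held 'missingValue'; ties fall back to doc id.
    static List<Integer> sortDocs(List<Integer> docs,
                                  Map<Integer, Long> fieldValues,
                                  long missingValue) {
        List<Integer> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator
            .comparingLong((Integer d) -> fieldValues.getOrDefault(d, missingValue))
            .thenComparingInt(d -> d));
        return sorted;
    }
}
```

Choosing Long.MAX_VALUE as the substitute pushes documents without the field to the end of an ascending sort, while Long.MIN_VALUE pulls them to the front.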

3.3-2011.7

Highlights of the Lucene release include:

  • The spellchecker module now includes suggest/auto-complete functionality, with three implementations: Jaspell, Ternary Trie, and Finite State.
  • Support for merging results from multiple shards, for both "normal" search results (TopDocs.merge) as well as grouped results using the grouping module (SearchGroup.merge, TopGroups.merge).
  • An optimized implementation of KStem, a less aggressive stemmer for English.
  • Single-pass grouping implementation based on block document indexing.
  • Improvements to MMapDirectory (now also the default implementation returned by FSDirectory.open on 64-bit Linux).
  • NRTManager simplifies handling near-real-time search with multiple search threads, allowing the application to control which indexing changes must be visible to which search requests.
  • TwoPhaseCommitTool facilitates performing a multi-resource two-phase commit, including IndexWriter.
  • The default merge policy, TieredMergePolicy, has a new method (set/getReclaimDeletesWeight) to control how aggressively it targets segments with deletions, and is now more aggressive than before by default.
  • PKIndexSplitter tool splits an index by a mid-point term.
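Merging results from multiple shards is essentially a k-way merge of lists that are each already sorted by score. The sketch below shows that technique in plain Java; it is not the code behind TopDocs.merge. ShardHit and ShardMergeSketch are hypothetical names, and breaking score ties by shard index is an assumption made for determinism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical per-shard hit; not a Lucene class.
final class ShardHit {
    final int shard;
    final int doc;
    final float score;
    ShardHit(int shard, int doc, float score) {
        this.shard = shard;
        this.doc = doc;
        this.score = score;
    }
}

final class ShardMergeSketch {
    // Merges per-shard lists (each sorted by descending score) into the global top N.
    static List<ShardHit> merge(int topN, List<List<ShardHit>> shards) {
        // Each queue entry is {shardIndex, positionInThatShard}.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> {
            ShardHit x = shards.get(a[0]).get(a[1]);
            ShardHit y = shards.get(b[0]).get(b[1]);
            int c = Float.compare(y.score, x.score);          // higher score first
            return c != 0 ? c : Integer.compare(a[0], b[0]);  // tie: lower shard index
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                queue.add(new int[] {s, 0});
            }
        }
        List<ShardHit> merged = new ArrayList<>();
        while (merged.size() < topN && !queue.isEmpty()) {
            int[] head = queue.poll();
            List<ShardHit> list = shards.get(head[0]);
            merged.add(list.get(head[1]));
            if (head[1] + 1 < list.size()) {
                queue.add(new int[] {head[0], head[1] + 1}); // advance within that shard
            }
        }
        return merged;
    }
}
```

Only the current head of each shard sits in the queue, so the merge costs O(topN log k) for k shards.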

3.2-2011-6

  • A new grouping module, under lucene/contrib/grouping, enables search results to be grouped by a single-valued indexed field. (So grouping only arrived in this release.)
  • A new IndexUpgrader tool fully converts an old index to the current format.
  • A new Directory implementation, NRTCachingDirectory, caches small segments in RAM, to reduce the I/O load for applications with fast NRT reopen rates.
  • A new Collector implementation, CachingCollector, is able to gather search hits (document IDs and optionally also scores) and then replay them. This is useful for Collectors that require two or more passes to produce results.
  • Index a document block using IndexWriter's new addDocuments or updateDocuments methods. These experimental APIs ensure that the block of documents will forever remain contiguous in the index, enabling interesting future features like grouping and joins.
  • A new default merge policy, TieredMergePolicy, which is more efficient due to being able to merge non-contiguous (that is, non-adjacent) segments.
  • NumericField is now returned correctly when you load a stored document (previously you received a normal Field back, with the numeric value converted to a string).
  • Deleted terms are now applied during flushing to the newly flushed segment, which is more efficient than having to later initialize a reader for that segment.
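The collect-then-replay behavior described for CachingCollector can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not Lucene's class: HitCollector is a hypothetical one-method interface (Lucene's Collector has more methods), and scores are always cached here, whereas the real collector makes that optional.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal collector interface, for illustration only.
interface HitCollector {
    void collect(int doc, float score);
}

final class CachingCollectorSketch implements HitCollector {
    private final HitCollector firstPass;
    private final List<Integer> docs = new ArrayList<>();
    private final List<Float> scores = new ArrayList<>();

    CachingCollectorSketch(HitCollector firstPass) {
        this.firstPass = firstPass;
    }

    // Records the hit, then forwards it to the first-pass collector.
    @Override
    public void collect(int doc, float score) {
        docs.add(doc);
        scores.add(score);
        firstPass.collect(doc, score);
    }

    // Feeds the cached hits, in their original order, to another collector
    // without running the search again.
    void replay(HitCollector secondPass) {
        for (int i = 0; i < docs.size(); i++) {
            secondPass.collect(docs.get(i), scores.get(i));
        }
    }
}
```

A second pass can then apply different logic, such as filtering by score, to exactly the hits the first pass saw.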

3.1-2011.3

ConcurrentMergeScheduler is more careful about setting priority of merge threads.

ReusableAnalyzerBase makes it easier to reuse TokenStreams correctly.

ConstantScoreQuery now allows directly wrapping a Query.

IndexWriter is now configured with a new separate builder API, IndexWriterConfig. You can now control IndexWriter's previously fixed internal thread limit by calling setMaxThreadStates.

IndexWriter.getReader is replaced by IndexReader.open(IndexWriter).

MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher.

  • New TotalHitCountCollector just counts total number of hits.
  • ReaderFinishedListener API enables external caches to evict entries once a segment is finished.

Grouping was reportedly already implemented in this release, but it still was not announced...

3.0.3-2010-12

Fixed: a memory leak in IndexWriter exacerbated by frequent commits.

This also suggests the release line was not yet very stable.

Fixed: NumericRangeQuery / NumericRangeFilter sometimes returning incorrect results with bounds near Long.MIN_VALUE and Long.MAX_VALUE.

Fixed: various thread-safety issues.

3.0.2-2010-6

Fixed memory leaks in IndexWriter when large documents are indexed. It now also uses shared memory pools for term vectors and stored fields. IndexWriter now releases Fieldables and Readers on close.

Performance improvements in ParallelMultiSearcher (3.0.2 only).
