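Understanding Solr Configuration Parameters
• The dataDir parameter
Replaces the default index data directory (./data). If replication is in use, this value should match the replication configuration. If the path is not absolute, it is resolved relative to the current working directory of the servlet container.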
<dataDir>/var/data/solr</dataDir>
• The mainIndex section
<mainIndex>
<!-- lucene options specific to the main on-disk lucene index -->
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<maxBufferedDocs>1000</maxBufferedDocs>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
</mainIndex>
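[mergeFactor] Specifies how many segments of the same size must accumulate before they are merged. If you set this value to 10, then each time 1000 (maxBufferedDocs) docs have been added to the index (they may still be in memory), a new segment is created on disk. When the 10th segment of that size is created, the 10 segments are merged into a single segment containing 10000 (10*1000) docs. Likewise, when the 10th segment containing 10000 docs is created, those segments are merged into an even larger one. This merging does not continue indefinitely, because the parameter below limits it.
[maxMergeDocs] The upper limit on the number of docs a single segment can hold.
[maxFieldLength] Specifies the maximum length of each field, counted in Lucene as the number of terms indexed for that field.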
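The merge arithmetic described above can be sketched in a few lines of Python. This is only a simplified model of the behaviour the text describes, not Lucene's actual merge policy: the function name `simulate_segments` is hypothetical, every flush is assumed to produce a segment of exactly maxBufferedDocs docs, and a merge is assumed to be skipped when the result would exceed maxMergeDocs.

```python
def simulate_segments(total_docs, merge_factor=10, max_buffered_docs=1000,
                      max_merge_docs=2147483647):
    """Return the on-disk segment sizes (oldest first) after adding total_docs.

    Simplified model (an assumption, not Lucene's real implementation):
    docs that do not fill a whole buffer stay in memory and are not counted.
    """
    segments = []
    for _ in range(total_docs // max_buffered_docs):
        # A full in-memory buffer is flushed as a new segment.
        segments.append(max_buffered_docs)
        # Cascade: merge whenever the newest merge_factor segments are equal in size.
        while True:
            tail = segments[-merge_factor:]
            if len(tail) < merge_factor or len(set(tail)) != 1:
                break
            merged = sum(tail)
            if merged > max_merge_docs:
                break  # maxMergeDocs stops the merge from happening
            del segments[-merge_factor:]
            segments.append(merged)
    return segments


print(simulate_segments(25000))
```

With the values from the snippet (mergeFactor 10, maxBufferedDocs 1000), adding 25000 docs leaves two merged segments of 10000 docs plus five unmerged segments of 1000 docs in this model.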
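• The Update Handler section
This section mostly holds low-level configuration for how updates are handled internally (not to be confused with the high-level configuration of the Request Handlers that process updates sent by clients).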
<updateHandler class="solr.DirectUpdateHandler2">
<!-- Limit the number of deletions Solr will buffer during doc updating.
Setting this lower can help bound memory use during indexing.
-->
<maxPendingDeletes>100000</maxPendingDeletes>
<!-- autocommit pending docs if certain criteria are met. Future versions may expand the available
criteria -->
<autoCommit>
<maxDocs>10000</maxDocs> <!-- maximum uncommitted docs before autocommit triggered -->
<maxTime>86000</maxTime> <!-- maximum time (in MS) after adding a doc before an autocommit is triggered -->
</autoCommit>
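The two autoCommit criteria in the snippet above can be read as an "either condition triggers a commit" rule. The following Python sketch only illustrates that reading; the function `should_autocommit` and its parameter names are hypothetical, and the exact boundary behaviour (">=" rather than ">") is an assumption, not something the configuration comments state.

```python
def should_autocommit(pending_docs, ms_since_first_pending_add,
                      max_docs=10000, max_time_ms=86000):
    """Illustrative check of the two autoCommit criteria (assumed semantics)."""
    if pending_docs <= 0:
        return False  # nothing uncommitted, so nothing to commit
    # Either criterion alone is assumed to be enough to trigger the commit.
    return pending_docs >= max_docs or ms_since_first_pending_add >= max_time_ms


print(should_autocommit(10000, 0))     # doc-count criterion met
print(should_autocommit(1, 86000))     # time criterion met
print(should_autocommit(9999, 85999))  # neither criterion met
```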
• "Update" Related Event Listeners
Specifies listeners for particular update-related events ("postCommit" and "postOptimize"). Listeners can trigger arbitrary custom code; their typical use is taking snapshots.
...
<!-- The RunExecutableListener executes an external command.
exe - the name of the executable to run
dir - dir to use as the current working directory. default="."
wait - the calling thread waits until the executable returns.
default="true"
args - the arguments to pass to the program. default=nothing
env - environment variables to set. default=nothing
-->
<!-- A postCommit event is fired after every commit
-->
<listener event="postCommit" class="solr.RunExecutableListener">
<str name="exe">snapshooter</str>
<str name="dir">solr/bin</str>
<bool name="wait">true</bool>
<!--
<arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
<arr name="env"> <str>MYVAR=val1</str> </arr>
-->
</listener>
</updateHandler>
• The Query Section
Controls everything related to queries.
<query>
<!-- Maximum number of clauses in a boolean query... can affect range
or wildcard queries that expand to big boolean queries.
An exception is thrown if exceeded.
-->
<maxBooleanClauses>1024</maxBooleanClauses>
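• Caching Section
You will need to tune this section as your index grows or changes. Further details on cache configuration are covered in the Solr caching documentation.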
<!-- Cache used by SolrIndexSearcher for filters (DocSets),
unordered sets of *all* documents that match a query.
When a new searcher is opened, its caches may be prepopulated
or "autowarmed" using data from caches in the old searcher.
autowarmCount is the number of items to prepopulate. For LRUCache,
the autowarmed items will be the most recently accessed items.
Parameters:
class - the SolrCache implementation (currently only LRUCache)
size - the maximum number of entries in the cache
initialSize - the initial capacity (number of entries) of
the cache. (see java.util.HashMap)
autowarmCount - the number of entries to prepopulate from
an old cache.
-->
<filterCache
class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="256"/>
<!-- queryResultCache caches results of searches - ordered lists of
document ids (DocList) based on a query, a sort, and the range
of documents requested. -->
<queryResultCache
class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="256"/>
<!-- documentCache caches Lucene Document objects (the stored fields for each document).
Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache
class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<!-- Example of a generic cache. These caches may be accessed by name
through SolrIndexSearcher.getCache().cacheLookup(), and cacheInsert().
The purpose is to enable easy caching of user/application level data.
The regenerator argument should be specified as an implementation
of solr.search.CacheRegenerator if autowarming is desired. -->
<!--
<cache name="myUserCache"
class="solr.LRUCache"
size="4096"
initialSize="1024"
autowarmCount="1024"
regenerator="org.mycompany.mypackage.MyRegenerator"
/>
-->
<!-- An optimization that attempts to use a filter to satisfy a search.
If the requested sort does not include a score, then the filterCache
will be checked for a filter matching the query. If found, the filter
will be used as the source of document ids, and then the sort will be
applied to that.
-->
<useFilterForSortedQuery>true</useFilterForSortedQuery>
<!-- An optimization for use with the queryResultCache. When a search
is requested, a superset of the requested number of document ids
are collected. For example, if a search for a particular query
requests matching documents 10 through 19, and queryResultWindowSize is 50,
then documents 0 through 49 will be collected and cached. Any further
requests in that range can be satisfied via the cache.
-->
<queryResultWindowSize>50</queryResultWindowSize>
<!-- This entry enables an int hash representation for filters (DocSets)
when the number of items in the set is less than maxSize. For smaller
sets, this representation is more memory efficient, more efficient to
iterate over, and faster to take intersections.
-->
<HashDocSet maxSize="3000" loadFactor="0.75"/>
<!-- boolToFilterOptimizer converts boolean clauses with zero boost into
cached filters if the number of docs selected by the clause exceeds the
threshold (represented as a fraction of the total index)
-->
<boolTofilterOptimizer enabled="true" cacheSize="32" threshold=".05"/>
<!-- Lazy field loading will attempt to read only parts of documents on disk that are
requested. Enabling should be faster if you aren't retrieving all stored fields.
-->
<enableLazyFieldLoading>false</enableLazyFieldLoading>
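The queryResultWindowSize comment earlier in this section describes rounding a request up to a larger cached window. A minimal Python sketch of that rounding follows. It assumes the window always starts at position 0 and is rounded up to the next multiple of the window size; the function name `cached_window` is hypothetical, and positions are zero-based, so a 50-document window covers positions 0 through 49.

```python
def cached_window(start, rows, window_size=50):
    """Return (first, last) zero-based positions assumed to be collected and cached."""
    requested_end = start + rows  # exclusive upper bound of the request
    if requested_end <= window_size:
        upper = window_size
    else:
        # Round up to the next multiple of window_size.
        upper = ((requested_end - 1) // window_size + 1) * window_size
    return 0, upper - 1


print(cached_window(10, 10))  # request for documents 10 through 19
print(cached_window(45, 10))  # request that crosses the first window
```

Under these assumptions, a follow-up request for documents 20 through 29 falls inside the first window and could be answered from the queryResultCache without running the search again.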
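• "Query" Related Event Listeners
Define listeners for particular query-related events here. A listener can run whatever code is needed, for example issuing common queries to warm the caches.
[newSearcher] Fired when a new searcher is being prepared while a registered searcher already exists. The listener in the example below is of this kind: it takes a list of queries and sends them to the new searcher in order to warm it up.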
<!-- a newSearcher event is fired whenever a new searcher is being
prepared and there is a current searcher handling requests
(aka registered).
-->
<!-- QuerySenderListener takes an array of NamedList and
executes a local query request for each NamedList in sequence.
-->
<!--
<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst> <str name="q">solr</str>
<str name="start">0</str>
<str name="rows">10</str>
</lst>
<lst> <str name="q">rocks</str>
<str name="start">0</str>
<str name="rows">10</str>
</lst>
</arr>
</listener>
-->
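[firstSearcher]
Fired when a new searcher is being prepared and no registered searcher exists. The example below is exactly such a case: the listener takes a list of queries and sends them to the searcher being started in order to warm it up. (Note that auto-warming is only possible when a registered searcher already exists.)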
<!-- a firstSearcher event is fired whenever a new searcher is being
prepared but there is no current registered searcher to handle
requests or to gain prewarming data from.
-->
<!--
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst> <str name="q">fast_warm</str>
<str name="start">0</str>
<str name="rows">10</str>
</lst>
</arr>
</listener>
-->