Solr: Using Chinese Stop Words (的, 地, 得) with mmseg4j

wsxsxz

2015-02-13


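1. Download the Chinese stop word list from the attachment. If you need it, you can also download the English stop word list; the steps are the same.

2. Create a file somewhere inside the Solr deployment under Tomcat, for example stopwords.txt. Here I put stopwords.txt directly in the core directory, that is, in the same directory as schema.xml.

3. Add the Chinese and English stop words downloaded above to stopwords.txt, one word per line, and save the file as UTF-8.

4. If you do not yet know how to configure mmseg4j in Solr, see my earlier blog post.

5. Edit schema.xml: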

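1) Make sure schema.xml has a stop filter (solr.StopFilterFactory) configured that reads stopwords.txt.

2) Add that filter to the corresponding mmseg4j field type definitions: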

    <!-- mmseg4j -->
    <fieldType name="text_mmseg4j_complex" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        </analyzer>
    </fieldType>
    <fieldType name="text_mmseg4j_maxword" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="dic"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        </analyzer>
    </fieldType>
    <fieldType name="text_mmseg4j_simple" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <!--
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="n:/OpenSource/apache-solr-1.3.0/example/solr/my_dic"/>
            -->
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="dic"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        </analyzer>
    </fieldType>
    <!-- mmseg4j -->

3) Restart Tomcat.

When you test, you will find that stop words such as 的, 地 and 得 no longer affect the segmentation results.
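For the test to show anything, at least one field has to use one of the field types defined above. A minimal sketch, assuming a hypothetical field named content (the field name and the indexed/stored settings are illustrative and not taken from the original post):

```xml
<!-- Hypothetical field: "content" is an assumed name, not from the original post. -->
<!-- It uses the complex-mode mmseg4j type, so the stop filter applies to it. -->
<field name="content" type="text_mmseg4j_complex" indexed="true" stored="true"/>
```

With a field like this in schema.xml, the Analysis page of the Solr admin UI can be used to compare the tokens produced by the tokenizer with the tokens that remain after the stop filter.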


安科网
