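Specifying the Elasticsearch Analyzer
After a fresh ELK install, Elasticsearch uses the standard analyzer by default, so our exception "org.springframework.jdbc.BadSqlGrammarException" cannot be found by searching for BadSqlGrammarException.
Take "one.two.three.+four" as an example: the standard analyzer produces only two terms, whereas the simple analyzer produces four.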
https://discuss.elastic.co/t/dot-analyzer/3635/2
With the default analyzer, i.e. the standard analyzer:
http://localhost:9200/twitter/_analyze?text=one.two.three.+four&pretty=1
{
"tokens" : [ {
"token" : "one.two.three",
"start_offset" : 0,
"end_offset" : 14,
"type" : "",
"position" : 1
}, {
"token" : "four",
"start_offset" : 15,
"end_offset" : 19,
"type" : "",
"position" : 2
} ]
}
With the simple analyzer instead, four terms are produced:
http://localhost:9200/twitter/_analyze?analyzer=simple&text=one.two.three.+four&pretty=1
{
"tokens" : [ {
"token" : "one",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "two",
"start_offset" : 4,
"end_offset" : 7,
"type" : "word",
"position" : 2
}, {
"token" : "three",
"start_offset" : 8,
"end_offset" : 13,
"type" : "word",
"position" : 3
}, {
"token" : "four",
"start_offset" : 15,
"end_offset" : 19,
"type" : "word",
"position" : 4
} ]
}
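The difference between the two outputs above can be reproduced offline with a rough sketch. The function names below are hypothetical. `simple_tokens` approximates the simple analyzer (split on anything that is not a letter, then lowercase). `dot_preserving_tokens` is only a crude stand-in for the one behavior that matters here, keeping dot-joined words together. It is not the segmentation algorithm the standard analyzer actually uses.

```python
import re

# Runs of letters only: word characters minus digits and underscore.
_LETTERS = re.compile(r"[^\W\d_]+")
# Assumption for illustration: letter/digit runs joined by single dots stay together.
_DOT_JOINED = re.compile(r"[^\W_]+(?:\.[^\W_]+)*")


def simple_tokens(text):
    """Approximate the simple analyzer: split on non-letters, lowercase."""
    return [m.group().lower() for m in _LETTERS.finditer(text)]


def dot_preserving_tokens(text):
    """Crude stand-in for the dot-preserving behavior seen with the standard analyzer."""
    return [m.group().lower() for m in _DOT_JOINED.finditer(text)]


if __name__ == "__main__":
    # "+" in the URL decodes to a space, so the analyzed text is "one.two.three. four".
    for sample in ("one.two.three. four",
                   "org.springframework.jdbc.BadSqlGrammarException"):
        print(sample)
        print("  simple        :", simple_tokens(sample))
        print("  dot-preserving:", dot_preserving_tokens(sample))
```

Under these assumptions the exception class name stays a single term in the dot-preserving case, which is why a search for one of its parts finds nothing, while the simple variant yields each part as its own term.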
An analyzer can be specified per query, per field, or per index. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html
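Elasticsearch picks the analyzer in the following order.
At index time:
- The analyzer defined in the field mapping.
- An analyzer named default in the index settings.
- The standard analyzer.
At search time:
- The analyzer defined in a full-text query.
- The search_analyzer defined in the field mapping.
- The analyzer defined in the field mapping.
- An analyzer named default_search in the index settings.
- An analyzer named default in the index settings.
- The standard analyzer.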
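The lookup order can be sketched as a small function. This is only an illustration of the precedence, not Elasticsearch code: `resolve_analyzer` and its parameters are hypothetical names. It assumes that a query-level analyzer (the "per query" case mentioned above) is checked first at search time, and that an analyzer set in the field mapping takes precedence at index time, which is what the template in the next step relies on.

```python
def resolve_analyzer(phase, field_mapping, index_analyzers, query_analyzer=None):
    """Return the name of the analyzer that would be chosen.

    phase           -- "index" or "search"
    field_mapping   -- mapping dict for one field, e.g. {"type": "text", "analyzer": "simple"}
    index_analyzers -- names of analyzers defined in the index settings
    query_analyzer  -- analyzer given in the query itself (search time only)
    """
    if phase not in ("index", "search"):
        raise ValueError("phase must be 'index' or 'search'")

    candidates = []
    if phase == "search":
        candidates.append(query_analyzer)
        candidates.append(field_mapping.get("search_analyzer"))
    candidates.append(field_mapping.get("analyzer"))
    if phase == "search" and "default_search" in index_analyzers:
        candidates.append("default_search")
    if "default" in index_analyzers:
        candidates.append("default")
    candidates.append("standard")  # final fallback

    # The first candidate that is actually set wins.
    return next(name for name in candidates if name)


if __name__ == "__main__":
    message = {"type": "text", "analyzer": "simple"}
    print(resolve_analyzer("index", message, set()))
    print(resolve_analyzer("search", message, set()))
    print(resolve_analyzer("search", {"type": "text"}, set()))
```

With only "analyzer": "simple" set on the field, both phases resolve to simple, so the text is split the same way when it is indexed and when it is searched.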
We configure the message field of the data coming from Logstash to use the simple analyzer:
PUT _template/logstash
{
  "template": "logstash-*",
  "mappings": {
    "test": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "simple"
        }
      }
    }
  }
}
This creates a template named logstash that applies to every index whose name matches logstash-*. The template defines a mapping named test, in which the message field has type text and uses the simple analyzer.
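For readers who prefer to generate the template body from a script, here is a minimal sketch that builds the same JSON. `build_message_template` is a hypothetical helper, not part of any Elasticsearch client. The structure it emits mirrors the request above and nothing more.

```python
import json


def build_message_template(index_pattern, mapping_type, analyzer):
    """Build an index-template body that sets the analyzer of the message field."""
    return {
        "template": index_pattern,
        "mappings": {
            mapping_type: {
                "properties": {
                    "message": {"type": "text", "analyzer": analyzer}
                }
            }
        },
    }


if __name__ == "__main__":
    body = build_message_template("logstash-*", "test", "simple")
    # The printed JSON can be sent as the body of PUT _template/logstash.
    print(json.dumps(body, indent=2))
```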
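OK, the message field of indices named logstash-* now uses the simple analyzer. Note that a template is applied when an index is created, so only indices created after this point pick it up. Let's test it:
1) Add a record to the log file that Logstash watches:
2) The Elasticsearch console prints a log line showing that the record has been indexed, using our logstash template and our test mapping: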
* About shards [5]/[1]:
By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html#getting-started-shards-and-replicas
3) Search for "jdbc" in the message field:
GET /_search
{
  "query": {
    "query_string": {
      "query": "jdbc",
      "fields": [ "message" ]
    }
  }
}
The search succeeds:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.1290016,
    "hits": [
      {
        "_index": "logstash-2017.07.06",
        "_type": "testlogs",
        "_id": "AV0YmlmqLzl7sqCPLrgd",
        "_score": 0.28582606,
        "_source": {
          "path": "/Users/jo/lp_logs/error.log",
          "@timestamp": "2017-07-06T15:52:34.975Z",
          "@version": "1",
          "host": "Zhuos-MacBook-Pro.local",
          "message": "\torg.springframework.jdbc.BadSqlGrammarException: ### Error querying database.",
          "type": "testlogs"
        }
      }
    ]
  }
}
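The request and response from step 3 can also be handled from a script. The sketch below only builds the request body and reads the message of each hit out of a response shaped like the one above. Both function names are hypothetical, and sending the request is left out so that the example does not depend on a running cluster.

```python
import json


def build_query_string_search(term, fields=None):
    """Build a query_string search body, optionally restricted to some fields."""
    query = {"query": term}
    if fields:
        query["fields"] = list(fields)
    return {"query": {"query_string": query}}


def hit_messages(response):
    """Return the message of every hit in a search response dict."""
    return [hit["_source"]["message"] for hit in response["hits"]["hits"]]


if __name__ == "__main__":
    # Body for GET /_search, restricted to the message field.
    print(json.dumps(build_query_string_search("jdbc", ["message"]), indent=2))

    # A trimmed copy of the response shown above.
    sample = {
        "hits": {
            "total": 1,
            "hits": [
                {"_source": {"message": "\torg.springframework.jdbc.BadSqlGrammarException: "
                                        "### Error querying database."}}
            ],
        }
    }
    for message in hit_messages(sample):
        print(message.strip())
```

Calling `build_query_string_search("jdbc")` without fields produces the unrestricted form of the query, which is useful for comparing the two cases.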
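In Kibana, searching for message:jdbc returns the result.
One odd thing, though: the message field must be specified for this query to match, whereas other terms in message, such as "Error", can be found without specifying message at all. A likely explanation is that a query with no field goes against the _all field, which is not covered by our mapping and is still analyzed with the standard analyzer. There "Error" is a term of its own, but "jdbc" is buried inside the single term for the full class name.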