How to Analyze Linux Logs
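Logs contain a great deal of information for you to work with, although extracting it is sometimes not as easy as you might expect. In this article we cover some basic log analysis examples that you can do right now (all you need is search). We also cover some more advanced analysis that takes some up-front effort to set up but saves a lot of time later. Examples of advanced analysis include generating summary counts, filtering on field values, and more.
We first show you how to use several different tools on the command line, and then show how a log management tool can automate much of the heavy lifting and make log analysis simple.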
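Searching with Grep
Searching text is the most basic way to find information. The most common tool for searching text is grep. This command line tool is available on most Linux distributions and lets you search your logs using regular expressions. A regular expression is a pattern written in a special language that identifies matching text. The simplest pattern is the string you want to find, enclosed in quotes.
Regular Expressions
Here is an example that searches for "user hoover" in the authentication log on an Ubuntu system: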
$ grep "user hoover" /var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh2
pam_unix(sshd:session): session opened for user hoover by (uid=0)
pam_unix(sshd:session): session closed for user hoover
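Building a precise regular expression can be hard. For example, if we search for a number such as port "4792", it may also match timestamps, URLs, and other unwanted data. In the Ubuntu example below, it matches an Apache log line that we did not want.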
$ grep "4792" /var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh2
74.91.21.46 - - [31/Mar/2015:19:44:32 +0000] "GET /scripts/samples/search?q=4792 HTTP/1.0" 404 545 "-" "-"
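One way to reduce such false matches is to include some of the surrounding text in the pattern. The following is a minimal sketch, not a general recipe: the sample lines are hypothetical stand-ins written to a temporary file, and the function names `loose_matches` and `port_matches` are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for a real log file.
sample_log="$(mktemp)"
trap 'rm -f "$sample_log"' EXIT
cat > "$sample_log" <<'EOF'
Accepted password for hoover from 10.0.2.2 port 4792 ssh2
74.91.21.46 - - [31/Mar/2015:19:44:32 +0000] "GET /scripts/samples/search?q=4792 HTTP/1.0" 404 545 "-" "-"
Accepted password for hoover from 10.0.2.2 port 47920 ssh2
EOF

# Loose pattern: counts every line that contains the digits 4792 anywhere.
loose_matches() {
  grep -c "4792" "$1"
}

# Tighter pattern: require the word "port" before the number and either a
# space or the end of the line after it, so URLs and longer numbers are skipped.
port_matches() {
  grep -E "port 4792( |$)" "$1"
}

loose_matches "$sample_log"
port_matches "$sample_log"
```

The tighter pattern trades generality for precision: it only helps when the surrounding text ("port" here) is stable across the log lines you care about.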
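Surround Search
Another useful tip is that grep can do surround searches, which show you the lines before or after a match. This can help you debug what led up to an error or problem. The -B option shows lines before the match, and the -A option shows lines after it. For example, we can see that when someone fails to log in as admin, their IP address also fails reverse mapping, which means they may not have a valid domain name. That is very suspicious!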
$ grep -B 3 -A 2 'Invalid user' /var/log/auth.log
Apr 28 17:06:20 ip-172-31-11-241 sshd[12545]: reverse mapping checking getaddrinfo for 216-19-2-8.commspeed.net [216.19.2.8] failed - POSSIBLE BREAK-IN ATTEMPT!
Apr 28 17:06:20 ip-172-31-11-241 sshd[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: Invalid user admin from 216.19.2.8
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: input_userauth_request: invalid user admin [preauth]
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
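When you want the same number of lines on both sides of a match, grep's -C option is a shorthand for giving -B and -A the same value. The sketch below wraps it in a small helper; the function name `show_context` and the sample lines are assumptions made for illustration, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for /var/log/auth.log.
sample_log="$(mktemp)"
trap 'rm -f "$sample_log"' EXIT
cat > "$sample_log" <<'EOF'
Apr 28 17:06:19 host sshd[12540]: Accepted password for hoover from 10.0.2.2 port 4792 ssh2
Apr 28 17:06:20 host sshd[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
Apr 28 17:06:20 host sshd[12547]: Invalid user admin from 216.19.2.8
Apr 28 17:06:20 host sshd[12547]: input_userauth_request: invalid user admin [preauth]
Apr 28 17:06:21 host sshd[12547]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
EOF

# show_context PATTERN FILE [N]
# Prints each match with N lines of context on either side (default 2).
# -n prefixes every output line with its line number in the file.
show_context() {
  grep -n -C "${3:-2}" -- "$1" "$2"
}

show_context 'Invalid user' "$sample_log" 1
```

With line numbers turned on, grep marks matching lines with a colon after the number and context lines with a hyphen, which makes it easier to tell the two apart in longer output.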
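Tail
You can also combine grep with tail to get the last few lines of a file, or to follow a log and print it in real time. This is useful when you are making interactive changes, such as starting a server or testing a code change.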
$ tail -f /var/log/auth.log | grep 'Invalid user'
Apr 30 19:49:48 ip-172-31-11-241 sshd[6512]: Invalid user ubnt from 219.140.64.136
Apr 30 19:49:49 ip-172-31-11-241 sshd[6514]: Invalid user admin from 219.140.64.136
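The other use mentioned above, searching only the last few lines of a file, can be sketched with tail -n. The helper name `recent_invalid` and the sample lines are hypothetical; a temporary file is used so the example does not depend on a real auth.log.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for /var/log/auth.log.
sample_log="$(mktemp)"
trap 'rm -f "$sample_log"' EXIT
cat > "$sample_log" <<'EOF'
Apr 30 19:49:40 host sshd[6500]: Invalid user test from 219.140.64.136
Apr 30 19:49:45 host sshd[6510]: Accepted password for hoover from 10.0.2.2 port 4792 ssh2
Apr 30 19:49:48 host sshd[6512]: Invalid user ubnt from 219.140.64.136
Apr 30 19:49:49 host sshd[6514]: Invalid user admin from 219.140.64.136
Apr 30 19:49:50 host sshd[6514]: Received disconnect from 219.140.64.136: 11: Bye Bye [preauth]
EOF

# recent_invalid FILE N
# Looks for 'Invalid user' only within the last N lines of FILE.
recent_invalid() {
  tail -n "$2" "$1" | grep 'Invalid user'
}

recent_invalid "$sample_log" 3

# For live monitoring, the same filter can follow tail -f. If its output is
# piped into a further command, GNU grep's --line-buffered option keeps
# matches from waiting in a buffer:
#   tail -f /var/log/auth.log | grep --line-buffered 'Invalid user' | awk '{ print $8 }'
```

Limiting the search to recent lines is mostly a convenience on large files; it assumes the events of interest were written near the end of the log.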
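A detailed introduction to grep and regular expressions is beyond the scope of this guide, but Ryan's Tutorials covers them in more depth.
Log management systems offer higher performance and more powerful search. They typically index the data and run queries in parallel, so you can search gigabytes or terabytes of logs within seconds. By comparison, grep can take minutes, or in extreme cases even hours. Log management systems also use query languages similar to Lucene's, which offer a simpler syntax for searching on numbers, fields, and more.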
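Parsing with Cut, AWK, and Grok
Command Line Tools
Linux provides several command line tools for parsing and analyzing text. They are very useful when you want to parse a small amount of data quickly, but they can take a long time on large volumes of data.
Cut
The cut command lets you parse fields out of delimited logs. Delimiters are characters such as equals signs or commas that separate fields or key-value pairs.
Suppose we want to parse the user out of the following log line: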
pam_unix(su:auth): authentication failure; logname=hoover uid=1000 euid=0 tty=/dev/pts/0 ruser=hoover rhost= user=root
We can use the cut command as shown below to get the text of the eighth field after splitting on the equals sign. This example is from an Ubuntu system:
$ grep "authentication failure" /var/log/auth.log | cut -d '=' -f 8
root
hoover
root
nagios
nagios
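Counting fields by position works only while every line has the same number of equals signs. As a hedged alternative, the sketch below selects the value by its key name and then produces a summary count per user. The sample lines and the function name `failed_users` are assumptions for illustration, and grep's -o option is assumed to be available (it is in GNU and BSD grep).

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for /var/log/auth.log.
sample_log="$(mktemp)"
trap 'rm -f "$sample_log"' EXIT
cat > "$sample_log" <<'EOF'
pam_unix(su:auth): authentication failure; logname=hoover uid=1000 euid=0 tty=/dev/pts/0 ruser=hoover rhost= user=root
pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2  user=nagios
pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2  user=root
EOF

# failed_users FILE
# Prints the value of the user= key from each authentication failure line.
# The leading space in the pattern keeps "ruser=" from matching.
failed_users() {
  grep "authentication failure" "$1" | grep -o ' user=[^ ]*' | cut -d '=' -f 2
}

# Summary count: number of failures per user, most frequent first.
failed_users "$sample_log" | sort | uniq -c | sort -rn
```

The sort, uniq -c, sort -rn pipeline at the end is a common shell idiom for the kind of summary count mentioned in the introduction.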
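AWK
Alternatively, you can use awk, which offers more powerful field parsing. It provides a scripting language, so you can filter out nearly everything that is irrelevant.
For example, suppose we have the following log line on an Ubuntu system and we want to extract the name of the user whose login failed: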
Mar 24 08:28:18 ip-172-31-11-241 sshd[32701]: input_userauth_request: invalid user guest [preauth]
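You can use the awk command as shown below. First, the regular expression /sshd.*invalid user/ matches the sshd invalid user lines. Then { print $9 } prints the ninth field, using the default delimiter of whitespace. This outputs the usernames.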
$ awk '/sshd.*invalid user/ { print $9 }' /var/log/auth.log
guest
admin
info
test
ubnt
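Because awk is a scripting language, the same pattern can also aggregate results instead of just printing them. The following sketch counts invalid-user attempts per username with an awk associative array; the sample lines and the function name `count_invalid_users` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for /var/log/auth.log.
sample_log="$(mktemp)"
trap 'rm -f "$sample_log"' EXIT
cat > "$sample_log" <<'EOF'
Mar 24 08:28:18 host sshd[32701]: input_userauth_request: invalid user guest [preauth]
Mar 24 08:29:02 host sshd[32710]: input_userauth_request: invalid user admin [preauth]
Mar 24 08:29:40 host sshd[32715]: Accepted password for hoover from 10.0.2.2 port 4792 ssh2
Mar 24 08:30:11 host sshd[32720]: input_userauth_request: invalid user admin [preauth]
EOF

# count_invalid_users FILE
# For every matching line, increments a counter keyed by the ninth field
# (the username), then prints "count username" pairs, highest count first.
count_invalid_users() {
  awk '/sshd.*invalid user/ { count[$9]++ }
       END { for (user in count) print count[user], user }' "$1" | sort -rn
}

count_invalid_users "$sample_log"
```

The final sort is there because awk does not guarantee any particular order when iterating over an array with for (user in count).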
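You can read more about using regular expressions and printing fields in the Awk user's guide.
Log Management Systems
Log management systems make parsing easier and let users analyze large numbers of log files quickly. They can automatically parse standard log formats, such as common Linux logs and web server logs. This saves a lot of time, because you do not have to think about writing your own parsing logic while troubleshooting a system problem.
Below is an example of an sshd log message with the remoteHost and user parsed out. It is a screenshot from Loggly, a cloud-based log management service.
You can also define custom parsing for non-standard formats. A commonly used tool is Grok, which uses a library of common regular expressions to parse raw text into structured JSON. Below is an example configuration for Grok to parse kernel log files in Logstash: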
filter {
  grok {
    match => { "message" => "%{CISCOTIMESTAMP:timestamp} %{HOST:host} %{WORD:program}%{NOTSPACE} %{NOTSPACE}%{NUMBER:duration}%{NOTSPACE} %{GREEDYDATA:kernel_logs}"
    }
  }
}
The figure below shows the output after Grok parsing: