A Brief Look at Filtering IP Address Information from Logs

Net21-冰凍vs西瓜 · 2016-07-22 10:09 · System Operations

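Operations engineers often need to pull specific pieces of information, such as IP addresses, out of log files.

Case 1: Extracting IP addresses

The log looks like this: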

[root@C67-X64-A1 hanghang]# cat test.txt 
Jul 13 08:13:09 localhost sshd[14678]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:09 localhost sshd[14679]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143 user=root
Jul 13 08:13:11 localhost sshd[14691]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 user=admin
Jul 13 08:13:11 localhost sshd[14692]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143 
Jul 13 08:13:14 localhost sshd[14707]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:14 localhost sshd[14711]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143 user=root
Jul 13 08:13:17 localhost sshd[14722]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:17 localhost sshd[14724]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143 
Jul 13 08:13:20 localhost sshd[14739]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 user=root
Jul 13 08:13:23 localhost sshd[14753]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 user=root
Jul 13 08:13:26 localhost sshd[14767]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:29 localhost sshd[14781]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:32 localhost sshd[14795]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:35 localhost sshd[14809]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:38 localhost sshd[14823]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:41 localhost sshd[14837]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 user=apache
Jul 13 08:13:44 localhost sshd[14851]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:47 localhost sshd[14865]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:49 localhost sshd[14876]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 
Jul 13 08:13:53 localhost sshd[14895]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172

Method 1: Filter with awk

[root@C67-X64-A1 hanghang]# awk -F "rhost=" '{print $NF}' test.txt |awk '{print $1}'|sort -r|uniq
61.152.95.172
222.73.173.143
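
For readers who prefer to see the field-splitting idea spelled out, here is a minimal Python sketch of the same steps: take what follows the last "rhost=", keep the first whitespace-separated word, then deduplicate and sort in reverse. The function name rhost_values is hypothetical, and the two sample lines are copied from test.txt above. Unlike the awk pipeline, this sketch skips lines that contain no rhost= field.

```python
def rhost_values(lines):
    """Return the distinct rhost= values found in the given log lines."""
    hosts = set()
    for line in lines:
        # Like awk -F "rhost=" '{print $NF}': keep what follows the last "rhost=".
        _, sep, rest = line.rpartition("rhost=")
        if not sep:
            continue  # assumption: lines without an rhost= field are ignored
        fields = rest.split()
        if fields:
            hosts.add(fields[0])  # like the second awk '{print $1}'
    return sorted(hosts, reverse=True)  # like sort -r | uniq


sample = [
    "Jul 13 08:13:09 localhost sshd[14678]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 ",
    "Jul 13 08:13:09 localhost sshd[14679]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143 user=root",
]
print(rhost_values(sample))  # ['61.152.95.172', '222.73.173.143']
```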

Method 2: Filter with egrep, the extended-regex form of grep (equivalent to grep -E)

[root@C67-X64-A1 hanghang]# egrep -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' test.txt |sort -r|uniq
61.152.95.172
222.73.173.143
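
Note that the pattern [0-9]{1,3} accepts any run of one to three digits, so a string such as 999.999.999.999 would also be reported. The following is a minimal Python sketch, not part of the original article, that uses a regex of the same shape to find candidates and then validates each one with the standard-library ipaddress module. The function name valid_ipv4_addresses is hypothetical.

```python
import ipaddress
import re

# Same shape as the egrep pattern: four groups of 1-3 digits separated by dots.
CANDIDATE = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")


def valid_ipv4_addresses(text):
    """Return the set of dotted-quad strings in text that are valid IPv4 addresses."""
    found = set()
    for candidate in CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets greater than 255
        except ipaddress.AddressValueError:
            continue
        found.add(candidate)
    return found


print(valid_ipv4_addresses("ruser= rhost=61.152.95.172 user=root"))
print(valid_ipv4_addresses("bogus 999.999.999.999 here"))
```

The first call should report 61.152.95.172, while the second should return an empty set because 999 is not a valid octet.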

Method 3: Filter with sed

[root@C67-X64-A1 hanghang]# sed -nr 's/.*[^0-9](([0-9]+\.){3}[0-9]+).*/\1/p' test.txt |sort -r|uniq
61.152.95.172
222.73.173.143
[root@C67-X64-A1 hanghang]# sed -nr 's/(^|.*[^0-9])(([0-9]+\.){3}[0-9]+).*/\2/p' test.txt |sort -r|uniq
61.152.95.172
222.73.173.143
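
The two sed expressions differ only in how they treat the text before the address. Because the leading .* is greedy, the [^0-9] guard is what keeps the first digits of the address from being swallowed, and the (^|...) alternative additionally allows an address at the very start of a line. The sketch below reproduces this with Python's re module. Python's regex engine is not identical to sed's, so treat it as an illustration of the greedy-matching behaviour on these sample lines rather than an exact emulation. The names UNGUARDED, GUARDED, ANCHORED and extract are hypothetical.

```python
import re

# No guard: the greedy .* eats as many leading digits of the address as it can.
UNGUARDED = re.compile(r".*((?:[0-9]+\.){3}[0-9]+).*")
# First sed form: a non-digit must come right before the address.
GUARDED = re.compile(r".*[^0-9]((?:[0-9]+\.){3}[0-9]+).*")
# Second sed form: the address may also sit at the start of the line.
ANCHORED = re.compile(r"(?:^|.*[^0-9])((?:[0-9]+\.){3}[0-9]+).*")


def extract(pattern, line):
    """Return the captured address, or None when the line does not match."""
    m = pattern.match(line)
    return m.group(1) if m else None


auth_line = ("Jul 13 08:13:09 localhost sshd[14679]: pam_unix(sshd:auth): "
             "authentication failure; logname= uid=0 euid=0 tty=ssh ruser= "
             "rhost=222.73.173.143 user=root")
access_line = '1.1.1.1 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1'

print(extract(UNGUARDED, auth_line))   # 2.73.173.143 (leading digits lost)
print(extract(GUARDED, auth_line))     # 222.73.173.143
print(extract(GUARDED, access_line))   # None (nothing precedes the address)
print(extract(ANCHORED, access_line))  # 1.1.1.1
```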

Case 2: Filtering log entries to meet a specific requirement

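Requirement:

I recently needed to process some website logs.

For example:

A: user 1.1.1.1 has log entries for both index.html and a.jpg

B: user 20.20.20.20 has log entries for index.html only, with no other files requested

We need to extract B's IP but not A's. (The two addresses only illustrate the requirement; in the sample log below, 1.1.1.1 requests only index.html and 20.20.20.20 also requests other files.)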

The log looks like this:

[root@C67-X64-A1 hanghang]# cat files 
1.1.1.1 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
10.10.10.10 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
10.10.10.10  - - [19/Jul/2013:15:01:39 +0800] "GET /logo.jpg  HTTP/1.1
10.10.10.10  - - [19/Jul/2013:15:01:39 +0800] "GET /a.js  HTTP/1.1
3.3.3.3 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
20.20.20.20 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
20.20.20.20  - - [19/Jul/2013:15:01:39 +0800] "GET /logo.jpg  HTTP/1.1
20.20.20.20  - - [19/Jul/2013:15:01:39 +0800] "GET /a.js  HTTP/1.1
30.30.30.30 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
30.30.30.30  - - [19/Jul/2013:15:01:39 +0800] "GET /logo.jpg  HTTP/1.1
30.30.30.30  - - [19/Jul/2013:15:01:39 +0800] "GET /a.js  HTTP/1.1
4.4.4.4 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
5.5.5.5 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
1.1.1.1 - - [20/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
2.2.2.2 - - [21/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1
3.3.3.3 - - [21/Jul/2013:15:01:55 +0800] "GET /index.html  HTTP/1.1
4.4.4.4 - - [21/Jul/2013:16:01:55 +0800] "GET /index.html  HTTP/1.1
5.5.5.5 - - [21/Jul/2013:17:02:55 +0800] "GET /index.html  HTTP/1.1

Shell script implementation:

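#!/bin/bash
# author: molewan
# Collect the IPs that requested anything other than /index.html.
grep -v "/index.html" files | awk '{print $1}' | sort -u > tmp_exclude
# Print each remaining IP once. grep options: -v inverts the match,
# -x matches whole lines, -F treats patterns as fixed strings,
# -f reads the patterns from a file.
awk '{print $1}' files | sort -u | grep -vxFf tmp_exclude
rm -f tmp_exclude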

Python script implementation:

Assume the log is stored in the file log.dat:

#!/usr/bin/env python3
import re

# Map each client IP to the set of resources it requested.
ip_resources = {}
# Group 1: the IPv4 address at the start of the line.
# Group 2: the path after "GET /", up to the next whitespace.
pattern = re.compile(r'(\d+\.\d+\.\d+\.\d+).*GET /(\S*)')

with open('log.dat') as f:
    for line in f:
        match = pattern.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ip, resource = match.groups()
        ip_resources.setdefault(ip, set()).add(resource)

# Print every IP whose only requested resource is index.html.
for ip, resources in ip_resources.items():
    if resources == {'index.html'}:
        print(ip)

# To collect the results instead of printing them, you can do this:

# ip_data = [ip for ip, resources in ip_resources.items() if resources == {'index.html'}]

ip_data then holds all of the matching IPs.
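
As a quick check of the requirement, here is a minimal sketch that expresses it as a set difference: the IPs that requested /index.html, minus the IPs that requested anything else. It is an alternative formulation rather than the author's script. The function name index_only_ips is hypothetical, the field positions assume the access-log layout shown in Case 2, and the sample lines are a subset copied from that log.

```python
def index_only_ips(lines):
    """Return the sorted IPs whose only requested path is /index.html."""
    index_visitors = set()
    other_visitors = set()
    for line in lines:
        fields = line.split()
        # Assumed layout: IP - - [date zone] "GET /path HTTP/1.1
        if len(fields) < 7 or fields[5] != '"GET':
            continue
        ip, path = fields[0], fields[6]
        if path == "/index.html":
            index_visitors.add(ip)
        else:
            other_visitors.add(ip)
    return sorted(index_visitors - other_visitors)


sample = [
    '1.1.1.1 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1',
    '10.10.10.10 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1',
    '10.10.10.10  - - [19/Jul/2013:15:01:39 +0800] "GET /logo.jpg  HTTP/1.1',
    '10.10.10.10  - - [19/Jul/2013:15:01:39 +0800] "GET /a.js  HTTP/1.1',
    '3.3.3.3 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1',
    '1.1.1.1 - - [20/Jul/2013:15:01:39 +0800] "GET /index.html  HTTP/1.1',
]
print(index_only_ips(sample))  # ['1.1.1.1', '3.3.3.3']
```

Each IP appears once in the result even when it requested index.html on several days.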

Original article by Net21-冰凍vs西瓜. If you repost it, please credit the source: http://www.www58058.com/24749
