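Linux Text Processing

Linux Text-Processing Tools

Overview:

Tools for viewing, analyzing, and summarizing text files;

grep: search file contents by keyword;

Regular expressions

Extended regular expressions

sed

Viewing the start and end of a file: head and tail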

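1. cat: view a file

cat [OPTION]… [FILE]…

-E: show a $ at the end of each line;

-n: number every output line;

-A: show all control characters;

-b: number output lines, skipping blank lines;

-s: squeeze consecutive blank lines into a single blank line;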
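The options above are easiest to compare on a tiny file. The following is a minimal sketch assuming GNU cat; the sample file and the variable names are made up for illustration.

```shell
# Build a small sample file: two text lines separated by two blank lines.
sample=$(mktemp)
printf 'first\n\n\nsecond\n' > "$sample"

# -n numbers every line, blank ones included (the last line gets number 4).
numbered_all=$(cat -n "$sample")

# -b numbers only the non-blank lines (the last line gets number 2).
numbered_nonblank=$(cat -b "$sample")

# -s squeezes the two blank lines into one; -E marks each line end with $.
squeezed=$(cat -s -E "$sample")

printf '%s\n' "$squeezed"
rm -f "$sample"
```

The final printf shows three lines: first$, a lone $ for the remaining blank line, and second$.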

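2. Paging through file contents:

more: page through a file:

more [-dlfpcsu] [-num] [+/pattern] [+linenum] [file …]

-d: display a prompt explaining that Space continues and q quits;

Press Enter to scroll down one line;

Press Space to move down one screen;

Press b to move back one screen;

Press q to quit and return to the shell;

!command runs a shell command without leaving the pager;

less: page through a file or standard input

Press Enter to scroll down one line;

Press Space to move down one screen;

Press b to move back one screen;

Press q to quit and return to the shell;

!command runs a shell command without leaving the pager;

/pattern searches the text; n jumps to the next match, N to the previous one;

Note: man displays its pages through less (the default pager on most distributions);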

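3. Showing the first or last lines of a file:

head: show the beginning of a file;

head [OPTION]… [FILE]…

-c #: print the first # bytes;

-n #: print the first # lines;

-#: shorthand for -n #;

By default the first 10 lines are shown;

tail: show the end of a file:

-c #: print the last # bytes;

-n #: print the last # lines;

-#: shorthand for -n #;

By default the last 10 lines are shown;

-f: follow the file and print new content as it is appended, which is useful for watching log files;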
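As a quick illustration of these options, the sketch below runs head and tail on a file holding the numbers 1 to 20. The file and variable names are assumptions made for this example; seq is used only to generate test data.

```shell
sample=$(mktemp)
seq 1 20 > "$sample"

first_three=$(head -n 3 "$sample")   # lines 1 to 3
last_two=$(tail -n 2 "$sample")      # lines 19 and 20
first_bytes=$(head -c 3 "$sample")   # the first 3 bytes: "1", newline, "2"

# Chaining the two tools prints an arbitrary range, here lines 5 to 7.
middle=$(head -n 7 "$sample" | tail -n 3)

printf '%s\n' "$middle"
rm -f "$sample"
```

Piping head into tail is a common way to pull a block out of the middle of a file without a dedicated tool.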

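logger: write a message to the system log:

Example:

logger "this is a test log" (this creates a log entry)

To watch only new log entries without blocking the terminal, run tail in the background: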

[root@centos6 ~]# tail -n 0 -f /var/log/messages &

[1] 4236

[root@centos6 ~]# logger "this is a test log"

Aug 7 12:19:09 centos6 root: this is a test log

[root@centos6 ~]#

fg: bring a background job to the foreground, where Ctrl+C can terminate it;

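4. cut: extract text by column:

cut OPTION… [FILE]…

-d: set the field delimiter; the default is tab;

-c: cut by character position;

-f: select the fields (columns) to keep:

#: field number #;

#,#,#: several separate fields, for example 1,3,6;

#-#: a range of consecutive fields, for example 1-6;

Combined: 1-3,5,7;

--output-delimiter=STRING: set the output delimiter;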
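The field-list forms and the output delimiter can be tried on a single passwd-style line. This is a small sketch assuming GNU cut (--output-delimiter is a GNU option); the variable names are illustrative.

```shell
line='tcpdump:x:72:72::/:/sbin/nologin'

# Fields 1 and 7, re-joined with a custom output delimiter.
name_shell=$(printf '%s\n' "$line" | cut -d: -f1,7 --output-delimiter=' -> ')

# A mixed list: a range plus a single field.
mixed=$(printf '%s\n' "$line" | cut -d: -f1-3,7)

# Character positions 1 to 7.
chars=$(printf '%s\n' "$line" | cut -c1-7)

printf '%s\n' "$name_shell" "$mixed" "$chars"
```

Without --output-delimiter, cut re-joins the selected fields with the input delimiter, as the second result shows.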

Example:

[root@centos6 testdir]# cat passwd | tail -n 3 | cut -d: -f 1,7

pulse:/sbin/nologin

sshd:/sbin/nologin

tcpdump:/sbin/nologin

[root@centos6 testdir]#

Cut by character:

[root@centos6 testdir]# cat passwd | tail -n 1 | cut -c 1-7

tcpdump

[root@centos6 testdir]#

Cut by character position to extract the disk-usage column:

[root@centos6 testdir]# df | cut -c 44-46

Use

[root@centos6 testdir]#

Or:

[root@centos6 testdir]# df | tr -s " " | tr -t " " ":"|cut -d: -f 5 |tr -d "%"

Use

[root@centos6 testdir]#

Extract the IP address from ifconfig output:

[root@centos6 testdir]# ifconfig |head -2 | cut -d: -f 2 | tr -d "[[:alpha:]]" | tail -1

192.168.3.3

[root@centos6 testdir]#

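5. paste: merge the corresponding lines of two files into one line:

paste [OPTION]… [FILE]…

-d: set the delimiter; the default is tab;

-s: join all lines of each file into a single line, one file at a time;

Example:

paste with default options: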

[root@centos6 testdir]# paste aa f1

CentOS release 6.8 (Final) CentOS release 6.8 (Final)

Kernel \r on an \m Kernel \r on an \m

[root@centos6 testdir]#

paste with -d:

[root@centos6 testdir]# paste -d: aa f1

CentOS release 6.8 (Final):CentOS release 6.8 (Final)

Kernel \r on an \m:Kernel \r on an \m

[root@centos6 testdir]#

paste with -s:

[root@centos6 testdir]# paste -s aa f1

CentOS release 6.8 (Final) Kernel \r on an \m

[root@centos6 testdir]#

6. wc: count lines, words, and bytes:

wc [OPTION]… [FILE]…

wc [OPTION]… --files0-from=F

-l: count lines;

-w: count words;

-c: count bytes;

-m: count characters;

Note: with no options, wc prints the line, word, and byte counts;

Example:

[root@centos6 testdir]# cat aa

CentOS release 6.8 (Final)

Kernel \r on an \m

[root@centos6 testdir]# wc -l aa

3 aa

[root@centos6 testdir]# wc -w aa

9 aa

[root@centos6 testdir]# wc -c aa

47 aa

[root@centos6 testdir]# wc -m aa

47 aa

[root@centos6 testdir]#

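7. sort: sort text:

sort [OPTION]… [FILE]…

-r: sort in reverse order;

-n: sort by numeric value;

-f: ignore case when comparing;

-u: remove duplicate lines from the output;

-t: set the field separator used for sort keys;

-k: set the field (column) to sort by;

Example:

sort -n: sort numerically in ascending order: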

[root@centos6 testdir]# cat passwd | sort -t: -k3 -n

aaaaaa

AAAAAA

tcpdump:x:72:72::/:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin

[root@centos6 testdir]#

sort -r: sort in reverse order:

[root@centos6 testdir]# cat passwd | sort -t: -k3 -nr

pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

tcpdump:x:72:72::/:/sbin/nologin

AAAAAA

aaaaaa

[root@centos6 testdir]#

sort -u: remove duplicate lines from the output:

[root@centos6 testdir]# cat passwd | sort -u

aaaaaa

AAAAAA

tcpdump:x:72:72::/:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin

[root@centos6 testdir]#

sort -f: ignore case (combined with -u, lines that differ only in case count as duplicates):

[root@centos6 testdir]# cat passwd | sort -uf

aaaaaa

tcpdump:x:72:72::/:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin

[root@centos6 testdir]#

Extract all IPv4 addresses from ifconfig output:

[root@centos6 testdir]# ifconfig | tr -c "[[:digit:]]." "\n"| sort -u -t. -k3|tail -5

255.0.0.0

127.0.0.1

255.255.255.0

192.168.3.255

192.168.3.3

[root@centos6 testdir]#

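8. uniq: remove adjacent duplicate lines from the output

uniq [OPTION]… [INPUT [OUTPUT]]

-c: prefix each line with the number of times it occurred;

-d: print only lines that are repeated;

-u: print only lines that are not repeated;

Note: uniq compares adjacent lines only, so it is usually run on the output of sort;

Example:

Find the most frequent words in /etc/init.d/functions: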

…………………….

75 pid

77 then

83 if

333

[root@centos6 testdir]#

Find the IP addresses that connect to this host most often:

1 96.7.54.187

4 192.168.3.4

[root@centos6 testdir]#
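The commands behind the two outputs above are not shown. One common way to produce such frequency counts is the sort | uniq -c | sort -nr idiom, sketched here on made-up sample text; this is an illustration of the idiom, not necessarily the pipeline the author ran.

```shell
# Hypothetical sample input: a few space-separated words.
text='if then fi if pid then if'

# One word per line, group identical words, count them, largest count first.
top=$(printf '%s\n' "$text" | tr -s ' ' '\n' | sort | uniq -c | sort -nr \
      | head -n 1 | sed 's/^ *//')

# Without sort, uniq only collapses neighbours, so the two "a" lines survive.
unsorted=$(printf 'a\nb\na\n' | uniq)

# After sorting, -d reports the value that occurs more than once.
dups=$(printf 'a\nb\na\n' | sort | uniq -d)

printf '%s\n' "$top"
```

The sed call at the end only strips the leading padding that uniq -c puts in front of each count.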

9. diff: compare two files:

diff [OPTION]… FILES

-u: show the differences in unified format, with surrounding context lines;

Example:

[root@centos6 testdir]# diff f1 f2

2c2,3

< wwww

---

> ddddddddddd

> dddddddddddss

3a5

> dddddddddddssfcccccf

[root@centos6 testdir]#

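patch: apply the changes recorded in a diff to another file

patch [options] [originalfile [patchfile]]

-b: automatically back up each file before changing it;

Example:

Simulate deleting f1 by mistake, then use patch to recover its contents;

Note that the recovered contents are written to f2, and the original f2 is saved as f2.orig; rename the files with mv if needed;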

[root@centos6 testdir]# echo aaaaaaaaaaa > f1

[root@centos6 testdir]# echo aaaaaaaaaaa > f2

[root@centos6 testdir]# echo bbbbbbbbbbb >> f2

[root@centos6 testdir]# diff -u f1 f2 > diff.log

[root@centos6 testdir]# rm -rf f1

[root@centos6 testdir]# patch -b f2 diff.log

patching file f2

Reversed (or previously applied) patch detected! Assume -R? [n] y

[root@centos6 testdir]# ll

total 12

-rw-r--r--. 1 root root 126 Aug 7 23:13 diff.log

-rw-r--r--. 1 root root 12 Aug 7 23:13 f2

-rw-r--r--. 1 root root 24 Aug 7 23:13 f2.orig

[root@centos6 testdir]# cat f2

aaaaaaaaaaa

[root@centos6 testdir]# cat f2.orig

aaaaaaaaaaa

bbbbbbbbbbb

[root@centos6 testdir]#

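10. The three core text-processing tools on Linux:

grep: text filtering tool (grep, egrep, fgrep);

sed: stream editor, a text editing tool;

awk: text reporting tool (not covered in this article);

grep: a text search tool that checks the target text line by line against a user-supplied pattern and prints the lines that match;

grep [OPTIONS] PATTERN [FILE…]

--color=auto: highlight the matched text;

-v: invert the match, printing the lines that do not match;

-i: ignore case;

-n: show the line number of each match;

-c: print the number of matching lines;

-o: print only the matched part of each line;

-q: quiet mode, print nothing (the exit status reports whether a match was found);

-A #: also print the # lines after each match;

-B #: also print the # lines before each match;

-C #: also print the # lines before and after each match;

-e: specify a pattern; repeat it to match any of several patterns (logical OR);

-w: match whole words only;

-E: use extended regular expressions, equivalent to egrep;

Shell variables and command substitution can also be used in the pattern: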

[root@centos6 Desktop]# grep "$USER" /etc/passwd

[root@centos6 Desktop]# grep `whoami` /etc/passwd
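To see how several of the options listed earlier change the result, here is a small sketch run against a three-line sample in passwd format. The sample entries (including the user rooter) and the variable names are invented for the example.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
rooter:x:1001:1001::/home/rooter:/bin/bash
sshd:x:74:74::/var/empty/sshd:/sbin/nologin
EOF

any_root=$(grep -c 'root' "$sample")        # lines containing "root": 2
word_root=$(grep -cw 'root' "$sample")      # "root" as a whole word: 1
not_bash=$(grep -vc 'bash' "$sample")       # lines without "bash": 1
numbered=$(grep -n 'sshd' "$sample")        # match prefixed with its line number
only_match=$(grep -o 'nologin' "$sample")   # just the matched text
either=$(grep -c -e '^root:' -e '^sshd:' "$sample")   # OR of two patterns: 2

# -q prints nothing; the exit status alone says whether a match exists.
if grep -q '^sshd:' "$sample"; then has_sshd=yes; else has_sshd=no; fi

rm -f "$sample"
```

The -q form is the usual choice inside shell conditionals, where only success or failure matters.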

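Character matching:

.: matches any single character;

[]: matches any single character in the given set;

[^]: matches any single character not in the given set;

[:digit:]: all digits;

[:lower:]: all lowercase letters;

[:upper:]: all uppercase letters;

[:alpha:]: all letters, upper and lower case;

[:alnum:]: all letters and digits;

[:punct:]: all punctuation characters;

[:space:]: all whitespace characters;

Note: these classes are written inside a bracket expression, for example [[:digit:]];

Repetition:

A repetition operator follows the character it applies to and specifies how many times that character must occur;

*: matches the preceding character any number of times, including zero;

Greedy matching: the match is as long as possible;

.*: any characters, of any length;

\?: matches the preceding character 0 or 1 times;

\+: matches the preceding character at least once;

\{m\}: matches the preceding character exactly m times;

\{m,n\}: matches the preceding character at least m and at most n times;

\{,n\}: matches the preceding character at most n times;

\{m,\}: matches the preceding character at least m times;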
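The repetition operators can be compared side by side on a short word list. This is a minimal sketch assuming GNU grep, where \? and \+ are available in basic regular expressions; the word list and variable names are made up.

```shell
words=$(printf 'gd\ngod\ngood\ngoood\n')

star=$(printf '%s\n' "$words"  | grep 'go*d')         # zero or more o: all four words
opt=$(printf '%s\n' "$words"   | grep 'go\?d')        # zero or one o: gd, god
plus=$(printf '%s\n' "$words"  | grep 'go\+d')        # one or more o: god, good, goood
exact=$(printf '%s\n' "$words" | grep 'go\{2\}d')     # exactly two o: good
range=$(printf '%s\n' "$words" | grep 'go\{2,3\}d')   # two or three o: good, goood

printf '%s\n' "$range"
```

Each operator applies only to the single character o that precedes it, not to the whole pattern.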

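Anchors: fix where the match must occur

^: start-of-line anchor, placed at the far left of the pattern;

$: end-of-line anchor, placed at the far right of the pattern;

^PATTERN$: the pattern must match the whole line;

^$: an empty line;

^[[:space:]]*$: a blank line (empty or whitespace only);

\< or \b: start-of-word anchor, placed to the left of a word pattern;

\> or \b: end-of-word anchor, placed to the right of a word pattern;

\<PATTERN\>: matches a whole word;

Grouping:

Grouping binds one or more characters together so that they are treated as a single unit. A back-reference then refers to the text matched by an earlier group, not to the group's pattern;

\(\)

The text matched by each group is stored by the regular expression engine in internal variables named \1, \2, \3, and so on;

Example:

Filter the lines of /etc/passwd whose user name is the same as the shell name:

[root@centos7 ~]# grep "^\([[:alnum:]]\+\)\>.*\1$" /etc/passwd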

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

[root@centos7 ~]#
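The back-reference in this example can be checked on a small sample file. The sketch below also shows a stricter variant that requires a slash before \1; that variant, the sample entries (including the user sh), and the variable names are assumptions added for illustration and are not part of the original example.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
halt:x:7:0:halt:/sbin:/sbin/halt
sh:x:1005:1005::/home/sh:/bin/bash
EOF

# Pattern from the example: \1 must end the line, but nothing says where it starts,
# so user "sh" also matches because "/bin/bash" happens to end in "sh".
loose=$(grep '^\([[:alnum:]]\+\)\>.*\1$' "$sample" | cut -d: -f1)

# Stricter sketch: the back-reference must be the whole last path component.
strict=$(grep '^\([[:alnum:]]\+\):.*/\1$' "$sample" | cut -d: -f1)

printf '%s\n' "$strict"
rm -f "$sample"
```

The difference matters only when one user name is a suffix of another shell's name, which is the case that test users such as bash and testbash are meant to expose.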

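egrep: extended regular expressions:

egrep = grep -E

grep -E [OPTIONS] PATTERN [FILE…]

Extended regular expression metacharacters:

Character matching:

.: matches any single character;

[]: matches any single character in the given set;

[^]: matches any single character not in the given set;

[:digit:]: all digits;

[:lower:]: all lowercase letters;

[:upper:]: all uppercase letters;

[:alpha:]: all letters, upper and lower case;

[:alnum:]: all letters and digits;

[:punct:]: all punctuation characters;

[:space:]: all whitespace characters;

Repetition:

*: matches the preceding character any number of times, including zero;

?: matches the preceding character 0 or 1 times;

+: matches the preceding character at least once;

{m}: matches the preceding character exactly m times;

{n,m}: matches the preceding character at least n and at most m times;

{0,m}: matches the preceding character at most m times;

{n,}: matches the preceding character at least n times;

Anchors:

^: start-of-line anchor;

$: end-of-line anchor;

^PATTERN$: the pattern must match the whole line;

^$: an empty line;

^[[:space:]]*$: a blank line (empty or whitespace only);

\< or \b: start-of-word anchor, placed to the left of a word pattern;

\> or \b: end-of-word anchor, placed to the right of a word pattern;

\<PATTERN\>: matches a whole word;

Grouping:

()

Example:

Filter the lines of /etc/passwd whose user name is the same as the shell name: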

[root@centos7 ~]# grep -E "^([[:alpha:]]*)\>.*\1$" /etc/passwd

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

[root@centos7 ~]#

Alternation (or):

^S|s: lines that start with an uppercase S, or that contain a lowercase s anywhere;

Example:

[root@centos7 ~]# grep -E "^S|s" /proc/meminfo

Buffers: 792 kB

SwapCached: 0 kB

Shmem: 10188 kB

PageTables: 21652 kB

NFS_Unstable: 0 kB

[root@centos7 ~]#

^(S|s): lines that start with an uppercase S or a lowercase s;

Example:

[root@centos7 ~]# grep -E "^(S|s)" /proc/meminfo

SwapCached: 0 kB

SwapTotal: 1023996 kB

SwapFree: 1023996 kB

Shmem: 10188 kB

Slab: 119596 kB

[root@centos7 ~]#

Example:

Extract the IPv4 address from ifconfig output:

[root@centos7 ~]# ifconfig | grep "inet\b" | tr -s " "|cut -d " " -f 3| grep -v "127.0.0.1"

192.168.3.2

[root@centos7 ~]#

Find the two-digit and three-digit numbers in /etc/passwd:

[root@centos7 ~]# cat /etc/passwd | grep -E -o "\b[[:digit:]]{2,3}\b"

………

992

990

…………

[root@centos7 ~]#

Show the lines of /etc/grub2.cfg that start with at least one whitespace character followed by a non-whitespace character:

[root@centos7 ~]# cat /etc/grub2.cfg | grep -E "^[[:space:]]+[^[:space:]]"

…………….

initrd16 /initramfs-0-rescue-27cebe594b5a45138a2e15e32a1cf607.img

source ${config_directory}/custom.cfg

source $prefix/custom.cfg;

[root@centos7 ~]#

Show the lines of /proc/meminfo that start with an uppercase or lowercase s (in two ways):

[root@centos7 ~]# cat /proc/meminfo | grep "^[Ss]"

[root@centos7 ~]# cat /proc/meminfo | grep -E "^(S|s)"

Use an extended regular expression to match 0-9, 10-99, 100-199, 200-249, and 250-255:

([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
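The five alternatives cover exactly the values 0 to 255 when the expression is anchored at both ends. The sketch below checks a few candidates; the helper function is_octet and the candidate values are hypothetical names and data chosen for this example.

```shell
octet='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'

# Succeeds only if the whole argument is a number from 0 to 255.
is_octet() {
    printf '%s\n' "$1" | grep -Eq "^${octet}\$"
}

accepted=''
for n in 0 42 255 256 300 01; do
    if is_octet "$n"; then
        accepted="$accepted $n"
    fi
done
accepted=${accepted# }   # drop the leading space

printf '%s\n' "$accepted"
```

Without the ^ and $ anchors the expression would also match inside longer numbers such as 256, since 25 and 6 each satisfy one of the alternatives.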

Add the users bash, testbash, basher, and nologin (the last with /sbin/nologin as its shell), then find the lines of /etc/passwd whose user name is the same as the shell name:

[root@centos7 ~]# cat /etc/passwd | grep -E "(^[[:alnum:]]+)\b.*\1$"

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

nologin:x:1004:1004::/home/nologin:/sbin/nologin

[root@centos7 ~]#

Show all IPv4 addresses on this host:

[root@centos7 profile.d]# ifconfig | grep -E -o "(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"| grep -v -E -e "^255\>" -e "\<255$"| grep -v "^127\b"

192.168.3.2

[root@centos7 profile.d]#

Print /etc/sysconfig with echo, then use egrep to extract the base name:

[root@centos7 ~]# echo /etc/sysconfig | grep -E -o "[^/]+/?$"

sysconfig

[root@centos7 ~]#

Print /etc/sysconfig with echo, then use egrep to extract the directory name:

[root@centos7 ~]# echo /etc/sysconfig | grep -E -o "^[/][[:alpha:]]+/?"

/etc/

[root@centos7 ~]#

Extract the usage percentages of the /dev/sda partitions:

[root@centos7 profile.d]#
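No command is given for this exercise. One possible approach, reusing the tr and cut steps from the cut section, is sketched below on a captured sample of df output; the figures in the sample are hypothetical, and on a real system the first printf would simply be replaced by df.

```shell
# Hypothetical df output, stored in a variable so the example is repeatable.
df_output='Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       50264772 4712708  42992064  10% /
tmpfs             502068      76    501992   1% /dev/shm
/dev/sda1         194241   34110    149891  19% /boot'

# Keep the /dev/sda lines, squeeze runs of spaces, take field 5, drop the % sign.
usage=$(printf '%s\n' "$df_output" | grep '^/dev/sda' | tr -s ' ' \
        | cut -d' ' -f5 | tr -d '%')

printf '%s\n' "$usage"
```

Squeezing the spaces first is what makes cut usable here, because df pads its columns with a variable number of spaces.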

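sed: a text-processing tool:

sed is a stream editor that processes input one line at a time. It copies the current line into a temporary buffer called the "pattern space", applies the sed commands to the pattern space, and then sends the result to the screen before moving on to the next line. This repeats until the end of the file. The file itself is not changed unless you redirect the output to save it (or use -i). sed is mainly used to edit one or more files automatically, to simplify repetitive edits, and to write conversion scripts;

sed [OPTION]… {script-only-if-no-other-script} [input-file]…

-n: do not print the pattern space automatically;

-e: add an editing command; repeat it to apply several;

-f: read the editing script from the given file;

-r: use extended regular expressions;

-i: edit the file in place;

Addresses:

No address: process every line;

Single address: #: line number #;

/pattern/: every line matched by the pattern;

Address ranges:

#,#: from line # through line #;

#,+#: line # and the # lines after it;

#,/pattern/: from line # through the first following line matched by the pattern;

/pattern/,/pattern/: from a line matched by the first pattern through the next line matched by the second;

Steps:

1~2: all odd-numbered lines;

2~2: all even-numbered lines;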
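Each address form can be tried with sed -n and the p command on the numbers 1 to 6. This is a short sketch assuming GNU sed, since the #,+# and first~step forms are GNU extensions; the variable names are illustrative.

```shell
lines=$(seq 1 6)

single=$(printf '%s\n' "$lines"     | sed -n '2p')       # line 2 only
range=$(printf '%s\n' "$lines"      | sed -n '2,4p')     # lines 2 to 4
relative=$(printf '%s\n' "$lines"   | sed -n '2,+1p')    # line 2 and the next one
to_pattern=$(printf '%s\n' "$lines" | sed -n '3,/5/p')   # line 3 through the line matching 5
odd=$(printf '%s\n' "$lines"        | sed -n '1~2p')     # lines 1, 3, 5

printf '%s\n' "$odd"
```

With -n, nothing is printed unless a p command selects it, which makes the effect of each address easy to see.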

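Editing commands:

d: delete the pattern space;

p: print the pattern space;

a \text: append text after the matched line; \n in the text adds multiple lines;

i \text: insert text before the matched line; \n in the text adds multiple lines;

c \text: replace the matched line with the given text;

w /path/to/somefile: write the matched lines to the given file;

r /path/from/somefile: read the given file and insert its contents after the matched line, which merges the files;

=: print the line number of the matched line;

!: negate the address, so the command applies to the lines that do not match;

Example: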

[root@centos7 testdir]# cat /etc/fstab | sed "1,8"d

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/a \new line"

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/i \new line"

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/c \new line"

[root@centos7 testdir]# cat /etc/fstab | sed "3r /etc/issue"

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/w /testdir/f2"

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/="

[root@centos7 testdir]# cat /etc/fstab | sed '/^#/!d'

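s///: search and replace; the delimiter can be chosen freely, with s@@@ and s### as common alternatives

Substitution flags:

g: replace every occurrence on the line;

w /path/to/somefile: write the lines where a substitution succeeded to the given file;

p: print the lines where a substitution succeeded;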
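A short sketch of the substitution command and two of its flags follows; the sample strings and variable names are invented for illustration. The @ delimiter avoids having to escape the slashes in the path.

```shell
path_line='/usr/local/bin:/usr/bin'

# Without g, only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$path_line" | sed 's@/usr@/opt@')

# With g, every occurrence on the line is replaced.
all=$(printf '%s\n' "$path_line" | sed 's@/usr@/opt@g')

# -n plus the p flag prints only the lines where a substitution happened.
changed=$(printf 'keep\nfoo\n' | sed -n 's/foo/bar/p')

printf '%s\n' "$first_only" "$all" "$changed"
```

The -n and p combination is a handy way to use sed as a filter that both selects and rewrites lines.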

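Advanced editing commands:

h: copy the pattern space to the hold space, overwriting it;

H: append the pattern space to the hold space;

g: copy the hold space to the pattern space, overwriting it;

G: append the hold space to the pattern space;

x: exchange the contents of the pattern space and the hold space;

n: read the next input line into the pattern space, replacing its contents;

N: append the next input line to the pattern space;

d: delete the pattern space;

D: delete the pattern space up to and including the first newline;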
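The hold-space commands are easier to follow in a few well-known one-liners. The sketch below uses made-up input and variable names; the behaviour described in the comments is what these commands do step by step.

```shell
# Reverse the line order: every line except the first appends the hold space to
# itself (G), then saves the result back (h); the last line prints the total.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

# G with an empty hold space appends just a newline, adding a blank line after each line.
spaced=$(printf 'a\nb\n' | sed 'G')

# N pulls the next line into the pattern space; the embedded newline is then
# replaced by a space, which joins the lines in pairs.
joined=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')

printf '%s\n' "$reversed"
```

The first command behaves like tac, and it works because the hold space keeps its contents from one input line to the next.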

Original article by zhengyibo. If you repost it, please credit the source: http://www.www58058.com/35745
