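Class 7: Text-processing commands and the text-processing tool grep

lvasu · 2016-08-08 09:43 · Linux干貨

I. Text-processing commands

1. Viewing file contents: cat, tac, rev

cat [OPTION]… [FILE]… prints a text file in its original order

-E: show a $ at the end of each line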

[root@6 ~]# cat -E a.txt
Filesystem     1K-blocks    Used Available Use% Mounted on$
$
$
/dev/sda2       50264772 4064872  43639900   9% / $
tmpfs             502068      80    501988   1% /dev/shm$
/dev/sda1         194241   34211    149790  19% /boot$
/dev/sda3       20027260  333392  18669868   2% /testdir$
/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final$

-n: number every output line

[root@6 ~]# cat -n a.txt
    1	Filesystem     1K-blocks    Used Available Use% Mounted on
    2
    3
    4	/dev/sda2       50264772 4064872  43639900   9% /
    5	tmpfs             502068      80    501988   1% /dev/shm
    6	/dev/sda1         194241   34211    149790  19% /boot
    7	/dev/sda3       20027260  333392  18669868   2% /testdir
    8	/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final

-A: show all control characters (equivalent to -vET)

[root@6 ~]# cat -A a.txt
Filesystem     1K-blocks    Used Available Use% Mounted on$
$
$
/dev/sda2       50264772 4064872  43639900   9% / $
tmpfs             502068      80    501988   1% /dev/shm$
/dev/sda1         194241   34211    149790  19% /boot$
/dev/sda3       20027260  333392  18669868   2% /testdir$
/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final$

-b: number only the non-empty lines

[root@6 ~]# cat -b a.txt
    1	Filesystem     1K-blocks    Used Available Use% Mounted on
    2	/dev/sda2       50264772 4064872  43639900   9% /
    3	tmpfs             502068      80    501988   1% /dev/shm
    4	/dev/sda1         194241   34211    149790  19% /boot
    5	/dev/sda3       20027260  333392  18669868   2% /testdir
    6	/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final

-s: squeeze consecutive blank lines into a single blank line

[root@6 ~]# cat -s a.txt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       50264772 4064872  43639900   9% /
tmpfs             502068      80    501988   1% /dev/shm
/dev/sda1         194241   34211    149790  19% /boot
/dev/sda3       20027260  333392  18669868   2% /testdir
/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final

tac [OPTION]… [FILE]… prints a text file in reverse line order

[root@6 ~]# tac a.txt
/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final
/dev/sda3       20027260  333392  18669868   2% /testdir
/dev/sda1         194241   34211    149790  19% /boot
tmpfs             502068      80    501988   1% /dev/shm
/dev/sda2       50264772 4064872  43639900   9% /
Filesystem     1K-blocks    Used Available Use% Mounted on

rev [options] [file …] reverses the characters of each line of a text file

[root@6 ~]# rev a.txt
no detnuoM %esU elbaliavA desU    skcolb-K1     metsyseliF
/ %9   00993634  2784604 27746205       2ads/ved/
mhs/ved/ %1   889105    08      860205             sfpmt
toob/ %91  097941    11243   142491         1ads/ved/
ridtset/ %2   86896681  293333  06272002       3ads/ved/
laniF_8.6_SOtneC/aidem/ %001 0         4844283 4844283         0rs/ved/
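
The options above can be tried on a small throwaway file. The following is a minimal sketch that assumes GNU coreutils plus rev from util-linux; the file demo.txt and its contents are made up for illustration and are not part of the original article.

```shell
# Hypothetical sample file: one line, two blank lines, one more line.
tmp=$(mktemp -d)
printf 'first\n\n\nlast\n' > "$tmp/demo.txt"

numbered=$(cat -b "$tmp/demo.txt")       # numbers only the two non-empty lines
squeezed=$(cat -s "$tmp/demo.txt")       # the two blank lines collapse into one
reversed_lines=$(tac "$tmp/demo.txt")    # last line first
reversed_chars=$(rev "$tmp/demo.txt")    # each line reversed character by character

printf '%s\n' "$squeezed"
printf '%s\n' "$reversed_chars"
rm -r "$tmp"
```

Capturing each result in a variable keeps the four views of the same file side by side, so they are easy to compare.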

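2. Paging through file contents

more: view a file one page at a time

more [OPTIONS…] FILE…

-d: display prompts for paging and quitting

!command: run a shell command temporarily; press Enter to return

less: view a file or STDIN output one page at a time; it is also the pager that the man command uses to display help pages

Useful commands while viewing include:

/text: search for text

n / N: jump to the next or previous match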

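3. Showing the first or last lines of a file

head: take content from the beginning of a text file

head [OPTION]… [FILE]…

-c #: take the first # bytes

-n #: take the first # lines

-#: shorthand for -n #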

[root@6 ~]# head -2 a.txt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       50264772 4064872  43639900   9% /
[root@6 ~]# head -n2 a.txt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       50264772 4064872  43639900   9% /
[root@6 ~]# head -c 10 a.txt
Filesystem[root@6 ~]#

tail: take content from the end of a text file

tail [OPTION]… [FILE]…

-c #: take the last # bytes

-n #: take the last # lines

-#: shorthand for -n #

[root@6 ~]# tail -2 a.txt
/dev/sda3       20027260  333392  18669868   2% /testdir
/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final

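-f: follow the file and show newly appended content; commonly used for log monitoring

# logger "This is mine."    ## generates a new log entry

# tail -n 0 -f /var/log/messages &    ## runs in the background and reports new messages as they arrive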
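
head and tail can also be chained to pull out a range of lines from the middle of a file. This is a small sketch under the assumption that GNU coreutils (including seq) is available; the input is generated on the fly and is purely illustrative.

```shell
# Hypothetical input: the numbers 1 to 10, one per line.
tmp=$(mktemp)
seq 1 10 > "$tmp"

first_three=$(head -n 3 "$tmp")                  # lines 1-3
last_two=$(tail -n 2 "$tmp")                     # lines 9-10
lines_4_to_6=$(head -n 6 "$tmp" | tail -n 3)     # keep the first 6 lines, then the last 3 of those
first_bytes=$(head -c 4 "$tmp")                  # first 4 bytes: "1", newline, "2", newline

printf '%s\n' "$lines_4_to_6"
rm "$tmp"
```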

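4. Extracting columns with cut and merging files with paste

cut [OPTION]… [FILE]…

-d DELIMITER: specify the field delimiter; the default is TAB

-f FIELDS:

#: field number #

#,#[,#]: several separate fields, e.g. 1,3,6

#-#: a range of consecutive fields, e.g. 1-6

Mixed: 1-3,7

[root@6 ~]# cut -d: -f1-3,7  pass    ## -d and -f are usually used together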
root:x:0:/bin/bash
bin:x:1:/sbin/nologin
daemon:x:2:/sbin/nologin
adm:x:3:/sbin/nologin
lp:x:4:/sbin/nologin
sync:x:5:/bin/sync

-c: cut by character position

# cut -c 44-46 f1

[root@6 ~]# cut -c 10-20 pass
0:root:/roo
:bin:/bin:/
2:2:daemon:
:adm:/var/a
lp:/var/spo

--output-delimiter=STRING: set the delimiter used in the output (replacing the input delimiter)

[root@6 ~]# cut -d: -f1-3 --output-delimiter=+1 pass
root+1x+10
bin+1x+11
daemon+1x+12
adm+1x+13
lp+1x+14
sync+1x+15
shutdown+1x+16

Display selected columns of a file or of STDIN data

cut -d: -f1 /etc/passwd

[root@6 ~]# cut -d: -f1 /etc/passwd
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
uucp
operator
games
gopher

cat /etc/passwd | cut -d: -f7

[root@6 ~]# cat /etc/passwd | cut -d: -f7
/bin/bash
/sbin/nologin
/sbin/nologin
/sbin/nologin
/sbin/nologin
/bin/sync
/sbin/shutdown
/sbin/halt
/sbin/nologin
/sbin/nologin

cut -c2-5 /usr/share/dict/words

[root@6 ~]# cut -c2-5 /usr/share/dict/words |tail -10
ythi
ythu
yzom
yzzo
yzzy
yzzy
Z
z
Zt
ZZ
Get the IP address
[root@6 ~]# ifconfig|head -2 | tail -1 |cut -d : -f 2
10.1.252.177  Bcast
Show only the disk-usage percentage
[root@6 ~]# df | tr -s ' '|tr -d '%'|cut -d ' ' -f 5
Use
9
1
19
2
100
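
The cut options can be combined as in the sketch below. It assumes GNU cut (for --output-delimiter); the two passwd-style records are invented for the example and are not read from the real /etc/passwd.

```shell
# Hypothetical passwd-style records.
records='alice:x:1001:1001:Alice:/home/alice:/bin/bash
bob:x:1002:1002:Bob:/home/bob:/sbin/nologin'

names=$(printf '%s\n' "$records" | cut -d: -f1)                                        # field 1 only
name_and_shell=$(printf '%s\n' "$records" | cut -d: -f1,7 --output-delimiter=' -> ')   # fields 1 and 7, rejoined with " -> "
first_three_chars=$(printf '%s\n' "$records" | cut -c 1-3)                             # characters 1-3 of each line

printf '%s\n' "$name_and_shell"
```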

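paste: merge the lines that share the same line number in several files into one line

paste [OPTION]… [FILE]…

-d DELIMITER: specify the delimiter; the default is TAB

-s: join all lines of each file into a single line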

paste f1 f2

paste -s f1 f2

[root@6 ~]# cat f1
123
456
789
[root@6 ~]# cat f2
abc
def
hij
[root@6 ~]# paste f1 f2
123	abc
456	def
789	hij
[root@6 ~]# paste -s f1 f2
123	456	789
abc	def	hij
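
A short sketch of -d and -s together, assuming a POSIX-style paste; the file names nums and letters are hypothetical.

```shell
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/nums"
printf 'a\nb\nc\n' > "$tmp/letters"

side_by_side=$(paste -d: "$tmp/nums" "$tmp/letters")   # pairs line N of each file, joined by ":"
serial=$(paste -s -d, "$tmp/nums")                     # all lines of one file on a single line, joined by ","

printf '%s\n' "$side_by_side"
printf '%s\n' "$serial"
rm -r "$tmp"
```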

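5. Collecting text statistics with wc

Counts the total number of words, lines, bytes and characters

Can be run on files or on data from STDIN

$ wc story.txt

39 237 1901 story.txt

(line count, word count, byte count)

Use -l to count only lines

Use -w to count only words

Use -c to count only bytes

Use -m to count only characters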

[root@6 ~]# wc pass
 52   71 2390 pass
[root@6 ~]# wc -l pass
52 pass
[root@6 ~]# wc -w pass
71 pass
[root@6 ~]# wc -c pass
2390 pass
[root@6 ~]# wc -m pass
2390 pass
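
When wc reads from STDIN it prints only the number, without a file name, which is convenient in scripts. A minimal sketch with made-up input text:

```shell
# Hypothetical two-line input.
text='one two three
four five'

lines=$(printf '%s\n' "$text" | wc -l)   # number of newline characters
words=$(printf '%s\n' "$text" | wc -w)   # whitespace-separated words
bytes=$(printf '%s\n' "$text" | wc -c)   # bytes, including the two newlines

echo "lines=$lines words=$words bytes=$bytes"
```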

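6. Sorting with sort and removing duplicates with uniq

sort writes the sorted text to STDOUT and does not change the original file

$ sort [options] file(s)

Common options

-r: sort in reverse (descending) order. When short options are combined, the one that takes an argument (such as -k) must come last in the cluster, so r cannot be at the end: sort -rk 3

-n: sort by numeric value

-f: fold case, i.e. ignore the difference between upper and lower case

-u: unique; remove duplicate lines from the output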

[root@6 ~]# sort -t: -unrk 2 a
b:34
b:23
a:12
a:1
[root@6 ~]# cat a
a:12
b:23
a:1
b:34
A:1
a:1
A:12

-t c: use c as the field delimiter

-k X: sort by column X, as delimited by c; may be given more than once

Note: -t and -k are usually used together

[root@6 ~]# sort -t: -nrk 3  pass    
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
wang:x:4328:4329::/home/wang:/bin/bash
mage:x:4327:4328::/home/mage:/bin/bash
nologin:x:4326:4327::/home/nologin:/bin/bash
basher:x:4325:4326::/home/basher:/bin/bash
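
Because -k may be repeated, a second key can break ties in the first. The sketch below assumes GNU sort and uses invented name:count records; LC_ALL=C is set only to make the ordering independent of the locale.

```shell
# Hypothetical name:count records.
data='web:30
db:4
cache:30
db:4
mail:12'

# Key 1: field 2, numeric, descending. Key 2: field 1, ascending, used when counts are equal.
by_count_desc=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2nr -k1,1)

# -u keeps a single copy of lines that compare equal.
unique_sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -u)

printf '%s\n' "$by_count_desc"
```

Writing the key as -k2,2 limits it to field 2 alone; a bare -k 2 would extend the key from field 2 to the end of the line.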

uniq

The uniq command removes adjacent duplicate lines from its input

uniq [OPTION]… [FILE]…

-c: prefix each line with the number of times it occurred;

[root@6 ~]# cat a
a:12
b:23
a:1
b:34
A:1
a:1
A:12
[root@6 ~]# uniq -c  a
     1 a:12
     1 b:23
     1 a:1
     1 b:34
     1 A:1
     1 a:1
     1 A:12

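-d: show only lines that were repeated consecutively (for this run, file a was edited so that a:1 appears on two adjacent lines, as the cat output shows);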

[root@6 ~]# uniq -d  a
a:1
[root@6 ~]# cat a
a:12
b:23
a:1
a:1
b:34
A:1
a:1
A:12

-u: show only lines that were not repeated;

[root@6 ~]# uniq -u  a
a:12
b:23
b:34
A:1
a:1
A:12

uniq and sort are often used together

sort userlist.txt | uniq -c

[root@6 ~]# cat /etc/init.d/functions|tr -cs '[:alpha:]' '\n'|sort |uniq -c |sort -rn
    83 if
    77 then
    75 pid
    73 echo
    72 fi
    61 return
    57 dev
    54 file
#### counts how often each word occurs in /etc/init.d/functions and sorts the counts from highest to lowest
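
The reason sort comes before uniq is that uniq only compares neighbouring lines. A small sketch with a made-up word list shows the difference:

```shell
# Hypothetical word list in which no duplicates are adjacent.
words='apple
pear
apple
fig
pear
apple'

# Without sorting, uniq removes nothing because no two neighbouring lines are equal.
unsorted_count=$(printf '%s\n' "$words" | uniq | wc -l)

# Sorting first groups equal words together so that uniq -c can count them.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn | head -n 1)

echo "lines after plain uniq: $unsorted_count"
echo "most frequent: $top"
```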

7. Comparing files

Compare the differences between two files

$ diff foo.conf-broken foo.conf-works
5c5
< use_widgets= no
---
> use_widgets = yes

This indicates that line 5 differs (was changed)

[root@6 ~]# cat a
a:12
b:23
a:1
a:1
b:34
A:1
a:1
A:12
[root@6 ~]# cat b
a:12
b:23
a:1
a:1
b:34
[root@6 ~]# diff a b
6,8d5
< A:1
< a:1
< A:12

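Replicating changes to a file with patch

The output of the diff command can be saved in a file called a "patch"

Use the -u option to produce "unified" diff output, the format best suited to patch files.

The patch command replicates changes that were made in other files (use it with caution!)

Use the -b option to back up the changed file automatically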

$ diff -u foo.conf-broken foo.conf-works > foo.patch

$ patch -b foo.conf-broken foo.patch

[root@6 ~]# diff a b
6,8d5
< A:1
< a:1
< A:12
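[root@6 ~]# diff -u a b > c    ### generate the diff file (it describes the change from a to b)
[root@6 ~]#  patch -b  b c     ## b already holds the "after" state, so patch offers to reverse the diff; answering y turns b into the contents of a and backs up the original as b.orig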
patching file b
Reversed (or previously applied) patch detected!  Assume -R? [n] y
[root@6 ~]# ll
-rw-r--r--. 1 root root    36 8月   6 13:18 b
-rw-r--r--. 1 root root    23 8月   6 12:56 b.orig  ## the backup file that was created
[root@6 ~]# cat b.orig
a:12
b:23
a:1
a:1
b:34
[root@6 ~]# cat b ### the restored contents of a
a:12
b:23
a:1
a:1
b:34
A:1
a:1
A:12
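
The more usual direction is to apply the patch to the old file, and to undo it later with -R. The sketch below assumes GNU diffutils and GNU patch are installed; the file names old.conf, new.conf and change.patch are hypothetical.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a:12\nb:23\n' > old.conf
printf 'a:12\nb:23\nA:1\n' > new.conf

diff -u old.conf new.conf > change.patch   # exit status 1 here only means "the files differ"
patch -b old.conf change.patch             # old.conf now matches new.conf; the original is kept as old.conf.orig
cmp -s old.conf new.conf && echo "old.conf updated"

patch -R old.conf change.patch             # -R applies the same patch in reverse, undoing the change
cmp -s old.conf old.conf.orig && echo "old.conf restored"
```

Passing -R explicitly avoids the interactive "Assume -R?" question seen in the transcript above.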

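II. Text-processing tools

The three classic text-processing tools on Linux

grep: a text-filtering tool (driven by a pattern);

grep, egrep, fgrep (fgrep does not support regular-expression searches)

sed: stream editor, a text-editing tool;

awk: implemented on Linux as gawk, a text report generator;

1. The grep tool

grep: Global search REgular expression and Print out the line.

Purpose: a text-search tool that checks the target text line by line against a user-specified "pattern" and prints the lines that match;

Pattern: a filter condition written with regular-expression characters and literal text characters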

grep [OPTIONS] PATTERN [FILE…]

grep root /etc/passwd

grep "$USER" /etc/passwd

grep '$USER' /etc/passwd

grep `whoami` /etc/passwd

[root@6 ~]# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@6 ~]# grep "$USER" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@6 ~]# grep '$USER' /etc/passwd
[root@6 ~]#                                                  ### no output: single quotes stop the shell from expanding $USER
[root@6 ~]# grep `whoami` /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
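
The difference between the three quoting styles comes from the shell, not from grep. The sketch below reproduces it with a hypothetical variable name and two invented records instead of $USER and /etc/passwd.

```shell
# Hypothetical variable and data.
name=root
data='root:x:0:0
daemon:x:2:2'

expanded=$(printf '%s\n' "$data" | grep "$name")              # the shell expands $name to root before grep runs
literal=$(printf '%s\n' "$data" | grep '$name')               # grep receives the literal text $name, which matches nothing
substituted=$(printf '%s\n' "$data" | grep "$(echo root)")    # $(...) is the modern equivalent of backticks

echo "expanded=[$expanded] literal=[$literal] substituted=[$substituted]"
```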

grep command options

--color=auto: highlight the matched text in color;

     alias grep='grep --color=auto'

-v: show the lines that the pattern does not match;

[root@6 ~]# cat a.txt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       50264772 4064872  43639900   9% /
tmpfs             502068      80    501988   1% /dev/shm
/dev/sda1         194241   34211    149790  19% /boot
/dev/sda3       20027260  333392  18669868   2% /testdir
/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final
[root@6 ~]# grep -v  sd a.txt
Filesystem     1K-blocks    Used Available Use% Mounted on
tmpfs             502068      80    501988   1% /dev/shm
/dev/sr0         3824484 3824484         0 100% /media/CentOS_6.8_Final

-i: ignore case

-n: show the line number of each match

[root@6 ~]# cat a
a:12
b:23
a:1
a:1
b:34
A:1
a:1
A:12
[root@6 ~]# grep -in a a
1:a:12
3:a:1
4:a:1
6:A:1
7:a:1
8:A:12

-c: count the matching lines; even when combined with other options, only the count is printed

[root@6 ~]# grep -inc a a
6

-o: show only the matched string;

[root@6 ~]# grep -ino a a
1:a
3:a
4:a
6:A
7:a
8:A

-q: quiet mode; prints nothing

[root@6 ~]# grep -inqo a a
[root@6 ~]# echo $?                                   ### show the exit status of the previous command
0                                                     ### 0 means a match was found
[root@6 ~]# grep -inqo d a
[root@6 ~]# echo $?
1                                                     ### non-zero means no match was found
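
Because -q communicates only through the exit status, it is typically used as the condition of an if statement. A minimal sketch with invented data:

```shell
# Hypothetical key:value data.
data='a:12
b:23'

# grep -q prints nothing; the if statement tests its exit status directly.
if printf '%s\n' "$data" | grep -q '^b:'; then
    found=yes
else
    found=no
fi

printf '%s\n' "$data" | grep -q '^z:'
status=$?      # 1, because no line starts with "z:"

echo "found=$found status=$status"
```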

-A #: after; also print the # lines that follow each match

[root@6 ~]# grep -A1  root pass
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
--
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin

-B #: before; also print the # lines that precede each match

[root@6 ~]# grep -B1  root pass
root:x:0:0:root:/root:/bin/bash
--
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin

-C #: context; also print # lines before and after each match

[root@6 ~]# grep -C1  root pass
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
--
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin

-e: give several patterns, combined with logical OR

grep -e 'cat' -e 'dog' file

[root@6 ~]#  grep -e lvasu -e root  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
lvasu:x:500:500:lvasu,class1,911,****:/home/lvasu:/bin/bash

-w: match the pattern only as a whole word

[root@6 ~]# grep  ro /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
rtkit:x:499:499:RealtimeKit:/proc:/sbin/nologin
[root@6 ~]# grep -w  ro /etc/passwd                         ### no match; compare with the plain search above

-E: use extended regular expressions
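
The context, -e and -w options can be compared on one small input. This is a sketch assuming GNU grep; the log lines are invented for the example.

```shell
# Hypothetical log excerpt.
log='start
warn: disk
error: disk full
retry
done
errors: 0'

with_context=$(printf '%s\n' "$log" | grep -A1 -B1 'disk full')        # the match plus one line on each side
either=$(printf '%s\n' "$log" | grep -c -e '^warn' -e '^error')        # lines starting with warn OR error
whole_word=$(printf '%s\n' "$log" | grep -c -w 'error')                # "errors" does not count as the word "error"

printf '%s\n' "$with_context"
echo "either=$either whole_word=$whole_word"
```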

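Regular expressions

REGEXP: a pattern written with a class of special characters and literal text characters; some of these characters (metacharacters) do not stand for their literal meaning but express control or wildcard functions

Supported by programs such as grep, vim, less, nginx

Two flavors:

Basic regular expressions: BRE

Extended regular expressions: ERE

grep -E, egrep

Regular-expression engines:

Software modules that check and process regular expressions, using different algorithms

PCRE (Perl Compatible Regular Expressions)

Metacharacter categories: character matching, repetition counts, position anchors, grouping

Basic regular-expression metacharacters

Character matching:

. : matches any single character;

[ ] : matches any single character inside the given set

[^] : matches any single character outside the given set

[:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:]

[[:alpha:]_]: matches one character that is either a letter or an underscore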

Character matching	.	any single character
	[ ]	any single character inside [ ]
	[^]	any single character not inside [ ]

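Repetition counts: placed after the character to be counted, they specify how many times the preceding character must occur

* : matches the preceding character any number of times, including 0

Greedy mode: matches as much as possible

.* : any characters, of any length

\? : matches the preceding character 0 or 1 times

\+ : matches the preceding character at least once

\{m\} : matches the preceding character exactly m times

\{m,n\} : matches the preceding character at least m and at most n times

\{,n\} : matches the preceding character at most n times

\{m,\} : matches the preceding character at least m times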

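Repetition	*	preceding character repeated any number of times
	\+	preceding character repeated one or more times
	\?	preceding character repeated 0 or 1 times
	.*	any characters, of any length
	\{n\}	preceding character repeated n times
	\{n,\}	preceding character repeated n or more times
	\{m,n\}	preceding character repeated between m and n times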

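Position anchors: pin down where the match occurs

^ : start-of-line anchor, placed at the far left of the pattern

$ : end-of-line anchor, placed at the far right of the pattern

^PATTERN$: the pattern must match the whole line

^$: an empty line

^[[:space:]]*$ : a blank line (empty or containing only whitespace)

\< or \b : start-of-word anchor, placed on the left of the word pattern

\> or \b : end-of-word anchor, placed on the right of the word pattern

\<PATTERN\> : matches a whole word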

Position anchors	^	start of line
	$	end of line
	^PATTERN$	the pattern must match the whole line
	^[[:space:]]*$	blank line
	\< or \b	start of word
	\> or \b	end of word

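Grouping: \(\): binds one or more characters together so that they are treated as a single unit, e.g. \(root\)\+

The text matched by the pattern inside grouping parentheses is recorded by the regular-expression engine in internal variables named \1, \2, \3, …

\1: the characters matched by the pattern between the first left parenthesis (counting from the left) and its matching right parenthesis;

Example: \(string1\+\(string2\)*\)

\1: string1\+\(string2\)*

\2: string2

Back-reference: refers to the characters matched by the pattern inside an earlier group (not to the pattern itself)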

[root@6 ~]# grep  '^\(.*\).*/\1$' pass
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
bash:x:4323:4324::/home/bash:/bin/bash
basher:x:4325:4326::/home/basher:/bin/bash
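
The BRE pieces above (intervals, anchors, grouping and back-references) can be exercised on a few made-up lines. This sketch assumes GNU grep.

```shell
# Hypothetical test lines.
lines='abab
abba
aa
xyz xyz
xyz abc'

# The group "ab" repeated exactly twice, anchored to the whole line.
repeated_pair=$(printf '%s\n' "$lines" | grep '^\(ab\)\{2\}$')

# \1 must repeat the exact text the group matched, not just any three lowercase letters.
doubled_word=$(printf '%s\n' "$lines" | grep '^\([[:lower:]]\{3\}\) \1$')

# Count blank lines: empty, or whitespace only.
blank_count=$(printf 'a\n\n  \nb\n' | grep -c '^[[:space:]]*$')

echo "$repeated_pair / $doubled_word / $blank_count"
```

The line "xyz abc" is rejected because the back-reference compares matched text, which illustrates the "characters, not the pattern" point made above.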

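2. egrep and extended regular expressions (noting where they differ from basic regular expressions)

egrep = grep -E

egrep [OPTIONS] PATTERN [FILE…]

Extended regular-expression metacharacters:

Character matching:

. : any single character

[ ] : a character inside the given set

[^] : a character outside the given set

Repetition counts:

* : matches the preceding character any number of times

? : 0 or 1 times

+ : 1 or more times

{m} : exactly m times

{m,n} : at least m and at most n times

Position anchors:

^ : start of line

$ : end of line

\<, \b : start of word

\>, \b : end of word

Grouping:

( )

Back-references: \1, \2, …

Alternation:

a|b

C|cat: C or cat

(C|c)at: Cat or cat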
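
The effect of the parentheses on alternation can be checked directly. A minimal sketch with an invented word list, assuming a grep that supports -E:

```shell
# Hypothetical word list.
words='Cat
cat
C
concatenate
dog'

# Without grouping, | splits the whole pattern: "C" or "cat" anywhere in the line.
loose=$(printf '%s\n' "$words" | grep -E -c 'C|cat')

# With grouping, only the first letter alternates; the anchors require the whole line to be Cat or cat.
grouped=$(printf '%s\n' "$words" | grep -E -c '^(C|c)at$')

echo "loose=$loose grouped=$grouped"
```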

Original article by lvasu. If you repost it, please credit the source: http://www.www58058.com/30395
