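Extracting parts of strings is a big topic in its own right and comes up all the time in shell work. Experts usually reach for a combination of grep, sed, and awk. All three commands are complex, and each one could fill a book.
In day-to-day work, though, we rarely need complicated expressions, and keeping things simple is something we never stop striving for ^_^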
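Let's look at a few simple extractions:
1. Extracting the last part
First, the last 10 lines. This is what you'll use most often when reading logs: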
tail /var/log/messages
Apr 21 15:17:59 mm kernel: igb3: link state changed to DOWN
Apr 21 15:17:59 mm kernel: igb3: link state changed to UP
Apr 26 03:27:10 mm kernel: igb3: link state changed to DOWN
Apr 26 03:27:10 mm kernel: igb1: link state changed to DOWN
Apr 26 08:08:44 mm kernel: igb3: link state changed to UP
Apr 26 08:08:47 mm kernel: igb3: link state changed to DOWN
Apr 26 08:12:31 mm kernel: igb3: link state changed to UP
Apr 26 08:13:02 mm kernel: igb3: link state changed to DOWN
Apr 26 08:13:31 mm kernel: igb3: link state changed to UP
Apr 26 08:15:53 mm kernel: igb1: link state changed to UP
The default is 10 lines. To see 50 lines, add the -n option:
tail -n 50 /var/log/messages
Apr 10 07:12:29 mm kernel: igb3: link state changed to DOWN
…… (48 lines)
Apr 26 08:15:53 mm kernel: igb1: link state changed to UP
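The -n option also accepts a +N form, which counts from the top of the file instead of the bottom. A minimal sketch, using generated lines instead of /var/log/messages so it runs anywhere; the helper name sample is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: prints "line 1" .. "line 10", standing in for a log file.
sample() { awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }'; }

# Last 3 lines, counted from the end.
sample | tail -n 3

# "+8" means "start at line 8", counted from the top.
# With 10 input lines this prints the same 3 lines as above.
sample | tail -n +8
```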
These lines are sorted by time in ascending order. Can we show them in descending order instead?
tail -rn 10 /var/log/messages
Apr 26 08:15:53 mm kernel: igb1: link state changed to UP
Apr 26 08:13:31 mm kernel: igb3: link state changed to UP
Apr 26 08:13:02 mm kernel: igb3: link state changed to DOWN
Apr 26 08:12:31 mm kernel: igb3: link state changed to UP
Apr 26 08:08:47 mm kernel: igb3: link state changed to DOWN
Apr 26 08:08:44 mm kernel: igb3: link state changed to UP
Apr 26 03:27:10 mm kernel: igb1: link state changed to DOWN
Apr 26 03:27:10 mm kernel: igb3: link state changed to DOWN
Apr 24 22:02:42 mm kernel: ipfw: limit 80000 reached on entry 40
Apr 21 15:17:59 mm kernel: igb3: link state changed to UP
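Note that -r is a BSD extension (this log appears to come from FreeBSD). GNU coreutils tail has no -r option, and on Linux the usual approach is to pipe through tac instead. Below is a sketch of a portable fallback that uses only awk; reverse_lines is a hypothetical helper name:

```sh
#!/bin/sh
# On BSD:        tail -r -n 10 /var/log/messages
# On GNU/Linux:  tail -n 10 /var/log/messages | tac
# Hypothetical portable helper: reverse the order of lines on stdin.
reverse_lines() {
    # Buffer every line by its number, then print from the last to the first.
    awk '{ buf[NR] = $0 } END { for (i = NR; i >= 1; i--) print buf[i] }'
}

printf 'first\nsecond\nthird\n' | reverse_lines
```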
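All of the above counts in lines. To see the last 100 characters (strictly speaking bytes, which is the same thing for ASCII text like this log):
tail -c 100 /var/log/messages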
mm kernel: igb3: link state changed to UP
Apr 26 08:15:53 mm kernel: igb1: link state changed to UP
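Notice that the output starts in the middle of a line. -c counts bytes, and the newline at the end of each line counts as one byte too. Here is a small sketch with made-up data that makes the arithmetic easy to check:

```sh
#!/bin/sh
# Made-up input: "abcdef" + newline + "ghij" + newline = 12 bytes.
data='abcdef
ghij
'
# The last 7 bytes are "f", newline, "ghij", newline,
# so the output begins partway through the first line.
printf '%s' "$data" | tail -c 7
```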
Printing the characters backwards should be even more fun, right? Let's try:
Apr 26 08:15:53 mm kernel: igb1: link state changed to UP
mm kernel: igb3: link state changed to UP
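Sorry: the lines can be reversed, but the characters can't. Otherwise "love" would turn into "evol", and who could read that?
2. Extracting the beginning
Use the head command. Its syntax is similar to tail's, so I won't repeat it here.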
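Here is a sketch in which the helper nums is hypothetical, standing in for a real file. head mirrors tail, and chaining the two pulls out a range of lines from the middle:

```sh
#!/bin/sh
# Hypothetical helper standing in for a file: the numbers 1..30, one per line.
nums() { awk 'BEGIN { for (i = 1; i <= 30; i++) print i }'; }

# First 3 lines (like tail, head shows 10 lines by default).
nums | head -n 3

# Lines 16-20: keep the first 20 lines, then take the last 5 of those.
nums | head -n 20 | tail -n 5
```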
3. Extracting a column
First, here is an ARP table used for testing VLANs. Note that the leading question mark (?) is followed by a space:
arp -na
? (192.168.196.36) at 40:61:86:84:47:2a on vlan536 permanent [vlan]
? (192.168.196.35) at 40:61:86:84:47:2a on vlan535 permanent [vlan]
? (192.168.196.34) at 40:61:86:84:47:2a on vlan534 permanent [vlan]
? (192.168.196.32) at 40:61:86:84:47:2a on vlan532 permanent [vlan]
? (192.168.196.29) at 40:61:86:84:47:2a on vlan529 permanent [vlan]
? (192.168.196.28) at 40:61:86:84:47:2a on vlan528 permanent [vlan]
? (192.168.196.27) at 40:61:86:84:47:2a on vlan527 permanent [vlan]
? (192.168.196.26) at 40:61:86:84:47:2a on vlan526 permanent [vlan]
? (192.168.196.25) at 40:61:86:84:47:2a on vlan525 permanent [vlan]
? (192.168.196.24) at 40:61:86:84:47:2a on vlan524 permanent [vlan]
? (192.168.196.23) at 40:61:86:84:47:2a on vlan523 permanent [vlan]
? (192.168.196.22) at 40:61:86:84:47:2a on vlan522 permanent [vlan]
? (192.168.196.21) at 40:61:86:84:47:2a on vlan521 permanent [vlan]
? (192.168.196.20) at 40:61:86:84:47:2a on vlan520 permanent [vlan]
? (192.168.196.19) at 40:61:86:84:47:2a on vlan519 permanent [vlan]
? (192.168.196.18) at 40:61:86:84:47:2a on vlan518 permanent [vlan]
? (192.168.196.17) at 40:61:86:84:47:2a on vlan517 permanent [vlan]
? (192.168.196.16) at 40:61:86:84:47:2a on vlan516 permanent [vlan]
Let's start simple and extract the 3rd character (the 3rd one, not the first 3):
arp -na | cut -b 3
(
(
(
(
……
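The result is a column of left parentheses. Why? In every line, byte 1 is the question mark, byte 2 is the space after it, and byte 3 is the "(".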
First, the -b option. cut supports three modes:
-b: byte mode, selects by byte
-c: character mode, selects by character
-f: field mode, selects by field
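To see the byte positions in the arp output concretely, this sketch prints bytes 1 to 3 of one sample line separately. Each byte is wrapped in brackets so the space stays visible:

```sh
#!/bin/sh
# One line copied from the arp -na output above.
line='? (192.168.196.36) at 40:61:86:84:47:2a on vlan536 permanent [vlan]'

# Print bytes 1, 2 and 3 separately; the brackets make the space visible.
for pos in 1 2 3; do
    printf '%s: [%s]\n' "$pos" "$(printf '%s\n' "$line" | cut -b "$pos")"
done
# Expected:
# 1: [?]
# 2: [ ]
# 3: [(]
```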
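What is the difference between bytes and characters? It matters mainly for multibyte encodings. In Chinese GB2312, for instance, one character takes two bytes, and -c extracts characters in two-byte units. Here is an example. First, check the current locale: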
tt@v:~ % locale
LANG=
LC_CTYPE="zh_CN.GB2312"
LC_COLLATE="zh_CN.GB2312"
LC_TIME="zh_CN.GB2312"
LC_NUMERIC="zh_CN.GB2312"
LC_MONETARY="zh_CN.GB2312"
LC_MESSAGES="zh_CN.GB2312"
LC_ALL=zh_CN.GB2312
Here is a test file containing Chinese text:
tt@v:~ % cat test.txt
中华人民共和国山东省
Let's look at -b mode first. It outputs a special character that can't be printed as text. In fact, it is the second byte of the character “中”. Now try -c:
tt@v:~ % cat test.txt | cut -c 2
华
This time we get the character “华”.
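A caveat: whether -c is multibyte-aware depends on the cut implementation. The GNU coreutils manual documents -c as currently behaving the same as -b, so on a typical Linux system cut -c 2 would probably not return “华”. Most systems today also use UTF-8 rather than GB2312, and in UTF-8 each of these characters takes 3 bytes. The following sketch assumes the script and terminal use UTF-8:

```sh
#!/bin/sh
# Assumption: this script is saved and run as UTF-8 (not GB2312 as above).
# In UTF-8 each of these characters is 3 bytes long.
text='中华人民共和国山东省'

# Bytes 1-3 are exactly one character, so this prints 中
printf '%s\n' "$text" | cut -b 1-3

# Bytes 4-9 are the next two characters: 华人
printf '%s\n' "$text" | cut -b 4-9

# Byte 2 alone is only a fragment of 中; dump it in hex instead of printing it.
printf '%s\n' "$text" | cut -b 2 | od -An -tx1
```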
What if you want a run of characters, say the three characters “共和国”? Just give a range:
hu@vpn:~ % cat test.txt | cut -c 5-7
共和国
And to add “中” as well? Separate the items with commas to make a list:
hu@vpn:~ % cat test.txt | cut -c 1,5-7
中共和国
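Ranges can also be open at either end. Here is a sketch on a plain ASCII string, so -b and -c behave identically. Each expected output appears in a comment:

```sh
#!/bin/sh
s='abcdefghij'   # ASCII only, so -b and -c give identical results here

printf '%s\n' "$s" | cut -c 5-7     # efg   (a range)
printf '%s\n' "$s" | cut -c 1,5-7   # aefg  (a list: position 1 plus a range)
printf '%s\n' "$s" | cut -c -3      # abc   (from the start to position 3)
printf '%s\n' "$s" | cut -c 8-      # hij   (from position 8 to the end)
```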
Back to our original example: I want the MAC column (a field, in database terms). How do we get it? We need field mode (-f):
arp -na | cut -f 4
? (192.168.196.36) at 40:61:86:84:47:2a on vlan536 permanent [vlan]
? (192.168.196.35) at 40:61:86:84:47:2a on vlan535 permanent [vlan]
? (192.168.196.34) at 40:61:86:84:47:2a on vlan534 permanent [vlan]
? (192.168.196.32) at 40:61:86:84:47:2a on vlan532 permanent [vlan]
? (192.168.196.29) at 40:61:86:84:47:2a on vlan529 permanent [vlan]
? (192.168.196.28) at 40:61:86:84:47:2a on vlan528 permanent [vlan]
? (192.168.196.27) at 40:61:86:84:47:2a on vlan527 permanent [vlan]
? (192.168.196.26) at 40:61:86:84:47:2a on vlan526 permanent [vlan]
? (192.168.196.25) at 40:61:86:84:47:2a on vlan525 permanent [vlan]
? (192.168.196.24) at 40:61:86:84:47:2a on vlan524 permanent [vlan]
? (192.168.196.23) at 40:61:86:84:47:2a on vlan523 permanent [vlan]
? (192.168.196.22) at 40:61:86:84:47:2a on vlan522 permanent [vlan]
? (192.168.196.21) at 40:61:86:84:47:2a on vlan521 permanent [vlan]
? (192.168.196.20) at 40:61:86:84:47:2a on vlan520 permanent [vlan]
? (192.168.196.19) at 40:61:86:84:47:2a on vlan519 permanent [vlan]
? (192.168.196.18) at 40:61:86:84:47:2a on vlan518 permanent [vlan]
? (192.168.196.17) at 40:61:86:84:47:2a on vlan517 permanent [vlan]
? (192.168.196.16) at 40:61:86:84:47:2a on vlan516 permanent [vlan]
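Huh? It doesn't seem to do anything!
Let's go through the options:
-f is field mode, that's right. 4 means the fourth column: the question mark counts as a column by itself, and cut numbers columns from 1, so that's right too. So what went wrong?
The problem is the delimiter. With -f, cut needs to know which character separates the fields, and the default is TAB. arp separates its output with spaces, and cut prints any line that contains no delimiter unchanged, which is why we got the whole lines back. So we have to use -d to specify the space as the delimiter.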
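These delimiter rules can be checked on tiny made-up inputs. This sketch shows the TAB default, the pass-through of lines that have no delimiter, and the -s option that suppresses such lines:

```sh
#!/bin/sh
# Default delimiter is TAB: field 2 of a tab-separated line.
printf 'a\tb\tc\n' | cut -f 2       # b

# A line with no TAB at all is passed through unchanged.
printf 'a b c\n' | cut -f 2         # a b c

# -s suppresses lines that contain no delimiter.
printf 'a b c\n' | cut -s -f 2      # (no output)
```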
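But that raises another problem: a space is awkward on the command line because the shell treats it as an argument separator. In cut -f 4 -d followed by a bare space, the space simply vanishes, as if -d had been given no argument.
So, following shell convention, wrap the space in double quotes. The final command is: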
arp -na | cut -f 4 -d " "
40:61:86:84:47:2a
40:61:86:84:47:2a
40:61:86:84:47:2a
40:61:86:84:47:2a
40:61:86:84:47:2a
40:61:86:84:47:2a
……
This time, the MAC addresses come out.
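One limitation is worth knowing. cut treats every single space as a delimiter, so two consecutive spaces produce an empty field. That is fine for arp -na, whose fields are separated by single spaces, but it breaks on column-aligned output. awk, mentioned at the start, splits on runs of blanks by default. The following sketch uses a hypothetical line with a doubled space:

```sh
#!/bin/sh
# Hypothetical input: an arp-style line with TWO spaces after "at".
line='? (192.168.196.36) at  40:61:86:84:47:2a'

# cut sees an empty field between the two spaces, so field 4 is empty
# and the MAC address has moved to field 5.
printf '%s\n' "$line" | cut -d ' ' -f 4

# awk's default field splitting treats a run of blanks as one separator.
printf '%s\n' "$line" | awk '{ print $4 }'   # 40:61:86:84:47:2a
```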
So, these three commands (tail, head, and cut) are pretty handy, aren't they?
This post was moved to this board by hui.chen on 2014-11-5 17:04:59.