Regular Expressions - 摩杜云 (Moduyun) Developer Community


I. Things to note

1. Locale (character set) settings

You will notice that many shell scripts contain this statement:

LC_ALL=C
This assignment switches the locale back to the default C (POSIX) locale, undoing any language-specific settings.

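Linux supports many languages, for example
German
English
Chinese
Each language has its own characters, so character encodings (code tables) were created to let computers represent them consistently.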

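For example, if you set Linux up for Chinese and the system locale is Chinese, some regular expressions may not behave as expected, because what ranges such as [a-z] and the . symbol match depends on the locale. That is why the locale is reset. A Chinese locale looks like this: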
LANG='zh_CN.UTF-8'

The variable that overrides all locale settings at once is
LC_ALL=C

Usage:
export LC_ALL=C

It changes the locale used by the shell and the programs it starts. You can view the current locale settings with the locale command.

Linux uses the following variables to set the language environment a program runs in, such as Chinese or English:

[root@localhost ~]#locale
LANG=en_US.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=zh_CN.UTF-8

The system locale is usually set through the $LANG variable, typically to your region, e.g. zh_CN.UTF-8.

[root@localhost ~]# echo $LANG
en_US.UTF-8

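For shell statements to run correctly (a customized language environment may handle some special symbols differently, for example punctuation typed with a Chinese input method instead of an English one, which can make shell commands fail), we use the following statement to override all localization settings and return the system to its default language environment: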

export LC_ALL=C
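A small sketch of why the locale matters, assuming GNU grep and a POSIX shell: the two bytes \303\251 below are the UTF-8 encoding of the single character é. Under LC_ALL=C grep treats every byte as one character, so . matches twice.

```shell
# \303\251 is the UTF-8 encoding of the single character "é" (two bytes)
# LC_ALL=C in front of a command changes the locale for that command only
printf '\303\251\n' | LC_ALL=C grep -o '.' | wc -l    # prints: 2 (one match per byte)

# In a UTF-8 locale (if one such as en_US.UTF-8 is installed), '.' matches the
# whole character instead, so this variant typically prints 1:
# printf '\303\251\n' | LC_ALL=en_US.UTF-8 grep -o '.' | wc -l
```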

2. The two kinds of regular expressions

Linux (following POSIX) divides regular expressions into two kinds

Basic regular expressions (BRE)

The BRE metacharacters are
^ $ . [ ] *

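Other symbols, such as ;, are ordinary characters. The backslash \ is special: it is the escape character described below.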

Extended regular expressions (ERE)

ERE builds on BRE and adds the metacharacters
( ) { } ? + |

Escape character

Backslash \
A backslash placed before a metacharacter turns it into an ordinary character.
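To see the BRE/ERE split and the escape character in action, here is a sketch assuming GNU grep (in GNU's BRE an unescaped + is an ordinary character; -E switches to ERE). The input lines are made up:

```shell
# The same pattern text a+b, read as BRE and as ERE
printf 'a+b\nab\n' | grep 'a+b'       # BRE: '+' is ordinary           -> prints: a+b
printf 'a+b\nab\n' | grep -E 'a+b'    # ERE: '+' means "one or more a" -> prints: ab
printf 'a+b\nab\n' | grep -E 'a\+b'   # ERE with '+' escaped: literal  -> prints: a+b
```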

Single vs. double quotes

If the pattern does not use a shell variable, always use single quotes.

II. Basic regular expressions (BRE)

测试文本数据

[root@localhost ~]# cat t1.log -n
     1	Maybe I don't really want to know
     2	How your garden grows
     3	Cause I just want to fly
     4	Lately, did you ever feel the pain
     5	In the morning rain.
     6	As it soaks you to the bone?
     7	Maybe I just want to fly
     8	Want to live, I don't want to die
     9	Maybe I just want to breathe 
    10	Maybe I just don't believe 
    11	Maybe you're the same as me 
    12	We see things I'll never see
    13	You and I are gonna live forever.
    14	Maybe I will never be
    15	All the things that I want to be 
    16	Now is not the time to cry
    17	Now's the time to find out why
    18	Gonna live forever. 
    19	
    20

1. grep and regular expressions

NAME
       grep, egrep, fgrep - print lines matching a pattern

SYNOPSIS
       grep [OPTIONS] PATTERN [FILE...]
       grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

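The PATTERN argument is whatever you pass in; any regular expression you write can simply be called a pattern.

grep 'keyword, pattern or regular expression'   input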

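2. ^ caret

Syntax
Written at the far left of the pattern, e.g.
^ma checks each line and matches the lines that start with ma

With grep, -i ignores case, so more lines match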

Find the lines that start with m
[root@localhost ~]# grep '^m' t1.log -i
Maybe I don't really want to know
Maybe I just want to fly
Maybe I just want to breathe 
Maybe I just don't believe 
Maybe you're the same as me 
Maybe I will never be

-n shows line numbers
[root@localhost ~]# grep '^m' t1.log -i -n
1:Maybe I don't really want to know
7:Maybe I just want to fly
9:Maybe I just want to breathe 
10:Maybe I just don't believe 
11:Maybe you're the same as me 
14:Maybe I will never be

-o shows only the text grep matched instead of the whole line
[root@localhost ~]# grep '^m' t1.log -i -n -o
1:M
7:M
9:M
10:M
11:M
14:M

Match the lines containing live
[root@localhost ~]# grep 'live' t1.log
Want to live, I don't want to die
You and I are gonna live forever.
Gonna live forever.

3. $ dollar sign

Syntax
word$ matches lines that end with word

Match all lines ending with the character n
[root@localhost ~]# grep 'n$' t1.log
Lately, did you ever feel the pain

Match all lines ending with a literal .
[root@localhost ~]# grep "\.$" t1.log 
In the morning rain.
You and I are gonna live forever.
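$ anchors at the very last character of the line, so an invisible trailing space counts. That is why line 18 of t1.log ("Gonna live forever. ", which ends in a space) is missing from the output above. A quick check on made-up input, assuming GNU grep:

```shell
# Line 2 ends with "." followed by a space
printf 'forever.\nforever. \n' | grep -n '\.$'    # prints: 1:forever.
# ' $' finds the lines that end in a space
printf 'forever.\nforever. \n' | grep -n ' $'     # prints: 2:forever. (with the trailing space)
```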

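4. Single vs. double quotes

Single quotes: what you see is what you get; everything inside, punctuation included, keeps its literal meaning.
Double quotes: the shell interprets special symbols and variables inside them, so you need the escape character to restore a character's literal meaning.
When quotes must be nested, the usual approach is double quotes outside and single quotes inside.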
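A minimal sketch of these quoting rules, assuming a POSIX shell and GNU grep; the variable name word and the sample lines are made up for illustration:

```shell
word='live'    # hypothetical variable for this example
# Double quotes: the shell replaces $word with live before grep runs
printf 'I live here\ncost: $word\n' | grep "$word"     # prints: I live here
# Single quotes: grep receives \$word untouched; \$ matches a literal dollar sign
printf 'I live here\ncost: $word\n' | grep '\$word'    # prints: cost: $word
# Nesting: double quotes outside let a single quote appear inside the pattern
printf "I don't know\nI do\n" | grep "don't"          # prints: I don't know
```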

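5. ^$ matches empty lines

^character
Matches lines starting with that character

character$
Matches lines ending with that character

^$
Start followed immediately by end, with nothing in between === an empty line

Find the empty lines in the file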

[root@localhost ~]# grep '^$' t1.log -n
19:
20:
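^$ only matches lines with nothing at all between start and end; a line holding just a space is not empty. A sketch on made-up input, assuming GNU grep:

```shell
# Line 2 is a single space, line 3 is truly empty
printf 'text\n \n\nmore\n' | grep -n '^$'    # prints: 3:
printf 'text\n \n\nmore\n' | grep -c '^$'    # prints: 1
```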

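6. The dot

. matches any single character except the newline; spaces included.

. and spaces

. matches a space as well as any other character,
but it cannot match an empty line, because there is no character there,
and it never matches the newline.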
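Both points can be checked on made-up input, assuming GNU grep:

```shell
# '.' matches the space between a and b
printf 'a b\n' | grep -o 'a.b'      # prints: a b
# The empty middle line has no character for '.', so only 2 lines are counted
printf 'x\n\ny\n' | grep -c '.'     # prints: 2
```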

Test data
[root@localhost ~]# cat t2.log -n
     1	y
     2	i
     3	y
     4	u
     5	
     6	an

Check how . behaves with the newline and the empty line

[root@localhost ~]# grep '.' t2.log -on
1:y
2:i
3:y
4:u
6:a
6:n

. matches every character except the newline

[root@localhost ~]# grep '.' t1.log  -n
1:Maybe I don't really want to know
2:How your garden grows
3:Cause I just want to fly
4:Lately, did you ever feel the pain
5:In the morning rain.
6:As it soaks you to the bone?
7:Maybe I just want to fly
8:Want to live, I don't want to die
9:Maybe I just want to breathe 
10:Maybe I just don't believe 
11:Maybe you're the same as me 
12:We see things I'll never see
13:You and I are gonna live forever.
14:Maybe I will never be
15:All the things that I want to be 
16:Now is not the time to cry
17:Now's the time to find out why
18:Gonna live forever.

. stands for exactly one arbitrary character

For example
M.
M..
[root@localhost ~]# grep 'M' t1.log
Maybe I don't really want to know
Maybe I just want to fly
Maybe I just want to breathe 
Maybe I just don't believe 
Maybe you're the same as me 
Maybe I will never be

Exercise: find the lines that match the regex .ay
[root@localhost ~]# grep '.ay' t1.log -n
1:Maybe I don't really want to know
7:Maybe I just want to fly
9:Maybe I just want to breathe 
10:Maybe I just don't believe 
11:Maybe you're the same as me 
14:Maybe I will never be
[root@localhost ~]# grep '.ay' t1.log -no
1:May
7:May
9:May
10:May
11:May
14:May
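Each . consumes exactly one character, so a pattern with N dots needs N characters in those positions. A sketch on made-up input, assuming GNU grep:

```shell
# Each '.' consumes exactly one character
printf 'Maybe\nay\n' | grep '.ay'     # prints: Maybe  ("ay" has nothing before the a)
printf 'Maybe\n' | grep -o 'M..'      # prints: May
```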

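7. .$ matches any character at the end of a line

. any one character
.$ any character at the end of the line
This picks up the last character of every non-empty line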
[root@localhost ~]# grep '.$' t1.log -n
1:Maybe I don't really want to know
2:How your garden grows
3:Cause I just want to fly
4:Lately, did you ever feel the pain
5:In the morning rain.
6:As it soaks you to the bone?
7:Maybe I just want to fly
8:Want to live, I don't want to die
9:Maybe I just want to breathe 
10:Maybe I just don't believe 
11:Maybe you're the same as me 
12:We see things I'll never see
13:You and I are gonna live forever.
14:Maybe I will never be
15:All the things that I want to be 
16:Now is not the time to cry
17:Now's the time to find out why
18:Gonna live forever.

. and the escape character

To match only a literal . at the end of each line, escape the dot:

grep '\.$' t1.log

\ the escape character

The escape character strips a special character of its special meaning and restores its literal meaning.

\.
\$
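A sketch contrasting the unescaped and escaped dot, plus an escaped $, on made-up input (assuming GNU grep):

```shell
printf 'rain.\nrain\n' | grep '.$'      # any last character: prints both lines
printf 'rain.\nrain\n' | grep '\.$'     # a literal dot at the end: prints: rain.
printf 'cost $5\n' | grep -o '\$5'      # \$ is a literal dollar sign: prints: $5
```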

8. Spaces, newlines, tabs

Use the following site to check how newlines are matched:

https://deerchao.cn/tools/wegester/

9. Newlines and tabs

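\b matches a word boundary. For example, to match the standalone word "is" in the string "This is Regex", the regex is "\bis\b". Given a t3.log containing the two lines below: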
hello world
helloworld
[root@localhost ~]# grep '\bhello\b' t3.log
hello world

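\n matches the newline (line feed): move down one line without moving left or right
\r matches the carriage return: return to the far left of the current line
On Windows, a line ends with \r\n
On Linux, a line ends with \n alone
A line break typed with the Enter key in a Linux text file is therefore stored as \n; when it is printed, the terminal driver adds the carriage return, so the cursor still starts the next line at its left edge
Windows uses \r\n: carriage return followed by line feed
\t matches a horizontal tab, the same as pressing the Tab key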
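grep reads its input line by line and drops each line's trailing \n, so a pattern never sees a newline, and GNU grep's BRE/ERE do not treat \r or \t as a carriage return or tab. A portable workaround, sketched below assuming GNU grep and a POSIX shell, is to let printf produce the real characters (cr and tab are variable names chosen for this example). It also shows why a file saved with Windows line endings breaks $-anchored patterns: every line still ends in \r.

```shell
# Build real carriage-return and tab characters with printf
cr=$(printf '\r')
tab=$(printf '\t')

# A Windows-style line: "abc" followed by \r\n
printf 'abc\r\n' | grep -c 'c$'           # prints: 0 (the line really ends in \r, not c)
printf 'abc\r\n' | grep -c "c${cr}\$"     # prints: 1
# A tab between a and b
printf 'a\tb\n' | grep -c "a${tab}b"      # prints: 1
```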

10. * asterisk

Repeats the preceding character 0 or more times

[root@localhost ~]# grep 'w*' t1.log -n
1:Maybe I don't really want to know
2:How your garden grows
3:Cause I just want to fly
4:Lately, did you ever feel the pain
5:In the morning rain.
6:As it soaks you to the bone?
7:Maybe I just want to fly
8:Want to live, I don't want to die
9:Maybe I just want to breathe 
10:Maybe I just don't believe 
11:Maybe you're the same as me 
12:We see things I'll never see
13:You and I are gonna live forever.
14:Maybe I will never be
15:All the things that I want to be 
16:Now is not the time to cry
17:Now's the time to find out why
18:Gonna live forever.
19:
20:
[root@localhost ~]# grep 'w*' t1.log -no
1:w
1:w
2:w
2:w
3:w
7:w
8:w
9:w
14:w
15:w
16:w
17:w
17:w
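Why does -o print only the w's while plain grep 'w*' prints every line? w* may match zero characters, so every line matches, but grep -o skips empty matches. A sketch on made-up input, assuming GNU grep:

```shell
# w* may match zero characters, so every line "matches"
printf 'abc\nwww\n' | grep -c 'w*'    # prints: 2
# -o skips empty matches, so only the non-empty one is printed
printf 'abc\nwww\n' | grep -o 'w*'    # prints: www
```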

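11. .*

. matches any one character
* repeats the preceding character 0 or N times
.* matches anything: because of *, every line is found, whether or not it has any content
Comparing . with .* makes this clear
Only the lines that contain at least one character:
grep '.' t1.log
Every line, with or without characters:
grep '.*' t1.log

. does not match the newline

First, not matching the newline is a property of . itself;
.* only repeats that character 0 or N times

Remember once more: . never handles the newline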
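Counting matching lines on made-up input shows the difference, assuming GNU grep:

```shell
printf 'text\n\n' | grep -c '.'     # prints: 1 (the empty line has no character for '.')
printf 'text\n\n' | grep -c '.*'    # prints: 2 ('.*' may match zero characters)
```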

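12. ^.*

^m  starts with m
.* any content
^.* starts with any content

Syntax
^.* matches a line that starts with any number of characters

Find only the lines that start with i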
[root@localhost ~]# grep '^i' t1.log  -i
In the morning rain.

Find the lines starting with the letter i and also match everything after it
[root@localhost ~]# grep '^i.*' t1.log -i -o 
In the morning rain.

Find the lines that start with the letter m and end with e, and print everything in between
[root@localhost ~]# grep '^m.*e$'  t1.log -i -o -n
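14:Maybe I will never be

Lines 9, 10 and 11 also seem to end in e, but each has a trailing space, so e$ does not match them.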

13. .*$

Lines ending with any number of characters

grep '.*$'  t1.log
is equivalent to
grep '.*' t1.log

Work out what the following regex means

What p.*$ does
[root@localhost ~]# grep 'p.*$'  -i -n t1.log
4:Lately, did you ever feel the pain
[root@localhost ~]# grep 'p.*$'  -i -n t1.log  -o
4:pain
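.* is greedy: it takes as many characters as it can, so p.*$ runs from the first p to the end of the line; in t1.log, pain just happens to be the last word. Compare it with p. (exactly one character after p) on made-up input, assuming GNU grep:

```shell
printf 'ape pie\n' | grep -o 'p.*$'   # prints: pe pie (first p through end of line)
printf 'ape pie\n' | grep -o 'p.'     # prints: pe and pi, one per line
```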

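14. [ ] brackets

Brackets are used as follows

[abc]

[abc] matches one of the lowercase characters a, b or c inside the brackets
[A-Z]

Tip: when you need exact case matching, do not add the ignore-case option -i

[a-z], [A-Z], [a-zA-Z], [0-9]

[a-z]          matches any single lowercase letter
[A-Z]          matches any single uppercase letter
[a-zA-Z]       matches any single letter, upper or lower case
[0-9]          matches any single digit
[a-zA-Z0-9]    matches any single letter or digit

[a-z] matches lowercase letters

[A-Z] matches uppercase letters

[a-z0-9] matches lowercase letters and digits

[0-9A-Z] matches uppercase letters and digits

grep '[0-9A-Z]' t1.log

[a-z0-9A-Z] matches upper- and lowercase letters and digits, but not spaces or special symbols

grep '[a-z0-9A-Z]' t1.log -n

To get only the special symbols, negate the characters in the brackets: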

grep '[^a-z0-9A-Z]'  t1.log -n
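A sketch of bracket expressions on a made-up string, assuming GNU grep. LC_ALL=C is used so that ranges such as a-z mean plain ASCII letters, in line with the locale advice at the start of this article:

```shell
printf 'a1-B!\n' | LC_ALL=C grep -o '[0-9A-Z]'        # prints 1 and B, one per line
printf 'a1-B!\n' | LC_ALL=C grep -o '[^a-z0-9A-Z]'    # prints - and !, one per line
```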

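15. [^abc] negated brackets

Syntax
[^abc] excludes the characters a, b and c in the brackets; this is different from a ^ on its own outside brackets
[^a-z] excludes lowercase letters

Practice

[^a-z] excludes lowercase letters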
grep '[^a-z]'  t1.log
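The same ^ symbol means two different things depending on where it sits. A sketch on made-up input, assuming GNU grep, with LC_ALL=C so that a-z is plain ASCII:

```shell
# ^ outside brackets is an anchor: lines that START with a lowercase letter
printf 'abc\nAbc\n' | LC_ALL=C grep '^[a-z]'    # prints: abc
# ^ first inside brackets negates: lines containing any character that is not lowercase
printf 'abc\nAbc\n' | LC_ALL=C grep '[^a-z]'    # prints: Abc
```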

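16. { } braces (extended regex)

Using grep with extended regular expressions
grep 'basic regular expression'  t1.log
# -E is the current way to use extended regular expressions; egrep is the older equivalent
grep -E 'extended regular expression'  t1.log
egrep 'extended regular expression' t1.log

a\{n,m\}

a\{n,m\} repeats the character a n to m times
a\{1,3\} repeats the character a 1 to 3 times
# recommended: use -E and drop the backslashes
grep  -E 'a{1,3}' t1.log

Test data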
[root@localhost ~]# cat t4.log
I am yiyuan
I am twenty years old

I like english

My qq num is 1474665197
my qq num is not 14444477777444446666555511119999777

Goog good study , day day up!

Practice

Match the digit 9 one to three times

[root@localhost ~]# grep -E '9{1,3}' t4.log -o
9
999
9

Each match has at least 2 and at most 3 sixes

[root@localhost ~]# grep -E '6{2,3}' t4.log -o
66
666

Each match has at least 3 sevens (7{3,} means 3 or more)

[root@localhost ~]# grep -E '7{3,}' t4.log -o
77777
777

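grep does not recognize the extended-regex braces {} by default

Plain grep (BRE) treats { and } as ordinary characters, so to give them their repetition meaning you must escape them.

Solutions

Option 1
Escape the braces: \{\}
Option 2: make grep understand bare braces, so the escapes can be dropped
Use the egrep command
or grep -E

a\{n,\}

Repeat the character a at least n times; with -E the short form without backslashes works
(t1.log contains no digits, so the commands below only illustrate the syntax.)

8 appears at least 2 times
grep -E '8{2,}' t1.log
8 appears at least once
grep -E '8{1,}' t1.log

a\{n\}

Repeat the character a exactly n times.
8 appears exactly 3 times
grep -E '8{3}' t1.log

a\{,m\}

Match the character a at most m times.

8 appears at most 3 times
grep -E '8{,3}' t1.log
grep -E '8{min,max}' t1.log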
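A sketch of interval syntax in BRE and ERE on made-up input, assuming GNU grep:

```shell
# BRE: an unescaped { is an ordinary character
printf 'a{2}\n' | grep -o 'a{2}'       # prints: a{2}
# BRE: escaped braces form an interval
printf 'aaaa\n' | grep -o 'a\{2\}'     # prints: aa twice (two separate matches)
# ERE (-E): the same interval without backslashes
printf 'aaaa\n' | grep -oE 'a{2}'      # prints: aa twice
# {2,} is greedy and takes the whole run
printf 'aaaa\n' | grep -oE 'a{2,}'     # prints: aaaa
```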

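III. Extended regular expressions (ERE)

Basic regular expressions

The earlier form of regular expressions, supporting the basic features
Used with the grep and sed commands

Extended regular expressions

Added later
Used with the egrep and awk commands
With grep, the -E option is required

Test data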
[root@localhost ~]# cat t4.log -n
     1	I am yiyuan
     2	I am twenty years old
     3	
     4	I like english
     5	
     6	My qq num is 1474665197
     7	my qq num is not 14444477777444446666555511119999777
     8	
     9	Goog good study , day day up!

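1. + plus sign

Syntax
+
Repeats the preceding character 1 or more times
Note the difference from *: * means 0 or more times, so lines without any match are shown too

For example, 0+ matches one or more 0s, and lines without a 0 are not shown

2. 1+

Requirement

Find one or more consecutive 1s each time
Find the lines that contain at least one 1
grep -E '1+'  t4.log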
[root@localhost ~]# grep -E  '1+'  t4.log -n
6:My qq num is 1474665197
7:my qq num is not 14444477777444446666555511119999777

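3. [0-9]+

Find runs of consecutive digits in the text, i.e. exclude letters, special symbols and spaces

For example, a courier company's (SF Express) database file has fields such as

Region:
Phone number: 11 consecutive digits, [0-9]{11}
Name:
Sender:
Recipient:

Extract the runs of consecutive digits from the file
grep -E '[0-9]+' t4.log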
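A sketch of the phone-number idea, assuming GNU grep; the record and the number 13800138000 are made up for illustration:

```shell
line='Tel: 13800138000, order 42'             # hypothetical sample record
printf '%s\n' "$line" | grep -oE '[0-9]+'     # prints 13800138000 and 42, one per line
printf '%s\n' "$line" | grep -oE '[0-9]{11}'  # prints: 13800138000
printf '%s\n' "$line" | grep -oE '[0-9]'      # without +: one digit per line
```

Note that [0-9]{11} also matches the first 11 digits of any longer run of digits, so on real data you may need to check the surrounding characters as well.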

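4. [a-z]+

Find runs of consecutive lowercase letters, i.e. exclude uppercase letters, punctuation, digits and spaces; this picks out the (lowercase) words one by one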

grep -E '[a-z]+' t1.log

5. [A-Za-z0-9]+

Note: with the + added, this finds consecutive runs of letters and digits

Without the +, each match is a single character

grep -E '[A-Za-z0-9]+'  t1.log

6. [^A-Za-z0-9]+

This finds everything except digits and upper- and lowercase letters, such as spaces and punctuation

Use the -o option to see each individual match
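A sketch of both bracket forms with and without +, on a made-up string and assuming GNU grep:

```shell
printf 'a-b!!c\n' | grep -oE '[A-Za-z0-9]+'    # prints a, b and c, one per line
printf 'a-b!!c\n' | grep -oE '[^A-Za-z0-9]+'   # prints - and !!, one per line
printf 'a-b!!c\n' | grep -oE '[^A-Za-z0-9]'    # without +: -, ! and ! separately
```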

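7. The difference between * and +

Syntax
* repeats 0 or more times, so lines without a match are printed as well
+ repeats 1 or more times, so a line shows up only if it matches at least once

For example, to find the letter o, compare the following 2 patterns

'o+'
+ repeats the preceding character 1 or N times
so this finds o once or several times in a row
grep -E 'o+' t1.log
while this repeats o zero or N times
'o*'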
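Counting matching lines on made-up input makes the difference visible, assuming GNU grep:

```shell
printf 'abc\nfoo\n' | grep -c 'o*'     # prints: 2 (zero o's is allowed, so every line counts)
printf 'abc\nfoo\n' | grep -cE 'o+'    # prints: 1 (needs at least one o)
```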

8. go*d vs. go+d vs. go?d

Prepare the test data

[root@localhost ~]# cat god.txt
I am God, I need you to good good study and day day up, otherwise I will send you to see Gd,oh sorry, gooooooooood!


How searching for god, goooood and gd differs

go*d allows zero or more letters o
What can go*d find?
[root@localhost ~]# grep 'go*d' god.txt -n -i -o 
1:God
1:good
1:good
1:Gd
1:gooooooooood

go+d allows one or more letters o
What can go+d find?
It is an extended regex, so -E is required
[root@localhost ~]# grep -E 'go+d'  god.txt -n -i -o
1:God
1:good
1:good
1:gooooooooood

go?d allows zero or one letter o
What can go?d find?
[root@localhost ~]# grep -E 'go?d' god.txt -n -i -o
1:God
1:Gd

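9. | alternation

In a regex, the vertical bar means "or"

Check a filesystem's inode count and block information

ext4 filesystem
1. Prepare a partition and mount it as ext4
2. Run dumpe2fs on the partition and filter the inode- and block-related lines
Point dumpe2fs at the partition (device), not at the mount point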
[root@localhost ~]#dumpe2fs /dev/sdc | grep -E -i '^inode|^block'
dumpe2fs 1.42.9 (28-Dec-2013)
Inode count:              1310720
Block count:              5242880
Block size:               4096
Blocks per group:         32768
Inodes per group:         8192
Inode blocks per group:   512
Inode size:           256

xfs filesystem: xfs_info
[root@localhost ~]#xfs_info /xfs_test/ |grep -E 'isize|block'
meta-data=/dev/sdd               isize=512    agcount=4, agsize=3276800 blks
data     =                       bsize=4096   blocks=13107200, imaxpct=25
log      =internal               bsize=4096   blocks=6400, version=2
realtime =none                   extsz=4096   blocks=0, rtextents=0

Check memory and swap capacity

[root@localhost ~]#free -m | grep -E -i '^mem|^swap'
Mem:           1821         116        1370           9         334        1514
Swap:          2047           0        2047

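Exclude empty lines and comment lines from a file

The grep -v option inverts the result

Exclude empty lines
grep -v '^$' t1.log

Exclude comment lines
grep -v '^#' t1.log

Exclude both empty lines and comment lines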
grep -v '^$' t1.log | grep -v '^#' 
[root@localhost ~]# grep -v '^$' t1.log | grep -v '^#' -n 
1:Maybe I don't really want to know
2:How your garden grows
3:Cause I just want to fly
4:Lately, did you ever feel the pain
5:In the morning rain.
6:As it soaks you to the bone?
7:Maybe I just want to fly
8:Want to live, I don't want to die
9:Maybe I just want to breathe 
10:Maybe I just don't believe 
11:Maybe you're the same as me 
12:We see things I'll never see
13:You and I are gonna live forever.
14:Maybe I will never be
15:All the things that I want to be 
16:Now is not the time to cry
17:Now's the time to find out why
18:Gonna live forever.

# the same with the regex "or"
grep -E '^#|^$'  t1.log -n -v
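The same filter on a made-up config snippet (a comment, an empty line and one setting), assuming GNU grep:

```shell
printf '# comment\n\nport=80\n' | grep -Ev '^#|^$'    # prints: port=80
```

Comment lines indented with spaces are not excluded, since ^# requires # to be the very first character.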

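10. ( ) parentheses, grouping

Syntax
() binds one or more characters together so they are treated as a single unit
1. Parentheses can wrap part of a regex; backreferences \1 to \9 can refer to at most 9 groups
One purpose of parentheses is grouping: the enclosed content is treated as a whole
What a group matched can be referred to later (a backreference)
() () () ()    \1  \2  \3  \4
The content matched inside a () can be referenced later in the regex with "\n", where n is a number saying which group to refer to
\1: the text matched by the pattern in the first group, counting from the left
\2: the text matched by the pattern in the second group, counting from the left

Test data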
[root@yuchao-tx-server test]# cat god.log
I am God, I need you to good good study and day day up, otherwise I will send you to see Gd,oh sorry, gooooooooood!
I am glad to see you, god,you are a good god!

Goal: match only glad and good

The first use of grouping: treat part of the regex as a single unit
grep -E 'glad|good'  god.log
With parentheses
grep -E 'g(la|oo)d'  god.log
that is: g, then either la or oo, then d
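Without parentheses, | splits the whole pattern, which is why the grouping matters. A sketch on made-up input, assuming GNU grep:

```shell
# With parentheses: g, then la or oo, then d
printf 'glad\ngood\nfood\n' | grep -E 'g(la|oo)d'   # prints: glad and good
# Without them, | splits the whole pattern into "gla" or "ood"
printf 'glad\ngood\nfood\n' | grep -E 'gla|ood'     # prints: glad, good and food
```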


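11. Grouping and backreferences

Backreferences are hard to demonstrate well with grep.

Syntax
()
Grouping: the enclosed content is treated as a whole, and what it matched can be referenced later with \n, where n is the number of the group
\n
References the content of an earlier (); for example, (abc)\1 matches abcabc

Test data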

[root@localhost ~]# cat lovers.log
I like my lover.
I love my lover.
He likes his lovers.
He love his lovers.

Extract the lines in which love appears twice

[root@localhost ~]# grep -E '^.*(love).*\1.*' lovers.log -o
I love my lover.
He love his lovers.
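In lovers.log the group (love) is a fixed word, so \1 simply repeats it. A backreference is more interesting when the group can match different text, because \1 must then repeat exactly what the group matched. A sketch on made-up input, assuming GNU grep (backreferences with -E are a GNU extension):

```shell
# ([a-z]) captures one letter; \1 requires the very same letter right after it
printf 'good\ngod\n' | grep -E '([a-z])\1'                  # prints: good
# ([a-z]+) captures a word; " \1" requires that same word again
printf 'day day up\nday night\n' | grep -oE '([a-z]+) \1'   # prints: day day
```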
