Do you know how to match across lines with Python regular expressions?

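Regular expressions are widely used for text matching, pattern matching, data cleaning, form validation, log analysis, and similar tasks. Whether you are a developer, a tester, or work in any other field, if you need to process documents, knowing a little about regular expressions can noticeably improve your efficiency.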

The dot (.) does not match newlines by default

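Today we use Python to walk through a very simple example. The example below searches for a pattern that starts with hello, is followed by any characters, and then matches world. In r"hello.*world", the dot (.) matches any character except a newline, and the asterisk (*) means zero or more repetitions.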

import re
string = "hello world"
result = re.search(r"hello.*world", string)
print(result.group())

Running the script above gives the following result: the expected pattern is found in string.

>>> import re
>>> string = "hello world"
>>> result = re.search(r"hello.*world", string)
>>> print(result.group())
hello world

Next, we change string from a single line to multiple lines and leave the rest of the script unchanged.

import re
string = """hello

world"""
result = re.search(r"hello.*world", string)
print(result.group())

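Running the script again raises an error. The expected pattern was not found, so result is None, and calling group() on None fails. From this we can conclude that .* does not match across lines.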

>>> import re
>>> string = """hello
... 
... world"""
>>> result = re.search(r"hello.*world", string)
>>> print(result.group())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
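Because re.search returns None when nothing matches, calling group() directly on the result is what triggers the AttributeError. A minimal sketch of a guard follows; the helper name find_span is hypothetical and introduced only for illustration.

```python
import re

def find_span(pattern, text, flags=0):
    """Return the matched text, or None when the pattern is not found."""
    match = re.search(pattern, text, flags)
    # re.search returns None on failure, so check before calling group().
    return match.group() if match is not None else None

single_line = "hello world"
multi_line = "hello\n\nworld"

print(find_span(r"hello.*world", single_line))  # hello world
print(find_span(r"hello.*world", multi_line))   # None
```

With this guard the multi-line case reports None instead of raising an exception, which makes the failed match easier to spot.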

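Looking up the character-matching rules for regular expressions confirms this: by default, the dot matches any character except a newline.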
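The rule can be checked directly with a one-character pattern. This is a small sketch, and the variable names are chosen only for illustration.

```python
import re

# fullmatch succeeds only if the whole string matches the pattern.
dot_matches_letter = re.fullmatch(r".", "a") is not None
dot_matches_newline = re.fullmatch(r".", "\n") is not None

print(dot_matches_letter)   # True
print(dot_matches_newline)  # False
```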

re.DOTALL

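Following the matching rules referenced above, we could use (.|\r|\n)* to match any character including newlines. Python, however, offers a more convenient option: the optional re.DOTALL flag.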
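For comparison, here is a sketch of the alternation approach applied to the multi-line string. The character class [\s\S] is not from the article; it is a commonly used equivalent and is shown only as an assumption-labeled alternative.

```python
import re

text = "hello\n\nworld"

# Alternation: a non-newline character, a carriage return, or a newline.
alternation = re.search(r"hello(.|\r|\n)*world", text)

# Assumed alternative (not from the article): whitespace or non-whitespace,
# which together cover every character.
char_class = re.search(r"hello[\s\S]*world", text)

print(repr(alternation.group()))  # 'hello\n\nworld'
print(repr(char_class.group()))   # 'hello\n\nworld'
```

Both patterns work without any flag, but they are harder to read than a plain .* combined with a flag.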

Add the re.DOTALL option to re.search:

import re
string = """hello

world"""
result = re.search(r"hello.*world", string, re.DOTALL)
print(result.group())

Running it again, the expected pattern is found once more, so the match now works across lines.

>>> import re
>>> string = """hello
... 
... world"""
>>> result = re.search(r"hello.*world", string, re.DOTALL)
>>> print(result.group())
hello

world
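The same flag can be supplied in a few other ways that the article does not cover: re.S is the short alias of re.DOTALL, (?s) sets the flag inline at the start of the pattern, and re.compile accepts it for a reusable pattern object. The sketch below assumes the same multi-line string as above.

```python
import re

text = "hello\n\nworld"

with_flag = re.search(r"hello.*world", text, re.DOTALL)
with_alias = re.search(r"hello.*world", text, re.S)   # re.S is an alias of re.DOTALL
with_inline = re.search(r"(?s)hello.*world", text)    # inline flag inside the pattern

# Compile once when the same pattern is reused many times.
compiled = re.compile(r"hello.*world", re.DOTALL)
with_compiled = compiled.search(text)

matches = [with_flag, with_alias, with_inline, with_compiled]
all_match = all(m is not None and m.group() == text for m in matches)
print(all_match)  # True
```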

References

[1]. https://docs.python.org/zh-cn/3/howto/regex.html#matching-characters
[2]. https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F


科技一点鑫得