Regular Expressions: \s

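In programming you often need to match and replace strings, and that is where regular expressions come in. A regular expression (also called a regex or regexp) is a pattern that describes which strings to match. With regular expressions you can process text easily: find matching substrings, replace text, check whether input meets a required format, and so on.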

What is \s

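In a regular expression, \s matches any single whitespace character. Whitespace includes the space, the tab (\t), the carriage return (\r), the newline (\n), and a few others such as the form feed (\f) and the vertical tab (\v). In Python 3, \s in a str pattern is Unicode-aware, so it also matches other Unicode whitespace such as the full-width space U+3000.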
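A quick way to see what \s covers is to test characters one at a time. This is a minimal sketch assuming Python 3 and str (not bytes) patterns; the sample set below is chosen for illustration and is not an exhaustive list of whitespace.

```python
import re

# Compile once: \s matches exactly one whitespace character
WHITESPACE = re.compile(r"\s")

# Illustrative sample of whitespace characters (not exhaustive)
samples = {
    "space": " ",
    "tab": "\t",
    "newline": "\n",
    "carriage return": "\r",
    "form feed": "\f",
    "vertical tab": "\v",
    "ideographic space": "\u3000",
}

for name, ch in samples.items():
    # fullmatch succeeds only if the whole one-character string is whitespace
    matched = WHITESPACE.fullmatch(ch) is not None
    print(f"{name:<18} {matched}")

# A letter is not whitespace, so there is no match
print(WHITESPACE.fullmatch("a"))  # None
```

Every sample prints True; only the final letter check prints None.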

Here are some examples of \s:

Matching a single space

import re
text = "hello world"
match = re.search(r'hello\sworld', text)
print(match)  # Output: <re.Match object; span=(0, 11), match='hello world'>

Matching a newline

import re
text = "hello\nworld"
match = re.search(r'hello\sworld', text)
print(match)  # Output: <re.Match object; span=(0, 11), match='hello\nworld'>

Matching multiple whitespace characters

import re
text = "hello      world"
match = re.search(r'hello\s+world', text)  # \s+ matches one or more whitespace characters
print(match)  # Output: <re.Match object; span=(0, 16), match='hello      world'>

Matching a tab

import re
text = "hello\tworld"
match = re.search(r'hello\sworld', text)
print(match)  # Output: <re.Match object; span=(0, 11), match='hello\tworld'>
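The difference between \s and a literal space in the pattern is easy to miss. This short sketch (the sample strings are illustrative) runs both patterns against a space, a tab, and a newline separator.

```python
import re

texts = ["hello world", "hello\tworld", "hello\nworld"]

for t in texts:
    literal = re.search(r"hello world", t) is not None   # only a literal space
    any_ws = re.search(r"hello\sworld", t) is not None   # any single whitespace char
    print(repr(t), literal, any_ws)
# 'hello world' True True
# 'hello\tworld' False True
# 'hello\nworld' False True
```

A literal space in the pattern matches only the space character, while \s accepts all three separators.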

How to use \s

\s works like any other regex metacharacter and can be combined with quantifiers and other metacharacters. For example:

Finding every run of whitespace in a string

import re
text = "hello world!  This is a test string."
matches = re.findall(r'\s+', text)
print(matches)  # Output: [' ', '  ', ' ', ' ', ' ', ' ']
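The same \s+ pattern is often used as a separator, and its uppercase complement \S matches any non-whitespace character. A small sketch using the standard re.split and re.findall functions on the same sample string:

```python
import re

text = "hello world!  This is a test string."

# \s+ as a separator: split on runs of whitespace
words = re.split(r"\s+", text)
print(words)  # ['hello', 'world!', 'This', 'is', 'a', 'test', 'string.']

# \S+ is the complement: runs of non-whitespace characters
tokens = re.findall(r"\S+", text)
print(tokens == words)  # True
```

The two approaches agree here because the string has no leading or trailing whitespace; with such whitespace, re.split would produce empty strings at the ends while re.findall would not.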

Extracting the text inside every <p> tag of a web page

import re
html = '<html><body><p>This is a test paragraph.</p><p>This is another paragraph.</p></body></html>'
matches = re.findall(r'<p>\s*([^<]+)\s*</p>', html)
for match in matches:
    print(match)
# Output:
# This is a test paragraph.
# This is another paragraph.
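One subtlety in the <p> pattern: [^<]+ is greedy, so it swallows any trailing whitespace before the trailing \s* gets a chance. The sketch below uses a made-up snippet with padded paragraphs to compare it with a lazy [^<]+? version. For real web pages an HTML parser (for example Python's html.parser) is more robust than regular expressions.

```python
import re

# Illustrative HTML with extra spaces inside the first paragraph
html = "<p>  padded paragraph.  </p><p>tight</p>"

# Greedy: [^<]+ runs up to '<', so the trailing spaces end up in the group
greedy = re.findall(r"<p>\s*([^<]+)\s*</p>", html)
# Lazy: [^<]+? stops as soon as \s*</p> can match, leaving the spaces to \s*
lazy = re.findall(r"<p>\s*([^<]+?)\s*</p>", html)

print(greedy)  # ['padded paragraph.  ', 'tight']
print(lazy)    # ['padded paragraph.', 'tight']
```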

Stripping whitespace from both ends of a string

import re
text = "  This is a test string.  "
text = re.sub(r'^\s+|\s+$', '', text)  # remove leading and trailing whitespace
print(text)  # Output: This is a test string.
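For simple trimming, Python's built-in str.strip() does the same job for typical input without a regex. A regex becomes useful when you also want to normalize whitespace inside the string. A minimal sketch, using an illustrative input with mixed spaces, a tab, and a newline:

```python
import re

text = "  This   is\ta test\nstring.  "

# Built-in trim: leading/trailing whitespace removed, interior left as-is
print(repr(text.strip()))  # 'This   is\ta test\nstring.'

# Collapse each whitespace run into one space, then trim the ends
normalized = re.sub(r"\s+", " ", text).strip()
print(repr(normalized))  # 'This is a test string.'
```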

Caveats

Keep the following points in mind when using \s:

Issue 1: matching newlines

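A common misconception is that \s does not match the newline \n in Python's re module. It does: \s matches \n like any other whitespace character. The character that does not match a newline by default is the dot (.). To match any character including newlines, use a class that combines a shorthand with its complement, such as [\s\S], [\d\D], or [\w\W], or pass the re.DOTALL flag so that . also matches \n.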

import re
text = "hello\nworld"
matches = re.findall(r'[\s\S]+', text)
print(matches)  # Output: ['hello\nworld']
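To see where the newline actually causes trouble, this sketch compares ., . with re.DOTALL, [\s\S], and \s on the same two-line string.

```python
import re

text = "hello\nworld"

print(re.findall(r".+", text))             # ['hello', 'world']: . does not cross the newline
print(re.findall(r".+", text, re.DOTALL))  # ['hello\nworld']
print(re.findall(r"[\s\S]+", text))        # ['hello\nworld']
print(re.findall(r"\s", text))             # ['\n']: \s itself matches the newline
```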

Issue 2: the number of whitespace characters

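A fixed quantifier such as {8} matches only if the whitespace run has exactly that length; with 7 or 9 spaces the match fails. Use an exact count only when it matters; otherwise \s+ (one or more) or a range such as {1,8} is more forgiving.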

import re
text = "hello        world"
match = re.search(r'hello\s{8}world', text)  # \s{8} matches exactly 8 whitespace characters
print(match)  # Output: <re.Match object; span=(0, 18), match='hello        world'>
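The effect of the quantifier choice shows up as soon as the number of spaces changes. This sketch (the test strings are illustrative) checks an exact count, a bounded range, and \s+ against 1, 8, and 9 spaces.

```python
import re

pattern_exact = re.compile(r"hello\s{8}world")    # exactly 8 whitespace chars
pattern_range = re.compile(r"hello\s{1,8}world")  # between 1 and 8
pattern_any = re.compile(r"hello\s+world")        # one or more

for text in ["hello world", "hello" + " " * 8 + "world", "hello" + " " * 9 + "world"]:
    print(
        len(text) - 10,  # number of spaces between the two words
        bool(pattern_exact.search(text)),
        bool(pattern_range.search(text)),
        bool(pattern_any.search(text)),
    )
# 1 False True True
# 8 True True True
# 9 False False True
```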

Issue 3: matching the full-width space

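Chinese text often contains the full-width (ideographic) space U+3000. In Python 3, \s in a str pattern already matches it, because \s is Unicode-aware by default. If the pattern uses the re.ASCII flag, is a bytes pattern, or should match only the full-width space, write it explicitly as \u3000: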

import re
text = "hello\u3000world"
match = re.search(r'hello\u3000world', text)  # match the full-width space explicitly
print(match)  # Output: <re.Match object; span=(0, 11), match='hello\u3000world'>
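The re.ASCII flag is what changes \s from Unicode-aware to ASCII-only. A short sketch showing how the three variants behave on the same string:

```python
import re

text = "hello\u3000world"

# Default str pattern: \s is Unicode-aware and matches U+3000
print(re.search(r"hello\sworld", text) is not None)                # True
# With re.ASCII, \s only matches [ \t\n\r\f\v]
print(re.search(r"hello\sworld", text, re.ASCII) is not None)      # False
# An explicit \u3000 escape works with or without re.ASCII
print(re.search(r"hello\u3000world", text, re.ASCII) is not None)  # True
```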