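Regular Expressions

What Is a Regular Expression

A regular expression (regex for short) is a notation for describing patterns in text, most often used to match, search, and replace strings. A pattern is built from literal characters, special characters, and metacharacters, which keeps it compact yet expressive enough for most text-processing tasks. Regular expressions are widely used in text processing, compiler construction (lexical analysis), and natural language processing.

Below is a simple example that matches every run of digits in a string: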

const pattern = /\d+/g;
const str = 'This is a 123456 test string.';
const matches = str.match(pattern);
console.log(matches); // output: [ '123456' ]

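In this example, the regular expression \d+ matches the digits in the string. \d matches any single digit, and + means one or more of the preceding item, so \d+ matches a whole run of consecutive digits. The g flag requests a global match, so every match is returned rather than only the first.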
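To make the effect of the g flag concrete, here is a minimal sketch that runs the same pattern with and without it. The sample string and variable names are illustrative only.

```javascript
// Illustrative input; any string containing several numbers would do.
const text = 'Order 66 shipped 12 items.';

// Without the g flag, match() returns only the first match,
// as an array that also carries index metadata.
const first = text.match(/\d+/);
console.log(first[0], first.index); // 66 6

// With the g flag, match() returns every match as a plain array of strings.
const all = text.match(/\d+/g);
console.log(all); // [ '66', '12' ]
```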

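Regular Expression Syntax

This article groups regular expression syntax into three kinds of characters: literal characters, special characters, and metacharacters.

Literal Characters

Literal characters are the most basic building blocks of a regular expression: each one matches exactly itself. The difference between a literal character and a metacharacter is that a literal stands for its own value in the pattern, while a metacharacter carries a special function or meaning.

For example, the code below matches every o in the string hello world: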

const pattern = /o/g;
const str = 'hello world';
const matches = str.match(pattern);
console.log(matches); // output: [ 'o', 'o' ]

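In this example, the regular expression o matches every occurrence of the character o in the string. The g flag requests a global match.

Special Characters

Special characters are characters that have a special meaning or function in regular expression syntax. This article covers character classes, negated character classes, character ranges, quantifiers, anchors, and escapes.

Character Classes

Shorthand character classes match one character from a particular category, such as word characters, whitespace, or digits. The most common ones are:

\d: matches a digit (0-9).
\w: matches a word character (an ASCII letter, a digit, or an underscore).
\s: matches a whitespace character (space, tab, line break, and so on).
.: matches any single character except a line terminator (unless the s flag is set).

For example, the code below matches the digits and whitespace characters in the string hello world 123: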

const pattern = /[\d\s]/g;
const str = 'hello world 123';
const matches = str.match(pattern);
console.log(matches); // output: [ ' ', '1', '2', '3' ]

In this example, the character class [\d\s] matches every character that is either a digit or whitespace. The g flag requests a global match.
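Each shorthand class can also be tried on its own. The sketch below uses an illustrative sample string and shows how the dot treats line breaks, with and without the s (dotAll) flag, which assumes an ES2018-capable runtime.

```javascript
// Illustrative input: letter, digit, space, underscore, tab, question mark.
const sample = 'a1 _\t?';

const digits = sample.match(/\d/g);     // [ '1' ]
const wordChars = sample.match(/\w/g);  // [ 'a', '1', '_' ]
const whitespace = sample.match(/\s/g); // [ ' ', '\t' ]
const anyChar = sample.match(/./g);     // all 6 characters

// The dot skips line terminators unless the s (dotAll) flag is set.
const dotDefault = 'a\nb'.match(/./g);  // [ 'a', 'b' ]
const dotAll = 'a\nb'.match(/./gs);     // [ 'a', '\n', 'b' ]

console.log(digits, wordChars, whitespace, anyChar.length, dotDefault, dotAll);
```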

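Negated Character Classes

A negated character class matches any character that is not in the class. It is written by placing ^ immediately after the opening bracket. For example, the code below matches every character in the string hello world 123_% that is not a word character:

const pattern = /[^\w]/g;
const str = 'hello world 123_%';
const matches = str.match(pattern);
console.log(matches); // output: [ ' ', ' ', '%' ]

In this example, the regular expression `[^\w]` matches every character that is not a word character. Digits and the underscore count as word characters, so only the two spaces and % are returned. The `g` flag requests a global match.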
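The shorthand classes also have built-in negated forms: \W, \D, and \S are the opposites of \w, \d, and \s. A short sketch, with illustrative variable names, comparing the two spellings:

```javascript
const str = 'hello world 123_%';

// [^\w] and \W describe the same set: anything that is not a word character.
const viaNegatedClass = str.match(/[^\w]/g);
const viaShorthand = str.match(/\W/g);
console.log(viaNegatedClass); // [ ' ', ' ', '%' ]
console.log(viaShorthand);    // [ ' ', ' ', '%' ]

// \D matches every non-digit character.
const nonDigits = str.match(/\D/g).join('');
console.log(nonDigits); // hello world _%
```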

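#### Character Ranges

A character range matches any single character within a given range. It is written with a `-` between two characters inside a character class and covers every character from the first to the last. For example, the code below matches the lowercase letters in the string `Hello World`:

```javascript
const pattern = /[a-z]/g;
const str = 'Hello World';
const matches = str.match(pattern);
console.log(matches); // output: [ 'e', 'l', 'l', 'o', 'o', 'r', 'l', 'd' ]
```

In this example, the regular expression [a-z] matches each lowercase letter in the string; the uppercase H and W are skipped. The g flag requests a global match.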
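Several ranges can be combined inside one class. The sketch below, using made-up sample strings, matches hexadecimal color codes and shows that a hyphen placed last in the class is treated as a literal character.

```javascript
// Three ranges in one class: digits plus lower- and upper-case hex letters.
const colors = 'Color #1aF3 and #zz99'.match(/#[0-9a-fA-F]+/g);
console.log(colors); // [ '#1aF3' ]

// A hyphen at the end of the class is a literal hyphen, not a range operator.
const hyphenated = 'well-known fact'.match(/[a-z-]+/g);
console.log(hyphenated); // [ 'well-known', 'fact' ]
```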

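Quantifiers

Quantifiers specify how many times the preceding character, character class, or group may repeat. They are greedy by default, matching as much as possible; appending ? to a quantifier (for example *? or +?) makes it lazy (non-greedy), matching as little as possible.

Common quantifiers include:

*: matches the preceding item zero or more times (greedy).
+: matches the preceding item one or more times (greedy).
?: matches the preceding item zero or one time.
{n}: matches the preceding item exactly n times.
{n,}: matches the preceding item at least n times.
{n,m}: matches the preceding item between n and m times.

For example, the code below matches pairs of consecutive lowercase letters in the string Hello World: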

const pattern = /[a-z]{2}/g;
const str = 'Hello World';
const matches = str.match(pattern);
console.log(matches); // output: [ 'el', 'lo', 'or', 'ld' ]

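In this example, the regular expression [a-z]{2} matches two consecutive lowercase letters at a time. Matches do not overlap, so ello yields el and lo. The g flag requests a global match.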
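The difference between greedy and lazy matching is easiest to see side by side. The following sketch uses an illustrative HTML-like string; it is not a recommendation to parse HTML with regular expressions.

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .* grabs as much as possible, so the match runs to the last </b>.
const greedy = html.match(/<b>.*<\/b>/g);
console.log(greedy); // [ '<b>one</b> and <b>two</b>' ]

// Lazy: .*? stops at the first </b> it can reach.
const lazy = html.match(/<b>.*?<\/b>/g);
console.log(lazy); // [ '<b>one</b>', '<b>two</b>' ]

// The same applies to counted quantifiers.
const greedyCount = 'aaaa'.match(/a{2,3}/g);  // [ 'aaa' ]
const lazyCount = 'aaaa'.match(/a{2,3}?/g);   // [ 'aa', 'aa' ]
console.log(greedyCount, lazyCount);
```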

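Anchors

Anchors match a position rather than a character, such as the start or end of the input or a word boundary.

Common anchors include:

^: matches the start of the input, or the start of each line when the m flag is set.
$: matches the end of the input, or the end of each line when the m flag is set.
\b: matches a word boundary.
\B: matches any position that is not a word boundary.

For example, the code below matches the letter h at the start of the string hello world: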

const pattern = /^h/g;
const str = 'hello world';
const matches = str.match(pattern);
console.log(matches); // output: [ 'h' ]

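In this example, the regular expression ^h matches an h only at the start of the string. Because there is a single start position, at most one match is found even with the g flag.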
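The remaining anchors can be sketched the same way. The two-line sample text below is made up for illustration; it shows \b, \B, and how the m flag changes what ^ means.

```javascript
// Illustrative two-line input.
const text = 'cat concat\ncatalog';

// \b on both sides: only the standalone word "cat" qualifies.
const wholeWord = text.match(/\bcat\b/g);
console.log(wholeWord); // [ 'cat' ]

// \B before "cat": the match must start inside a word, as in "concat".
const insideWordIndex = text.search(/\Bcat/);
console.log(insideWordIndex); // 7

// Without m, ^ matches only at the start of the input.
const startOfInput = text.match(/^cat/g);
console.log(startOfInput); // [ 'cat' ]

// With m, ^ also matches right after each line break.
const startOfLines = text.match(/^cat/gm);
console.log(startOfLines); // [ 'cat', 'cat' ]
```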

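Escapes

The escape character \ turns a metacharacter into a literal character, or introduces a special sequence such as \d.

Common escapes include:

\\: matches a literal backslash.
\.: matches a literal dot (an unescaped . matches any character).
\+: matches the character +.
\*: matches the character *.
\?: matches the character ?.
\/: matches the character /.
\{: matches the character {.
\}: matches the character }.
\(: matches the character (.
\): matches the character ).

For example, the code below matches the + in the string hello+world: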

const pattern = /hello\+world/g;
const str = 'hello+world';
const matches = str.match(pattern);
console.log(matches); // output: [ 'hello+world' ]

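In this example, the regular expression hello\+world matches the string hello+world literally. Without the backslash, + would act as a quantifier applied to the preceding o. The g flag requests a global match.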
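Escaping matters most when a pattern is built from text that is not known in advance. The sketch below first contrasts an escaped and an unescaped dot, then uses a small helper to escape every metacharacter in a string. The helper name escapeRegExp is hypothetical and is not a built-in function.

```javascript
// An unescaped dot matches any character; an escaped dot matches only ".".
const unescapedDot = /a.c/.test('abc');  // true
const escapedDot = /a\.c/.test('abc');   // false
const literalDot = /a\.c/.test('a.c');   // true

// Hypothetical helper: prefix every regex metacharacter with a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const needle = '1+1=2 (really?)';
const escaped = escapeRegExp(needle);
console.log(escaped); // 1\+1=2 \(really\?\)

// The escaped text can now be used safely as a pattern.
const found = 'Is 1+1=2 (really?) true'.match(new RegExp(escaped, 'g'));
console.log(found); // [ '1+1=2 (really?)' ]
```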

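Metacharacters

Metacharacters, in this article's classification, are the constructs that combine or relate parts of a pattern: grouping, backreferences, alternation, and lookbehind assertions that use backreferences.

Grouping

Grouping treats a sequence of characters as a single unit, forming a subexpression. A group is written with a pair of parentheses (), and the pattern inside them is the subexpression.

For example, the code below matches pairs of consecutive lowercase letters in the string Hello World and captures each pair as a subexpression: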

const pattern = /([a-z]{2})/g;
const str = 'Hello World';
const matches = str.match(pattern);
console.log(matches); // output: [ 'el', 'lo', 'or', 'ld' ]

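In this example, the regular expression ([a-z]{2}) matches two consecutive lowercase letters and captures them as a group. Note that when the g flag is set, str.match returns only the full matches and omits the captured groups.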
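To read the captured groups of every match, one option is String.prototype.matchAll, which assumes an ES2020-capable runtime. The date-like sample text and the variable names below are illustrative.

```javascript
const text = 'Released 2024-05-17, patched 2025-01-02.';
const datePattern = /(\d{4})-(\d{2})-(\d{2})/g;

// With the g flag, match() returns only the full matches.
const fullMatches = text.match(datePattern);
console.log(fullMatches); // [ '2024-05-17', '2025-01-02' ]

// matchAll() yields one result per match, including each captured group.
const parts = [...text.matchAll(datePattern)].map(
  ([, year, month, day]) => ({ year, month, day })
);
console.log(parts);

// (?:...) groups without capturing, so the result holds only the full match.
const nonCapturing = 'abab'.match(/(?:ab)+/);
console.log(nonCapturing[0], nonCapturing.length); // abab 1
```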

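Backreferences

A backreference refers to the text matched by an earlier capturing group in the same pattern.

It is written as \n, where n is the number of the group, counted by opening parentheses from left to right. For example, the code below matches two consecutive copies of the same substring in the string hellohello: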

const pattern = /(hello)\1/g;
const str = 'hellohello';
const matches = str.match(pattern);
console.log(matches); // output: [ 'hellohello' ]

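In this example, the regular expression (hello)\1 matches hello followed immediately by a second copy of it. \1 refers to whatever the first group (hello) matched. The g flag requests a global match.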
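A backreference becomes more useful when the group can match different text each time. As a sketch, the pattern below finds accidentally doubled words in an illustrative sentence and then removes the duplicates.

```javascript
const sentence = 'This is is a test of the the backreference.';

// (\w+) captures a word; \1 requires the same word again after whitespace.
const doubledWord = /\b(\w+)\s+\1\b/g;

const doubles = sentence.match(doubledWord);
console.log(doubles); // [ 'is is', 'the the' ]

// In a replacement string, $1 refers to the first captured group.
const cleaned = sentence.replace(doubledWord, '$1');
console.log(cleaned); // This is a test of the backreference.
```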

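Alternation

Alternation lets a pattern match any one of several alternatives, which are separated by the | symbol.

For example, the code below matches the words hello and world in the string hello world: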

const pattern = /(hello|world)/g;
const str = 'hello world';
const matches = str.match(pattern);
console.log(matches); // output: [ 'hello', 'world' ]

In this example, the regular expression (hello|world) matches either the word hello or the word world. The g flag requests a global match.
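The | operator applies to everything on either side of it up to the enclosing group, which is why a group is often needed to limit its scope. A minimal sketch with illustrative inputs:

```javascript
// The group limits the alternation to a single letter.
const spellings = 'gray grey griy'.match(/gr(?:a|e)y/g);
console.log(spellings); // [ 'gray', 'grey' ]

// Without a group, the anchors belong to separate alternatives:
// "starts with cat" OR "ends with dog".
const loose = /^cat|dog$/;
// With a group, both anchors apply: exactly "cat" or exactly "dog".
const strict = /^(?:cat|dog)$/;

const looseResult = loose.test('catalog');   // true
const strictResult = strict.test('catalog'); // false
const strictDog = strict.test('dog');        // true
console.log(looseResult, strictResult, strictDog);
```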

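Backreferences Inside a Negative Lookbehind

A backreference can also be used inside a negative lookbehind assertion to require that a match is not preceded by an earlier captured substring.

A negative lookbehind is written (?<!...) and succeeds only when the text ending at the current position does not match the pattern inside it; a \n backreference may appear inside that pattern. For example, the code below matches a hello in the string hellohello that is not immediately preceded by another hello: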

const pattern = /(hello)(?<!\1hello)/g;
const str = 'hellohello';
const matches = str.match(pattern);
console.log(matches); // output: [ 'hello' ]

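In this example, the regular expression (hello)(?<!\1hello) matches a hello that is not immediately preceded by another hello, so only the first hello is returned. \1 refers to the first group (hello), and (?<!...) is a negative lookbehind assertion. The g flag requests a global match.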
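Lookbehind assertions are often easier to follow without a backreference. The sketch below uses an illustrative price string and assumes a runtime with lookbehind support (ES2018 or later). A positive lookahead, (?=...), is included for comparison.

```javascript
const listing = 'cost: $42, weight: 17kg, fee: $8';

// Positive lookbehind: digits that are immediately preceded by "$".
const dollarAmounts = listing.match(/(?<=\$)\d+/g);
console.log(dollarAmounts); // [ '42', '8' ]

// Negative lookbehind: whole numbers that are not preceded by "$".
const otherNumbers = listing.match(/(?<!\$)\b\d+/g);
console.log(otherNumbers); // [ '17' ]

// Positive lookahead: digits that are immediately followed by "kg".
const weights = listing.match(/\d+(?=kg)/g);
console.log(weights); // [ '17' ]
```

The assertions check the surrounding text without consuming it, so neither $ nor kg appears in the results.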

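Using Regular Expressions in JavaScript

In JavaScript, a regular expression is represented by a RegExp object, which can be created with the RegExp constructor or with a regular expression literal.

Creating a RegExp Object

There are two common ways to create a RegExp object:

Using the RegExp constructor.

const pattern = new RegExp('hello', 'g');

The constructor takes two arguments: the first is the pattern as a string, and the second is an optional string of flags such as g, i, m, and s.

Using a regular expression literal.

const pattern = /hello/g;

A regular expression literal encloses the pattern between / characters, and flags may follow the closing /.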
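One practical difference between the two forms is escaping: in a constructor string, each backslash must itself be escaped, because the string literal is processed before the pattern is compiled. The constructor is mainly useful when part of the pattern is only known at run time. A sketch with illustrative names:

```javascript
// The same pattern written both ways.
const literal = /\d+\.\d+/;
const constructed = new RegExp('\\d+\\.\\d+');
const samePattern = literal.source === constructed.source;
console.log(samePattern); // true

// Building a pattern from a run-time value (here a hard-coded stand-in).
const word = 'cat';
const dynamic = new RegExp(`\\b${word}\\b`, 'gi');
const hits = 'Cat concat CAT'.match(dynamic);
console.log(hits); // [ 'Cat', 'CAT' ]
```

If the run-time value may contain metacharacters, it should be escaped before being interpolated into the pattern.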

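Regular Expression Methods

Common methods for working with regular expressions include:

test(str): a RegExp method that tests whether the string contains a match and returns a boolean.
exec(str): a RegExp method that searches the string for the next match and returns an array containing the matched text and any captured groups, or null if there is no match.
match(pattern): a String method that returns the matches. With the g flag it returns an array of all matched substrings; without it, it returns the same kind of result as exec. It returns null if there is no match.
search(pattern): a String method that returns the index of the first match, or -1 if there is none.
replace(pattern, replacement): a String method that returns a new string with the matched parts replaced by the replacement. Only the first match is replaced unless the g flag is set.

For example, the code below shows these methods in use: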

const pattern = /\d+/g;
const str = 'abc123def456ghi789';
console.log(pattern.test(str)); // output: true
// A g-flagged RegExp is stateful: test() moved pattern.lastIndex to 6,
// so reset it to make exec() search from the start again.
pattern.lastIndex = 0;
console.log(pattern.exec(str)); // output: [ '123', index: 3, input: 'abc123def456ghi789', groups: undefined ]
console.log(str.match(pattern)); // output: [ '123', '456', '789' ]
console.log(str.search(pattern)); // output: 3
console.log(str.replace(pattern, 'x')); // output: 'abcxdefxghix'

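In this example, the regular expression \d+ matches one or more digits, and the different methods test for a match, retrieve the matches, find the position of the first match, and replace the matches.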
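One detail worth knowing when calling test or exec repeatedly: a RegExp created with the g flag remembers where its last match ended in its lastIndex property. The sketch below, with illustrative names, shows how that state changes and how an exec loop relies on it.

```javascript
const pattern = /\d+/g;
const str = 'abc123def456';

// Each test() call resumes from lastIndex, then updates it.
const calls = [];
for (let i = 0; i < 3; i++) {
  const matched = pattern.test(str);
  calls.push([matched, pattern.lastIndex]);
}
console.log(calls); // [ [ true, 6 ], [ true, 12 ], [ false, 0 ] ]

// The failed call reset lastIndex to 0, so this loop starts from the beginning
// and visits every match in turn until exec() returns null.
const found = [];
let m;
while ((m = pattern.exec(str)) !== null) {
  found.push([m[0], m.index]);
}
console.log(found); // [ [ '123', 3 ], [ '456', 9 ] ]
```

Reusing one g-flagged pattern across unrelated strings can therefore give surprising results unless lastIndex is reset to 0 or a fresh RegExp is created.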