Golang Regular Expressions

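Go is a strongly typed language, and its standard library supports regular expressions through the regexp package. Regular expressions make it easy to search, match, and replace text. Go's implementation is deliberately simple: it does not support some advanced features such as backreferences. For everyday tasks, however, it is more than sufficient. The sections below walk through how to use regular expressions in Go.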

Matching with Regular Expressions

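Matching a string against a regular expression is done with the regexp package. Its central type is Regexp, which represents a compiled regular expression. The package has no separate Match type; match results come back as ordinary values such as a bool, a string, or a slice. Compile() parses a pattern and returns a *Regexp together with an error, while MustCompile() panics if the pattern is invalid. The resulting object provides methods such as MatchString() and ReplaceAllString().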
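As a minimal sketch of how Compile() reports problems, the program below compiles a few patterns and inspects the returned error. The helper name isValidPattern is an assumption made for illustration; it is not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// isValidPattern is a hypothetical helper that reports whether
// the pattern compiles under Go's regular expression syntax.
func isValidPattern(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// A well-formed pattern compiles without error.
	re, err := regexp.Compile(`\d+`)
	if err != nil {
		fmt.Println("compile error:", err)
		return
	}
	fmt.Println(re.MatchString("room 101")) // true

	// An unbalanced parenthesis is reported as an error, not a panic.
	fmt.Println(isValidPattern(`(\d+`)) // false

	// Backreferences such as \1 are not part of Go's syntax, so this fails too.
	fmt.Println(isValidPattern(`(\w)\1`)) // false
}
```

Compile() is usually the better fit for patterns that come from user input, while MustCompile() is convenient for fixed patterns known to be valid.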

Below is a simple example that checks whether a string mentions the year 1956:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "Westerns were the most popular type of TV show in the 1950s and 1960s. In fact, seven of the top 10 shows for the entire decade of the 1950s were westerns."

    // Regular expression: 1956 as a whole word
    re := regexp.MustCompile(`\b1956\b`)

    // Report whether the test string contains a match
    fmt.Println(re.MatchString(str)) // false
}

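The program prints false, because the text mentions only the 1950s and 1960s. \b matches at a word boundary, so wrapping a pattern in \b on both sides makes it match whole words only. Here, \b1956\b matches 1956 only when it stands alone and not, for example, inside 19560.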
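To make the effect of \b concrete, here is a small sketch that runs the same pattern against three inputs. The helper hasYear and the sample sentences are assumptions made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// yearRe matches 1956 only as a standalone word.
var yearRe = regexp.MustCompile(`\b1956\b`)

// hasYear is a hypothetical helper that reports whether s
// mentions 1956 on its own.
func hasYear(s string) bool {
	return yearRe.MatchString(s)
}

func main() {
	fmt.Println(hasYear("The show first aired in 1956."))  // true
	fmt.Println(hasYear("Part number 19560 is sold out.")) // false
	fmt.Println(hasYear("popular in the 1950s and 1960s")) // false
}
```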

Groups, Captures, and Replacement

Beyond basic matching, regular expressions can also group parts of a pattern, capture the text they match, and replace it.

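Groups

A group is a sub-expression enclosed in parentheses. Each group has a unique index starting at 1, assigned by the position of its opening parenthesis, counted from left to right. FindStringSubmatch() returns the captured substrings in index order, and FindAllStringSubmatch() does the same for every match. Here is an example: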

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "Pairs of words that begin with the same sound, such as \"cool cat\" and \"good guy,\" are called alliteration."

    // Regular expression with two capture groups
    re := regexp.MustCompile(`(\w).*\b(\w)\w*\b`)

    // Match the test string and print every match with its groups
    fmt.Printf("%q\n", re.FindAllStringSubmatch(str, -1))
}

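Output:

[["Pairs of words that begin with the same sound, such as \"cool cat\" and \"good guy,\" are called alliteration" "P" "a"]]

Here, \w matches a word character (a letter, digit, or underscore) and \b matches a word boundary. The . matches any character except a newline, and * repeats the preceding element any number of times. Because .* is greedy, it consumes as much text as possible, so the pattern produces a single match that runs from the first word to the last: group 1 captures "P", the first letter of the sentence, and group 2 captures "a", the first letter of the final word.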
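The numbering rule for groups is easiest to see with nested parentheses. The sketch below uses a date pattern chosen purely for illustration; splitDate is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Opening parentheses, left to right:
// group 1 = year-month, group 2 = year, group 3 = month, group 4 = day.
var dateRe = regexp.MustCompile(`((\d{4})-(\d{2}))-(\d{2})`)

// splitDate is a hypothetical helper that returns the whole match
// followed by the text of each group, or nil if s does not match.
func splitDate(s string) []string {
	return dateRe.FindStringSubmatch(s)
}

func main() {
	fmt.Println(dateRe.NumSubexp()) // 4
	fmt.Printf("%q\n", splitDate("2021-03-14"))
	// ["2021-03-14" "2021-03" "2021" "03" "14"]
}
```

Index 0 always holds the entire match, which is why the group contents start at index 1.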

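Captures

To read what the groups captured, use FindStringSubmatch(). It returns a slice whose first element is the entire match, followed by the text captured by each group. For example: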

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "Pairs of words that begin with the same sound, such as \"cool cat\" and \"good guy,\" are called alliteration."

    // Regular expression with two capture groups
    re := regexp.MustCompile(`(\w).*\b(\w)\w*\b`)

    // Match and print what the capture groups captured
    submatches := re.FindStringSubmatch(str)
    fmt.Printf("%q\n", submatches[1:]) // ["P" "a"]

    // Build a replacement template from group references plus captured text
    replacement := fmt.Sprintf("${2}${1}%s", submatches[2])
    result := re.ReplaceAllString(str, replacement)
    fmt.Println(result) // aPa.
}

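Here, ${1} and ${2} refer to the text captured by the first and second groups. Because the greedy match covers everything except the final period, the whole sentence is replaced by the expanded template.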
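FindStringSubmatch() returns nil when the pattern does not match, so indexing the result without a check can panic. The following sketch guards against that; the key=value format and the helper parseKV are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// kvRe captures a key and a possibly empty value around an equals sign.
var kvRe = regexp.MustCompile(`^(\w+)=(\w*)$`)

// parseKV is a hypothetical helper. ok is false when s does not
// have the form key=value.
func parseKV(s string) (key, value string, ok bool) {
	m := kvRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	for _, line := range []string{"mode=fast", "no equals sign"} {
		if k, v, ok := parseKV(line); ok {
			fmt.Printf("key=%q value=%q\n", k, v)
		} else {
			fmt.Printf("%q did not match\n", line)
		}
	}
	// key="mode" value="fast"
	// "no equals sign" did not match
}
```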

Replacement

ReplaceAllString() replaces every substring matched by the regular expression with another string, as shown here:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "Pairs of words that begin with the same sound, such as \"cool cat\" and \"good guy,\" are called alliteration."

    // Regular expression with two capture groups
    re := regexp.MustCompile(`(\w).*\b(\w)\w*\b`)

    // Replace the matched text with the two captured letters, swapped
    result := re.ReplaceAllString(str, "${2}${1}")
    fmt.Println(result) // aP.
}

Here too, ${1} and ${2} refer to the contents of the first and second capture groups.
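The braces in ${1} matter when the reference is followed by a letter, digit, or underscore: in the short $name form the name is taken to be as long as possible, so $2_ is read as a group named "2_" and expands to nothing. The sketch below illustrates this with a name-swapping pattern; the pattern, sample input, and helper swap are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe captures "Last, First" as two groups.
var nameRe = regexp.MustCompile(`(\w+), (\w+)`)

// swap is a hypothetical helper that applies a replacement template.
func swap(s, template string) string {
	return nameRe.ReplaceAllString(s, template)
}

func main() {
	fmt.Println(swap("Lovelace, Ada", "${2} ${1}")) // Ada Lovelace
	fmt.Println(swap("Lovelace, Ada", "${2}_${1}")) // Ada_Lovelace

	// Without braces, "$2_" refers to a nonexistent group named "2_",
	// which expands to the empty string.
	fmt.Println(swap("Lovelace, Ada", "$2_$1")) // Lovelace
}
```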

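Escape Sequences in Regular Expressions

Some characters have a special meaning in Go regular expressions, such as . and *. To match one of these characters literally, put a \ in front of it. The backslash also introduces the following commonly used character classes: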

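Escape	Description
`\d`	Matches a digit from 0 to 9
`\D`	Matches any non-digit character
`\s`	Matches any whitespace character (space, tab, and so on)
`\S`	Matches any non-whitespace character
`\w`	Matches any word character (letter, digit, or underscore)
`\W`	Matches any non-word character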

The following example uses a regular expression to find the runs of word characters in the string "Is this alright?":

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "Is this alright?"

    // Regular expression: one or more word characters
    re := regexp.MustCompile(`\w+`)

    // Match the test string and print all matches
    fmt.Printf("%q\n", re.FindAllString(str, -1))
}

Output:

["Is" "this" "alright"]

Here, \w matches a word character and + means one or more occurrences.
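As a sketch of why escaping matters, the program below compares an unescaped dot with an escaped one and uses regexp.QuoteMeta() to escape a whole literal automatically. The helpers matches and containsLiteral are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper. It panics on an invalid pattern,
// which is acceptable here because every pattern below is fixed.
func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

// containsLiteral is a hypothetical helper that searches for literal
// text by escaping every special character in it first.
func containsLiteral(text, literal string) bool {
	return matches(regexp.QuoteMeta(literal), text)
}

func main() {
	fmt.Println(matches(`1.5`, "125"))  // true: . matches any character
	fmt.Println(matches(`1\.5`, "125")) // false: \. matches only a literal dot

	fmt.Println(regexp.QuoteMeta("1.5*2"))                // 1\.5\*2
	fmt.Println(containsLiteral("price: 1.5*2", "1.5*2")) // true
}
```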

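Regular Expression Patterns

A pattern is built from characters with special meanings that describe how matching behaves. Some common ones are listed below: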

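Pattern	Description
`.`	Matches any character except a newline
`^`	Matches the start of the text
`$`	Matches the end of the text
`*`	Matches the preceding element zero or more times
`+`	Matches the preceding element one or more times
`?`	Matches the preceding element zero or one time
`{n}`	Matches the preceding element exactly n times
`{n,}`	Matches the preceding element at least n times
`{n,m}`	Matches the preceding element between n and m times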

The following example finds the first run of one or more digits in a string:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "abc123def"

    // Regular expression: one or more digits
    re := regexp.MustCompile(`\d+`)

    // Match the test string and print the first match
    fmt.Printf("%q\n", re.FindString(str))
}

Output:

"123"

Here, \d matches a digit and + means one or more occurrences.
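The anchors and counted quantifiers from the table combine naturally when validating a whole string. Here is a minimal sketch; the 3-digit, 4-digit number format and the helper names are assumptions picked only to exercise ^, $, {n}, and ?.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Exactly three digits, a hyphen, and exactly four digits, nothing else.
	localNumberRe = regexp.MustCompile(`^\d{3}-\d{4}$`)
	// The u is optional, so both spellings are accepted.
	colorRe = regexp.MustCompile(`^colou?r$`)
)

// isLocalNumber and isColor are hypothetical helpers.
func isLocalNumber(s string) bool { return localNumberRe.MatchString(s) }
func isColor(s string) bool       { return colorRe.MatchString(s) }

func main() {
	fmt.Println(isLocalNumber("555-1234"))      // true
	fmt.Println(isLocalNumber("5555-1234"))     // false: four leading digits
	fmt.Println(isLocalNumber("call 555-1234")) // false: ^ requires the start

	fmt.Println(isColor("color"))   // true
	fmt.Println(isColor("colour"))  // true
	fmt.Println(isColor("colouur")) // false: ? allows at most one u
}
```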

Common Regular Expression Methods

Besides methods such as Match(), FindAll(), ReplaceAll(), FindSubmatch(), and ReplaceAllString(), the Regexp type offers several other commonly used methods.

`FindString` and `FindStringIndex`

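These two methods return the first matching substring and the start and end indexes of that substring, respectively. If there is no match, they return "" and nil. Here is an example: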

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "Pairs of words that begin with the same sound, such as \"cool cat\" and \"good guy,\" are called alliteration."

    // Regular expression: one or more word characters
    re := regexp.MustCompile(`\w+`)

    // Find the first matching substring
    fmt.Printf("%q\n", re.FindString(str))

    // Find the start and end indexes of the first match
    match := re.FindStringIndex(str)
    if match != nil {
        fmt.Printf("Matched %q at index %d:%d\n", str[match[0]:match[1]], match[0], match[1])
    } else {
        fmt.Println("No match found.")
    }
}

Output:

"Pairs"
Matched "Pairs" at index 0:5
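Since FindStringIndex() returns nil when nothing matches, a caller typically checks the result before slicing. A minimal sketch, with firstNumber as a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

var numberRe = regexp.MustCompile(`\d+`)

// firstNumber is a hypothetical helper that returns the first run of
// digits in s with its byte offsets; ok is false when s has no digits.
func firstNumber(s string) (text string, start, end int, ok bool) {
	loc := numberRe.FindStringIndex(s)
	if loc == nil {
		return "", 0, 0, false
	}
	return s[loc[0]:loc[1]], loc[0], loc[1], true
}

func main() {
	for _, s := range []string{"abc123def", "abcdef"} {
		if text, start, end, ok := firstNumber(s); ok {
			fmt.Printf("%q at %d:%d\n", text, start, end)
		} else {
			fmt.Printf("no number in %q\n", s)
		}
	}
	// "123" at 3:6
	// no number in "abcdef"
}
```

The end index is exclusive, so s[start:end] yields exactly the matched text.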

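`FindAllString` and `FindAllStringSubmatch`

These two methods return all matching substrings, or, for each match, the whole match followed by the text of its capture groups. If nothing matches, both return nil. Here is an example: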

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "The queen in Queen's Gambit is a fictional female chess prodigy."

    // Regular expression: a whole word
    re := regexp.MustCompile(`\b\w+\b`)

    // Return all matching substrings
    fmt.Printf("%q\n", re.FindAllString(str, -1)) // ["The" "queen" "in" "Queen" "s" "Gambit" "is" "a" "fictional" "female" "chess" "prodigy"]

    // Return every match with its capture groups (this pattern has none)
    fmt.Printf("%q\n", re.FindAllStringSubmatch(str, -1)) // [["The"] ["queen"] ["in"] ["Queen"] ["s"] ["Gambit"] ["is"] ["a"] ["fictional"] ["female"] ["chess"] ["prodigy"]]
}
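The second argument of FindAllString() limits how many matches are returned, with -1 meaning no limit. The sketch below shows the limit and the nil result for no matches; firstWords is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

var wordRe = regexp.MustCompile(`\w+`)

// firstWords is a hypothetical helper that returns at most n words
// from s, or all of them when n is negative.
func firstWords(s string, n int) []string {
	return wordRe.FindAllString(s, n)
}

func main() {
	title := "The queen in Queen's Gambit"

	fmt.Printf("%q\n", firstWords(title, 2))  // ["The" "queen"]
	fmt.Printf("%q\n", firstWords(title, -1)) // ["The" "queen" "in" "Queen" "s" "Gambit"]

	// No match yields a nil slice, which is safe to range over.
	fmt.Println(firstWords("?!", -1) == nil) // true
}
```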

`Split`

Split() splits the input string around the matches of the regular expression. Here is an example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Test string
    str := "one,two, three   four"

    // Regular expression: a comma with optional surrounding whitespace
    re := regexp.MustCompile(`\s*,\s*`)

    // Split the string around each match
    split := re.Split(str, -1)
    fmt.Printf("%q\n", split) // ["one" "two" "three   four"]
}

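Here, \s*,\s* matches a comma together with any whitespace around it. The run of spaces between three and four contains no comma, so the string is not split there.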
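If the goal is to separate on commas or whitespace alike, one option is a character class that accepts either. This is a sketch under that assumption, with fields as a hypothetical helper; it also shows how a positive second argument caps the number of pieces.

```go
package main

import (
	"fmt"
	"regexp"
)

// sepRe matches one or more commas or whitespace characters.
var sepRe = regexp.MustCompile(`[\s,]+`)

// fields is a hypothetical helper that splits s on any run of
// commas and whitespace.
func fields(s string) []string {
	return sepRe.Split(s, -1)
}

func main() {
	str := "one,two, three   four"

	fmt.Printf("%q\n", fields(str)) // ["one" "two" "three" "four"]

	// With n = 2, at most two pieces are returned and the last one
	// holds the unsplit remainder.
	fmt.Printf("%q\n", sepRe.Split(str, 2)) // ["one" "two, three   four"]
}
```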