Matching HTML Tags with Regular Expressions

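Web development often involves processing and matching HTML tags. Regular expressions are a powerful tool for matching tags quickly in short, well-formed snippets. This article shows how to use regular expressions in Python to match HTML tags, with example code for each case.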

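Matching HTML Tags

Example 1: Matching HTML tags

import re

html = "<div class='content'>Hello, deepinout.com</div>"
# Non-greedy match: "<", as few characters as possible, then ">".
pattern = r"<.*?>"
result = re.findall(pattern, html)
print(result)

Output:

["<div class='content'>", '</div>']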

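Example 2: Matching HTML tags with attributes

import re

html = "<a href='https://www.deepinout.com'>Deepinout</a>"
# The same pattern also covers attributes, since ".*?" runs up to the first ">".
pattern = r"<.*?>"
result = re.findall(pattern, html)
print(result)

Output:

["<a href='https://www.deepinout.com'>", '</a>']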

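Example 3: Matching nested HTML tags

import re

html = "<div><p>Hello, deepinout.com</p></div>"
# Each opening and closing tag is returned separately, in document order.
pattern = r"<.*?>"
result = re.findall(pattern, html)
print(result)

Output:

['<div>', '<p>', '</p>', '</div>']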
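One caveat with the `<.*?>` pattern used above: by default `.` does not match a newline, so a tag whose attributes wrap onto a second line is missed. The sketch below compares it with the character class `[^>]+`, which does cross newlines. The `html` snippet here is made up for illustration and is not one of the original examples.

```python
import re

# Hypothetical input: the opening tag is split across two lines.
html = "<div\n  class='content'>Hello</div>"

# "." stops at "\n" unless re.DOTALL is set, so the opening tag is missed.
dot_matches = re.findall(r"<.*?>", html)

# "[^>]+" matches any character except ">", including newlines.
class_matches = re.findall(r"<[^>]+>", html)

print(dot_matches)
print(class_matches)
```

Passing `re.DOTALL` to `re.findall` is another way to let `.` match newlines; the character class simply avoids the need for a flag.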

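Extracting Content from HTML Tags

Example 4: Extracting the text inside an HTML tag

import re

html = "<h1>Welcome to deepinout.com</h1>"
# The capture group (.*?) holds the text between the opening and closing tags.
pattern = r"<.*?>(.*?)</.*?>"
result = re.findall(pattern, html)
print(result)

Output:

['Welcome to deepinout.com']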
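The pattern in Example 4 accepts any closing tag, whether or not it matches the opening one. As a rough sketch, a backreference can require the two names to agree. The second element in `html` is added here purely for illustration.

```python
import re

html = "<h1>Welcome to deepinout.com</h1> <p>Intro</p>"

# (\w+) captures the tag name; \1 requires the closing tag to repeat it.
pattern = r"<(\w+)[^>]*>(.*?)</\1>"

# With two groups, findall returns (tag_name, text) tuples.
pairs = re.findall(pattern, html)
print(pairs)
```

This still assumes the element does not contain another element of the same name; the lazy `.*?` stops at the first matching closing tag.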

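Example 5: Extracting an attribute value from an HTML tag

import re

html = "<a href='https://www.deepinout.com'>Deepinout</a>"
# Capture everything between the single quotes that follow href=.
pattern = r"href='(.*?)'"
result = re.findall(pattern, html)
print(result)

Output:

['https://www.deepinout.com']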
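The `href='(.*?)'` pattern only handles single-quoted values. A minimal sketch that accepts either quote style follows, assuming the value is always quoted and the same quote character opens and closes it. The mixed-quote input is invented for the example.

```python
import re

# Hypothetical input mixing double-quoted and single-quoted attributes.
html = (
    '<a href="https://www.deepinout.com">Deepinout</a> '
    "<a href='https://www.example.com'>Example</a>"
)

# Group 1 captures the opening quote; \1 requires the same quote to close.
pattern = r"""href=(["'])(.*?)\1"""

# Group 2 is the attribute value itself.
urls = [m.group(2) for m in re.finditer(pattern, html)]
print(urls)
```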

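Replacing HTML Tags

Example 6: Stripping HTML tags to leave plain text

import re

html = "<p>Hello, <strong>deepinout.com</strong></p>"
# Replace every tag with the empty string; the text between tags is kept.
pattern = r"<.*?>"
result = re.sub(pattern, "", html)
print(result)

Output:

Hello, deepinout.com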
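Stripping tags leaves character references such as `&amp;` untouched. One possible follow-up step, sketched here with the standard library's `html.unescape`, decodes them after the tags are removed. The `source` string is an illustrative assumption.

```python
import re
from html import unescape

# Hypothetical input containing character references.
source = "<p>Fish &amp; chips &lt;3</p>"

# Remove the tags first, then decode the references in the remaining text.
stripped = re.sub(r"<[^>]+>", "", source)
text = unescape(stripped)

print(stripped)
print(text)
```

Decoding after stripping matters: decoding first would turn `&lt;` into a literal `<`, which the tag pattern could then misread as the start of a tag.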

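Example 7: Replacing an HTML element with specified text

import re

html = "<p>Hello, <strong>deepinout.com</strong></p>"
# The whole <strong> element, including its text, is replaced.
pattern = r"<strong>(.*?)</strong>"
result = re.sub(pattern, "Deepinout", html)
print(result)

Output:

<p>Hello, Deepinout</p>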
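Example 7 captures the inner text but then discards it. If the goal is to keep the text and change only the markup, the replacement can refer to the captured group. The sketch below shows two variants; the Markdown-style `**` output is just an illustrative choice.

```python
import re

html = "<p>Hello, <strong>deepinout.com</strong></p>"
pattern = r"<strong>(.*?)</strong>"

# \1 in the replacement string inserts the text captured by group 1.
as_markdown = re.sub(pattern, r"**\1**", html)

# A function can compute the replacement from each match object.
as_upper = re.sub(pattern, lambda m: m.group(1).upper(), html)

print(as_markdown)
print(as_upper)
```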

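Matching Specific HTML Tags

Example 8: Matching all link tags

import re

html = "<a href='https://www.deepinout.com'>Deepinout</a> <a href='https://www.example.com'>Example</a>"
# \b keeps "<a" from also matching tags such as <abbr> or <article>.
pattern = r"<a\b.*?</a>"
result = re.findall(pattern, html)
print(result)

Output:

["<a href='https://www.deepinout.com'>Deepinout</a>", "<a href='https://www.example.com'>Example</a>"]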

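Example 9: Matching all image tags

import re

html = "<img src='image1.jpg'> <img src='image2.jpg'>"
# <img> has no closing tag, so the match ends at the first ">".
pattern = r"<img.*?>"
result = re.findall(pattern, html)
print(result)

Output:

["<img src='image1.jpg'>", "<img src='image2.jpg'>"]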
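When targeting one tag by name, a bare prefix such as `<a` also matches other tags that begin with the same letters. A small sketch of the difference a word boundary (`\b`) makes, using an invented snippet that contains an `<abbr>` element:

```python
import re

# Hypothetical input: a link followed by an <abbr> element.
html = "<a href='https://www.deepinout.com'>Deepinout</a> <abbr>HTML</abbr>"

# Without a boundary, "<a" is also a prefix of "<abbr".
loose = re.findall(r"<a.*?>", html)

# \b requires the tag name to end right after "a".
strict = re.findall(r"<a\b.*?>", html)

print(loose)
print(strict)
```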

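Matching Specific Attribute Values

Example 10: Matching the URLs of all links

import re

html = "<a href='https://www.deepinout.com'>Deepinout</a> <a href='https://www.example.com'>Example</a>"
# findall returns only the captured group: the value of each href.
pattern = r"href='(.*?)'"
result = re.findall(pattern, html)
print(result)

Output:

['https://www.deepinout.com', 'https://www.example.com']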

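Example 11: Matching the URLs of all images

import re

html = "<img src='image1.jpg'> <img src='image2.jpg'>"
# Same idea as Example 10, applied to the src attribute.
pattern = r"src='(.*?)'"
result = re.findall(pattern, html)
print(result)

Output:

['image1.jpg', 'image2.jpg']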
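The `href` and `src` patterns differ only in the attribute name, and neither checks which tag the attribute belongs to. As a sketch under those observations, the two can be folded into one helper. `attr_values` is a hypothetical function written for this illustration, not a library API, and it assumes attribute values are always quoted.

```python
import re

def attr_values(html, tag, attr):
    """Return the values of `attr` found inside `<tag ...>` opening tags."""
    pattern = (
        r"<" + re.escape(tag) + r"\b"        # the tag name, as a whole word
        + r"[^>]*?\b" + re.escape(attr)       # skip other attributes within the tag
        + r"""=(["'])(.*?)\1"""               # quoted value; same quote on both sides
    )
    return [m.group(2) for m in re.finditer(pattern, html)]

# Hypothetical input with mixed attribute order and quote styles.
html = (
    "<img src='image1.jpg' alt='One'> <a href='x.html'>x</a> "
    '<img alt="Two" src="image2.jpg">'
)

srcs = attr_values(html, "img", "src")
alts = attr_values(html, "img", "alt")
print(srcs)
print(alts)
```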

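Matching Multiple HTML Tags

Example 12: Matching all heading tags

import re

html = "<h1>Title 1</h1> <h2>Title 2</h2> <h3>Title 3</h3>"
# h[1-6] covers every heading level; the original h[1-3] stopped at <h3>.
# Note: this does not check that the closing level equals the opening level.
pattern = r"<h[1-6].*?</h[1-6]>"
result = re.findall(pattern, html)
print(result)

Output:

['<h1>Title 1</h1>', '<h2>Title 2</h2>', '<h3>Title 3</h3>']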

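Example 13: Matching all list tags

import re

html = "<ul><li>Item 1</li><li>Item 2</li></ul>"
# Match the whole unordered list, from <ul> to the first </ul>.
pattern = r"<ul>.*?</ul>"
result = re.findall(pattern, html)
print(result)

Output:

['<ul><li>Item 1</li><li>Item 2</li></ul>']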
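Example 13 returns the list as one string. To get at the individual items, one option is a two-step match: first isolate each `<ul>` body, then pull the `<li>` contents out of it. This is a sketch that assumes lists are not nested; the `<ol>` in the input is added to show that it is left out.

```python
import re

# Hypothetical input: one unordered list and one ordered list.
html = "<ul><li>Item 1</li><li>Item 2</li></ul> <ol><li>Step 1</li></ol>"

items = []
# Step 1: capture the body of each <ul> element.
for ul_body in re.findall(r"<ul>(.*?)</ul>", html):
    # Step 2: capture the text of each <li> inside that body.
    items.extend(re.findall(r"<li>(.*?)</li>", ul_body))

print(items)
```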

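Matching Specific Content

Example 14: Matching tags that contain specified text

import re

html = "<p>Hello, deepinout.com</p> <p>Welcome to deepinout.com</p>"
# The dot is escaped so it matches a literal "."; [^<]* keeps the match
# inside a single paragraph instead of running across tag boundaries.
pattern = r"<p>[^<]*deepinout\.com[^<]*</p>"
result = re.findall(pattern, html)
print(result)

Output:

['<p>Hello, deepinout.com</p>', '<p>Welcome to deepinout.com</p>']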

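Example 15: Matching tags that do not contain specified text

import re

html = "<p>Hello, deepinout.com</p> <p>Welcome to example.com</p>"
# The negative lookahead (?!...) is checked before each character, so the
# match fails as soon as "example.com" would begin. The dot is escaped.
pattern = r"<p>(?:(?!example\.com).)*?</p>"
result = re.findall(pattern, html)
print(result)

Output:

['<p>Hello, deepinout.com</p>']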
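The negative-lookahead pattern in Example 15 is compact but hard to read. A possibly simpler alternative, sketched below, is to match every paragraph with a plain pattern and do the exclusion in Python. This swaps the lookahead technique for an ordinary filter; it is not the approach the example itself uses.

```python
import re

html = "<p>Hello, deepinout.com</p> <p>Welcome to example.com</p>"

# Step 1: collect every paragraph with a simple pattern.
paragraphs = re.findall(r"<p>.*?</p>", html)

# Step 2: drop the ones that contain the unwanted text.
kept = [p for p in paragraphs if "example.com" not in p]

print(kept)
```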

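Matching Specific Structures

Example 16: Matching all paragraph tags

import re

html = "<p>Paragraph 1</p> <div>Content</div> <p>Paragraph 2</p>"
# Only <p> elements match; the <div> between them is skipped.
pattern = r"<p>.*?</p>"
result = re.findall(pattern, html)
print(result)

Output:

['<p>Paragraph 1</p>', '<p>Paragraph 2</p>']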

Example 17: Matching all div tags

import re

html = "<p>Paragraph 1</p> <div>Content</div> <p>Paragraph 2</p>"
# Same input as Example 16, this time selecting the <div> element.
pattern = r"<div>.*?</div>"
result = re.findall(pattern, html)
print(result)

Output:

['<div>Content</div>']
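The `<div>.*?</div>` pattern expects a lowercase tag with no attributes and a body on one line. A sketch of a more tolerant variant follows, using the `re.IGNORECASE` and `re.DOTALL` flags. The input is invented for illustration, and nested `<div>` elements are still not handled, since the lazy match stops at the first closing tag.

```python
import re

# Hypothetical input: uppercase tag, an attribute, and a multi-line body.
html = "<DIV class='box'>\n  Content\n</DIV> <div>Other</div>"

# [^>]* allows attributes; IGNORECASE accepts <DIV>; DOTALL lets "." match "\n".
pattern = re.compile(r"<div\b[^>]*>(.*?)</div>", re.IGNORECASE | re.DOTALL)

# Strip the surrounding whitespace from each captured body.
bodies = [body.strip() for body in pattern.findall(html)]
print(bodies)
```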

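Matching Specific Formatting

Example 18: Matching all bold text

import re

html = "<p>Hello, <strong>deepinout.com</strong></p> <p>Welcome to <strong>example.com</strong></p>"
# Match each <strong> element wherever it appears inside other tags.
pattern = r"<strong>.*?</strong>"
result = re.findall(pattern, html)
print(result)

Output:

['<strong>deepinout.com</strong>', '<strong>example.com</strong>']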

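Example 19: Matching all italic text

import re

html = "<p>Hello, <em>deepinout.com</em></p> <p>Welcome to <em>example.com</em></p>"
# Same approach as Example 18, applied to <em> elements.
pattern = r"<em>.*?</em>"
result = re.findall(pattern, html)
print(result)

Output:

['<em>deepinout.com</em>', '<em>example.com</em>']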

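Matching a Specific Number of Tags

Example 20: Matching repeated tags

import re

html = "<p>Paragraph 1</p> <p>Paragraph 2</p> <p>Paragraph 3</p>"
# findall returns one entry per occurrence, so len(result) is the count.
pattern = r"<p>.*?</p>"
result = re.findall(pattern, html)
print(result)

Output:

['<p>Paragraph 1</p>', '<p>Paragraph 2</p>', '<p>Paragraph 3</p>']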
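Example 20 lists the repeated tags but does not test for a particular count. Two ways to do that are sketched here: counting the matches in Python, or using a `{n}` quantifier to require `n` paragraphs in a row. The stricter body pattern `[^<]*` is an assumption made so that one repetition cannot stretch across several paragraphs.

```python
import re

html = "<p>Paragraph 1</p> <p>Paragraph 2</p> <p>Paragraph 3</p>"

# Option 1: count the individual matches.
count = len(re.findall(r"<p>.*?</p>", html))

# Option 2: require exactly-adjacent repetitions with a quantifier.
# [^<]* restricts each paragraph body to text without nested tags.
three_in_a_row = re.search(r"(?:<p>[^<]*</p>\s*){3}", html) is not None
four_in_a_row = re.search(r"(?:<p>[^<]*</p>\s*){4}", html) is not None

print(count, three_in_a_row, four_in_a_row)
```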
