Python: Getting Tag Names with BeautifulSoup

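BeautifulSoup is one of the most widely used Python packages for web scraping. It parses HTML and XML documents and makes it easier to extract data from web pages. A common task when working with such documents is getting the tag name of a particular element.

The BeautifulSoup library can be installed with the following command: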

pip install beautifulsoup4

Approach

Method 1: Using the name attribute

This approach reads the name attribute of a tag object. The attribute returns the tag's name as a string. Its syntax is:

Syntax

tag.name

Return type: a string containing the tag name.
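As a quick check of the return type, here is a minimal sketch. The markup and variable names are illustrative assumptions, not taken from the examples in this article. It confirms that name is a plain string, and it also shows that the BeautifulSoup object itself, which is not a real tag, reports the placeholder name [document].

```python
from bs4 import BeautifulSoup

# Illustrative markup; any small HTML fragment would do
soup = BeautifulSoup("<div><span>hello</span></div>", "html.parser")

span = soup.find("span")

# The name attribute is an ordinary Python string
print(type(span.name).__name__, span.name)

# The parent of <span> is the enclosing <div>
print(span.parent.name)

# The BeautifulSoup object is not a real tag, so it carries a placeholder name
print(soup.name)
```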

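Steps

Import the BeautifulSoup class from the bs4 module.
Define a multi-line HTML string to search for tags.
Create a BeautifulSoup object by passing the HTML document and a parser to the BeautifulSoup constructor. Here, html.parser is used as the parser.
Use the soup.find() method to find the first <p> tag in the document.
Use the name attribute to get the name of the p tag object.
Print the tag name with print().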

Example 1

The following code demonstrates this approach:

from bs4 import BeautifulSoup

# HTML document to be parsed
html_doc = """
<html>
<head>
   <title>TutorialsPoint</title>
</head>
<body>
   <p>TutorialsPoint</p>
</body>
</html>
"""

# Parse the HTML document using BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

# Get the first <p> tag in the HTML document
p_tag = soup.find('p')

# Get the tag name using the name attribute
tag_name = p_tag.name

# Print the tag name
print("Tag name is:", tag_name)

Output

Tag name is: p
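One caveat worth sketching: find() returns None when no tag matches, so reading .name on the result would raise an AttributeError. The helper below, first_tag_name, is a hypothetical name introduced only for illustration and is not part of BeautifulSoup. It guards against that case.

```python
from bs4 import BeautifulSoup


def first_tag_name(markup, name):
    """Return the name of the first matching tag, or None if nothing matches.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find(name)
    # find() returns None when nothing matches, so guard before reading .name
    return tag.name if tag is not None else None


html_doc = "<html><body><p>TutorialsPoint</p></body></html>"

print(first_tag_name(html_doc, "p"))   # p
print(first_tag_name(html_doc, "h1"))  # None
```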

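Example 2

In this example, we parse an XML document and get the tag name of a custom tag. Note that the 'xml' parser requires the lxml package, which can be installed with pip install lxml.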

from bs4 import BeautifulSoup

xml_doc = '''
<book>
    <title>Harry Potter</title>
    <author>J.K. Rowling</author>
    <publisher>Bloomsbury</publisher>
</book>
'''

# Parse the XML document using BeautifulSoup
soup = BeautifulSoup(xml_doc, 'xml')

# Get the first <author> tag in the XML document
tag = soup.find('author')

# Get the tag name using the name attribute
tag_name = tag.name

# Print the tag name
print("Tag name is:", tag_name)

Output

Tag name is: author
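The same attribute can be read for every element at once. The sketch below reuses the book document and collects all tag names with find_all(True), which matches every tag. It uses html.parser only to avoid the lxml dependency, which is an assumption made for convenience. That parser lowercases tag names, so for case-sensitive XML the 'xml' parser is the safer choice.

```python
from bs4 import BeautifulSoup

xml_doc = """
<book>
    <title>Harry Potter</title>
    <author>J.K. Rowling</author>
    <publisher>Bloomsbury</publisher>
</book>
"""

# html.parser is used here only so the sketch runs without lxml installed
soup = BeautifulSoup(xml_doc, "html.parser")

# find_all(True) matches every tag, in document order
tag_names = [tag.name for tag in soup.find_all(True)]
print(tag_names)
```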

Example 3

In this example, we find a tag by its class name and then read its name attribute to get the tag's name.

from bs4 import BeautifulSoup

# HTML document to be parsed
html_doc = """
<html>
<head>
   <title class="tut">TutorialsPoint</title>
</head>
<body>
   <p>TutorialsPoint</p>
</body>
</html>
"""

# Parse the HTML document using BeautifulSoup constructor
soup = BeautifulSoup(html_doc, 'html.parser')

# Get the tag using its class (here this matches the <title> tag)
tag = soup.find(class_='tut')

# Get the tag name using the name attribute
tag_name = tag.name

# Print the tag name
print("Tag name is:", tag_name)

Output

Tag name is: title
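Looking up by class is where name becomes most useful, because several different elements can share one class. As a minimal sketch, the markup below extends the example with extra elements carrying the same class. These additions are assumptions made for illustration. find_all() is then used to list the name of each match.

```python
from bs4 import BeautifulSoup

# Illustrative markup: three different elements share the class "tut"
html_doc = """
<html>
<head>
   <title class="tut">TutorialsPoint</title>
</head>
<body>
   <h1 class="tut">Welcome</h1>
   <p class="tut">TutorialsPoint</p>
   <p>No class here</p>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns every matching tag; name tells the element types apart
names = [tag.name for tag in soup.find_all(class_="tut")]
print(names)
```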

Example 4

In this example, we find a tag by its id and then use the name attribute to get the tag's name.

from bs4 import BeautifulSoup

# HTML document to be parsed
html_doc = """
<html>
<head>
   <title id="tut">TutorialsPoint</title>
</head>
<body>
   <p>TutorialsPoint</p>
</body>
</html>
"""

# Parse the HTML document using BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

# Get the tag using its id (here this matches the <title> tag)
tag = soup.find(id='tut')

# Get the tag name using the name attribute
tag_name = tag.name

# Print the tag name
print("Tag name is:", tag_name)