Python Faker

Introduction to Faker

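Python has an open-source library called Faker that helps users build datasets. It generates random data for attributes such as names, ages, and locations. Faker supports many locales and languages, so the generated data can be tailored to a particular region.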

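We can use Faker data to tune, validate, and test machine learning models, to stress-test them, and so on, generating as much data as a task needs. Fake data is also handy for practice and learning, for example when trying out operations on different data types.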

In this tutorial, we will look at Faker and its features and then create a dataset of our own.

Let's begin by setting up the Faker library.

Installing the Faker library

Before we can use Faker, we need to install the library. We can install it with pip from the command prompt or terminal, as shown below:

Syntax:

$ pip install faker

Importing the required libraries

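To explore the functions of the Faker library, we must import it. We also import the pandas library, because we will perform a few operations on the generated dataset.

Syntax: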

from faker import Faker
import pandas as pd

Using various functions

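Once the required libraries are imported, let's try out some of the functions available in Faker. To do so, we first create an instance of the Faker class and assign it to a variable, as shown below:

Syntax: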

sample = Faker()

Some of the functions we will use are listed below:

Syntax:

sample.name()
sample.date_of_birth()
sample.address()
sample.country()
sample.email()

Let us consider an example showing how these functions work:

Example:

# importing the required libraries
from faker import Faker
import pandas as pd

# defining the variable for Faker() module
sample = Faker()

# using some functions
print("Your Name: ", sample.name())
print("Your Date of Birth: ", sample.date_of_birth())
print("Your Address: ", sample.address())
print("Your Country: ", sample.country())
print("Your E-mail Address: ", sample.email())

Output:

Your Name:  Teresa Hill
Your Date of Birth:  1950-03-12
Your Address:  430 Bauer Turnpike Suite 931
Annaton, OR 12319
Your Country:  Angola
Your E-mail Address:  <email>

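Explanation:

In the above example, we imported the required libraries and created a Faker instance. We then used functions such as name(), date_of_birth(), address(), country(), and email() to generate some fake data. The values are random, so each run of the code produces different output.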

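We can also generate data for different regions and languages. All we need to do is specify the locales we want. Let us consider the following example, in which we generate names in Hindi, French, and Japanese.

Example: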

# importing the required libraries
from faker import Faker
import pandas as pd

# creating a Faker instance for the Hindi, French, and Japanese locales
sample = Faker(['hi_IN', 'fr_FR', 'ja_JP'])
for n in range(10):
    print(sample.name())

Output:

Thomas Schneider
?????? ??????
?? ??
Lucas Poulain
????? ??????
Aurélie Merle-Menard
?? ??
????????, ??????
Stéphane Lefebvre-Alves
????? ?????

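Explanation:

In the above example, we again imported the required libraries and created a Faker instance, this time passing a list of locales as the argument. We then used a for loop to print ten generated names. As a result, the program produced ten names drawn from the different locales. The Hindi and Japanese names show up as question marks above, most likely because the output was captured in an encoding that cannot represent those scripts.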

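We can also generate our own text or sentences with functions such as text() and sentence().

Let us consider the following example to understand how these functions work.

Example: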

# importing the required libraries
from faker import Faker
import pandas as pd

# defining the variable for Faker() module
sample = Faker()

# printing the text
print("Text: ", sample.text())

# printing the sentence
print("Sentence: ", sample.sentence())

Output:

Text:  Size plant task we through score name. Whose learn drop ground.
Option entire some surface seek film involve. Billion body really common decade man. Worker foreign your then likely beat.
Sentence:  Project star plant she energy them leave.

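Explanation:

In the above example, we again imported the required modules and created a Faker instance. We then used the text() and sentence() functions to generate a paragraph of text and a single sentence, and printed them.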

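We can also define a word bank, a list of words from which Faker builds new fake sentences. Let us consider the following example, which generates a sentence from such a list.

Example: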

# importing the required libraries
from faker import Faker
import pandas as pd

# defining the variable for Faker() module
sample = Faker()
# list of words
mywords = ['Cow', 'domestic', 'why', 'what', 'bird', 'parrot', 'is', 'animal', 'a', 'my']

# printing a sentence built only from the words in mywords
print("Sentence: ", sample.sentence(ext_word_list=mywords))

Output:

Sentence:  Cow is domestic domestic domestic animal animal.

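Explanation:

In the above example, we again imported the required libraries and created a Faker instance. We defined a list of words and passed it to the sentence() function through the ext_word_list argument. As a result, a fake sentence was generated using only words from the list. Words may repeat, because they are picked at random.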

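In addition, Faker provides a function that generates a complete profile of a non-existent person, rather than generating the name and address separately. This function is called profile().

Let us look at the following example to understand how this function behaves.

Example: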

# importing the required libraries
from faker import Faker
import pandas as pd

# defining the variable for Faker() module
sample = Faker()

# generating the profile
print("Complete Profile: ", sample.profile())

Output:

Complete Profile:  {'job': 'Minerals surveyor', 'company': 'Nichols and Sons', 'ssn': '715-16-7081', 'residence': '550 Moore Locks\nSouth Andrea, SD 94842', 'current_location': (Decimal('-78.730969'), Decimal('-151.109875')), 'blood_group': 'B+', 'website': ['https://www.smith-avila.com/', 'http://bennett-scott.com/', 'https://www.nguyen.com/'], 'username': 'joseph04', 'name': 'Toni Martin', 'sex': 'F', 'address': '29676 Mann Rapid\nWilkinsonbury, MN 35916', 'mail': '<email>', 'birthdate': datetime.date(2016, 10, 1)}

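Explanation:

In the above example, we again imported the required libraries and created a Faker instance. We then used the profile() function to generate a fake profile of a person, returned as a dictionary, and printed it.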

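Now, let's create a fake dataset using the Faker library.

Creating a fake dataset using the Faker library

Now that we have explored most of the functions, including profile() in the previous section, let's generate a set of fake profiles for 20 people. To store these profiles in a data frame, we will also use the pandas library.

Example: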

# importing the required libraries
from faker import Faker
import pandas as pd

# defining the variable for Faker() module
sample = Faker()

# generating the profiles of 20 people
mydata = [sample.profile() for n in range(20)]
my_dframe = pd.DataFrame(mydata)

print(my_dframe)

Output:

                                              job                    company  ...     mail   birthdate
0                         Housing manager/officer                  Cross LLC  ...  <email>  1983-03-26
1                       Learning disability nurse            Bennett-Sellers  ...  <email>  1923-04-14
2                           Agricultural engineer                Patrick PLC  ...  <email>  1941-01-13
3              Research scientist (life sciences)    Coleman, Shaw and Owens  ...  <email>  1927-07-07
4                                   Haematologist           Jefferson-Bailey  ...  <email>  2001-06-06
5   Chartered legal executive (England and Wales)            Torres-Andersen  ...  <email>  1956-05-12
6                                    Statistician            Rodriguez-Chung  ...  <email>  1955-07-06
7                                Paediatric nurse  Simmons, Acosta and Gates  ...  <email>  1984-02-29
8                             Dispensing optician                  Bauer Inc  ...  <email>  1935-03-30
9                  Equality and diversity officer  Martinez, Allen and Davis  ...  <email>  2019-06-28
10                       Secondary school teacher  Greene, Gonzalez and Hill  ...  <email>  1913-10-02
11                                   TEFL teacher             Smith and Sons  ...  <email>  1989-06-17
12              Planning and development surveyor       Smith, Lee and Reyes  ...  <email>  1905-09-05
13                               Product designer   Taylor, Davis and Wilson  ...  <email>  1938-11-27
14                  Development worker, community              Carlson-Evans  ...  <email>  1929-03-08
15                    Engineer, building services                 Pham Group  ...  <email>  1984-12-31
16                       Therapist, horticultural          Anderson-Gonzalez  ...  <email>  1929-03-16
17       Geographical information systems officer               Burke-Burton  ...  <email>  1997-06-12
18                                 Retail manager               Rivera-Lucas  ...  <email>  2016-03-20
19                       Therapeutic radiographer             Holloway Group  ...  <email>  2011-02-23

[20 rows x 13 columns]

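Explanation:

In the above example, we again imported the required libraries and created a Faker instance. Next, we built a list containing the profiles of 20 people. Finally, we converted this list into a data frame and printed it. The resulting dataset stores attributes such as job, company, location, and e-mail address, and we can use it however our task requires.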