Python: Generating Random Integers in a Pandas DataFrame

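Filling a Pandas DataFrame with random integers is a common technique in data analysis and manipulation. It is especially useful for data simulation, algorithm testing, and generating synthetic datasets. The sections below walk through several ways to do it, so you can pick the one that fits your workflow.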

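Method 1: Using NumPy's randint() function

This snippet uses the randint() function from the NumPy library to generate random integers within a specified range. The lower bound is inclusive and the upper bound is exclusive, so low=0 and high=100 yield values from 0 to 99.

The program first fixes the desired size of the DataFrame (its number of rows and columns), then generates an array of random integers of that shape, and finally builds the DataFrame from that array.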

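Steps

Step 1 - Import the pandas and numpy libraries

Step 2 - Create the variables "row" and "cols" to set the number of rows and columns of the DataFrame

Step 3 - Use the numpy.random.randint() function to create random integers within a given range

Step 4 - Create the DataFrame "df" from the random integers stored in the variable "data"

Step 5 - Print "df"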

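Example

import pandas as pd
import numpy as np

# Number of rows and columns of the DataFrame
row = 5
cols = 5

# Random integers from 0 to 99 (the upper bound is exclusive)
data = np.random.randint(low=0, high=100, size=(row, cols))

df = pd.DataFrame(data)

print(df)

Output

    0   1   2   3   4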
0  92   5  54   9  32
1  64  12  21  16  98
2  29  36  91  95  74
3   4  10  46  25   8
4  84  24  21  27   9
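The example above is unseeded, so its output changes on every run. As a minimal sketch of a reproducible variant, the block below uses NumPy's newer Generator API (np.random.default_rng) instead of the legacy randint() call. The seed value 42 and the column names 'A', 'B', 'C' are assumptions chosen for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

rows, cols = 5, 3

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = np.random.default_rng(seed=42)

# Like randint(), integers() excludes the upper bound by default,
# so the values fall in the range 0..99.
data = rng.integers(low=0, high=100, size=(rows, cols))

# Column names are hypothetical labels for this sketch.
df = pd.DataFrame(data, columns=["A", "B", "C"])

print(df)
```

Running the script twice with the same seed produces the same DataFrame, which is convenient when the random data feeds a test.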

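Method 2: Using the pandas.DataFrame.sample() method

The sample() method draws a random sample from a DataFrame or Series. It does not generate new numbers; it picks from the values that are already there.

In the snippet below, a DataFrame named 'df' with 5 rows and 3 columns ('A', 'B', 'C') is first filled with random integers from 0 to 9. The sample() method is then used to draw 5 values from each column and assign them back to that column. The sample size is n=5, replace=True allows sampling with replacement (the same value can be drawn more than once), and random_state=42 fixes the random seed for reproducibility. The .values attribute strips the sampled index so that the values are assigned by position. Finally, the updated DataFrame is displayed.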
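To make the behavior of sample() concrete, here is a minimal sketch on a single Series. The values 10 to 50 are made up for illustration. It shows that every sampled value comes from the original data and that reusing the same random_state repeats the same draw.

```python
import pandas as pd

# Hypothetical source values, chosen only for illustration.
s = pd.Series([10, 20, 30, 40, 50], name="A")

# Draw 5 values with replacement; duplicates are possible.
resampled = s.sample(n=5, replace=True, random_state=42)

# The same random_state on the same data repeats the draw exactly.
again = s.sample(n=5, replace=True, random_state=42)

print(resampled)
print(resampled.equals(again))
```

Because the result keeps the index labels of the rows that were drawn, assigning it back to a column without .values would align on those labels rather than on position.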

Example

import pandas as pd
import numpy as np

# Set the seed for reproducibility (optional)
np.random.seed(42)

# Build a 5x3 DataFrame of random integers from 0 to 9 with named columns
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])

# Resample each column with replacement using sample();
# .values drops the sampled index so the assignment is positional
df['A'] = df['A'].sample(n=5, replace=True, random_state=42).values
df['B'] = df['B'].sample(n=5, replace=True, random_state=42).values
df['C'] = df['C'].sample(n=5, replace=True, random_state=42).values

print(df)

Output

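Method 3: Using the pandas.DataFrame.apply() method with a lambda function

The code below uses pandas.DataFrame.apply() with a lambda function to generate random integers and assign them to the columns of a DataFrame. An empty DataFrame named df is created with 5 rows and 3 columns. With axis=1, apply() calls the lambda once per row; the lambda ignores the row it receives and returns random.randint(0, 9), which includes both endpoints. The resulting values are assigned to the columns 'RandomA', 'RandomB', and 'RandomC' in turn. Finally, the DataFrame is printed to show the generated integers.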

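Steps

Step 1 - Import the pandas library and the random module.

Step 2 - Set the random seed to 42 for reproducibility (optional).

Step 3 - Create an empty DataFrame with 5 rows and 3 columns named 'RandomA', 'RandomB', and 'RandomC'.

Step 4 - For each column, call apply() with axis=1 so that the lambda function produces one random integer between 0 and 9 per row.

Step 5 - Assign the generated values to the corresponding column of the DataFrame.

Step 6 - Print the DataFrame.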

Example

import pandas as pd
import random

# Set the seed for reproducibility (optional)
random.seed(42)

# Create an empty DataFrame with 5 rows and 3 named columns
df = pd.DataFrame(index=range(5), columns=['RandomA', 'RandomB', 'RandomC'])

# With axis=1 the lambda runs once per row and returns
# a random integer between 0 and 9 (both inclusive)
df['RandomA'] = df.apply(lambda _: random.randint(0, 9), axis=1)
df['RandomB'] = df.apply(lambda _: random.randint(0, 9), axis=1)
df['RandomC'] = df.apply(lambda _: random.randint(0, 9), axis=1)

print(df)

Output

   RandomA  RandomB  RandomC
0        1        2        6
1        0        1        0
2        4        8        0
3        3        1        1
4        3        9        3
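As a small sketch of what axis=1 does in the example above, the block below replaces the lambda with a named function that records which row it was called for. The names draw and calls are hypothetical helpers introduced only for this illustration, and the seed value is arbitrary.

```python
import random
import pandas as pd

random.seed(0)  # arbitrary seed, only to make the run repeatable

df = pd.DataFrame(index=range(5), columns=["RandomA", "RandomB", "RandomC"])

calls = []  # hypothetical log of the row labels that apply() visits

def draw(row):
    # apply(axis=1) passes each row as a Series; row.name is its index label.
    calls.append(row.name)
    return random.randint(0, 9)  # both endpoints are included

result = df.apply(draw, axis=1)

print(sorted(set(calls)))
print(result)
```

The result is a Series with one value per row, which is why it can be assigned directly to a single column of the same DataFrame.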

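Method 4: Using the pandas.Series.apply() function

The pandas.Series.apply() function applies a custom function to each element of a Series object.

Note that the snippet below does not actually call apply(). It defines a function, generate_random_int(), that returns a random integer from 0 to 100 (both inclusive), and builds the DataFrame with a nested list comprehension that calls this function once per cell. Every element is therefore drawn independently, and the result is a DataFrame made up entirely of random integers. Finally, the DataFrame is printed so that it can be inspected or used further.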

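Steps

Step 1 - Import the required libraries: pandas for data manipulation and random for generating random integers.

Step 2 - Declare the number of rows and columns of the DataFrame.

Step 3 - Define a function that generates a random integer between 0 and 100.

Step 4 - Use a nested list comprehension to create the DataFrame, generating a random integer for each cell.

Step 5 - Print the DataFrame to show the generated random integers.

Step 6 - End the program.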

Example

import pandas as pd
import random

# Number of rows and columns of the DataFrame
num_rows = 10
num_cols = 5

# Return a random integer between 0 and 100 (both inclusive)
def generate_random_int():
    return random.randint(0, 100)

# Build the DataFrame, calling the function once per cell
df = pd.DataFrame([[generate_random_int() for _ in range(num_cols)] for _ in range(num_rows)])

print(df)

Output

    0    1   2   3   4
0  23   77  66  60  19
1  51   31  79  51  88
2   6   38  73  38  64
3   5   79  97  25  43
4  24   53   6  23   6
5  63   82  47  56  10
6  72   91   4  84  32
7  81   74  17  21  44
8  28  100  43  31  58
9  64   57  16  15  14
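Since the example above builds the DataFrame with a list comprehension, here is a minimal sketch of how the same table could be produced with Series.apply(). It assumes generate_random_int is changed to accept, and ignore, the element that apply() passes to it. The seed value and the placeholder Series of row numbers are also assumptions made for illustration.

```python
import random
import pandas as pd

random.seed(42)  # arbitrary seed for repeatability

num_rows = 10
num_cols = 5

def generate_random_int(_):
    # Series.apply() passes each element in; it is ignored here.
    return random.randint(0, 100)  # both endpoints are included

# One placeholder Series per column; apply() calls the function per element.
columns = {
    col: pd.Series(range(num_rows)).apply(generate_random_int)
    for col in range(num_cols)
}
df = pd.DataFrame(columns)

print(df)
```

The function runs once per element, so this is no faster than the list comprehension; it mainly shows where Series.apply() would fit if the per-element function also needed the existing value.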

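Conclusion

There are several ways to fill a Pandas DataFrame with random integers. Common options include NumPy's randint() function, pandas.DataFrame.sample(), pandas.DataFrame.apply(), and pandas.Series.apply(). Each has its strengths, and the best choice depends on the use case. If the goal is to generate random integers directly for DataFrame columns, randint() is the natural choice.

If, on the other hand, the task is to randomly sample existing rows or values, sample() is the better fit. For more complex operations that involve random integers, apply() can be used effectively.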