Matplotlib Scatter Plots: How to Set Colors Based on Values | 极客笔记

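Matplotlib is one of the most popular data visualization libraries for Python, and the scatter plot is one of its most commonly used chart types. In data analysis and scientific work, we often need to color each point according to some attribute or value so that the distribution and structure of the data are easier to see. This article explains how to create scatter plots with Matplotlib and color the points by value, with a series of practical code examples.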

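1. Matplotlib Scatter Plot Basics

Before looking at how to set colors by value, let's review the basics of scatter plots in Matplotlib. A scatter plot is a two-dimensional chart that shows the relationship between two variables. The position of each point is determined by its x and y coordinates, while other properties of the point (color, size, marker shape, and so on) can encode additional information.

Here is a simple scatter plot: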

import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
x = np.random.rand(50)
y = np.random.rand(50)

# Create the scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(x, y)
plt.title('Basic Scatter Plot - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

This example creates a basic scatter plot: we generate random data with NumPy and draw it with plt.scatter().

2. Using a Single Color

Before coloring by value, let's see how to give every point the same color:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(50)
y = np.random.rand(50)

plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='red')
plt.title('Red Scatter Plot - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

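In this example, the color parameter sets every point to red. You can use any color specification that Matplotlib accepts, such as a named color, a hex string, or an RGB/RGBA tuple with components between 0 and 1.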
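
As a rough illustration of the accepted color formats, the sketch below uses matplotlib.colors.to_rgba to convert several ways of writing red into RGBA tuples. The list of specifications is just a selection chosen for this example, not an exhaustive one.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

# Four ways to write the same red (an illustrative selection)
specs = ['red', '#ff0000', (1.0, 0.0, 0.0), 'r']
rgba_values = [to_rgba(spec) for spec in specs]
for spec, rgba in zip(specs, rgba_values):
    print(spec, '->', rgba)

# Any of these can be passed to the color parameter
x = np.random.rand(50)
y = np.random.rand(50)
plt.scatter(x, y, color='#ff0000')
plt.show()
```

All four specifications resolve to the same RGBA tuple, so the choice between them is mostly a matter of readability.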

3. Setting Colors by Value with a Colormap

Now let's color points by value. Matplotlib provides many colormaps that map numeric values to colors. Here is an example that uses one:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)
y = np.random.rand(100)
colors = np.random.rand(100)

plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar(scatter)
plt.title('Scatter Plot with Colormap - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

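In this example:
– We create an additional colors array that holds one numeric value per point.
– In plt.scatter(), we pass the colors array through the c parameter and choose the colormap with the cmap parameter.
– plt.colorbar() adds a colorbar that shows which color corresponds to which value.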
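
To make the value-to-color mapping more concrete, the following sketch reproduces by hand the two steps that plt.scatter() performs when it receives c and cmap: normalizing the values to [0, 1] and looking them up in the colormap. The three sample values are arbitrary and chosen only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

values = np.array([2.0, 4.0, 6.0])

# Step 1: scale the values linearly to the range [0, 1]
norm = Normalize(vmin=values.min(), vmax=values.max())
normalized = norm(values)

# Step 2: look up an RGBA color for each scaled value
cmap = plt.get_cmap('viridis')
rgba = cmap(normalized)

print(normalized)   # scaled values
print(rgba.shape)   # one RGBA row per value
```

By default the normalization uses the minimum and maximum of the data, which is why the smallest value always gets the first color of the colormap and the largest value the last.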

4. Custom Colormaps

Matplotlib ships with many built-in colormaps, but sometimes a custom one fits the data better. Here is an example that builds a custom colormap:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

x = np.random.rand(100)
y = np.random.rand(100)
values = np.random.rand(100)

# Build a custom colormap
colors = ['blue', 'green', 'red']
n_bins = 100
cmap = LinearSegmentedColormap.from_list('custom_cmap', colors, N=n_bins)

plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, c=values, cmap=cmap)
plt.colorbar(scatter)
plt.title('Scatter Plot with Custom Colormap - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

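In this example, LinearSegmentedColormap.from_list() builds a custom colormap that runs from blue through green to red; the N argument sets how many discrete color levels it contains. This gives us direct control over which colors the values are mapped to.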
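
A quick way to check what a custom colormap does is to call it directly. The sketch below samples both ends of the same blue-green-red colormap and also builds a variant with N=3; the name custom_banded is made up for this example.

```python
from matplotlib.colors import LinearSegmentedColormap, to_rgba

colors = ['blue', 'green', 'red']
cmap = LinearSegmentedColormap.from_list('custom_cmap', colors, N=100)

# Calling the colormap with a number in [0, 1] returns an RGBA tuple
low = cmap(0.0)    # start of the colormap
high = cmap(1.0)   # end of the colormap
print(low, to_rgba('blue'))
print(high, to_rgba('red'))

# With N=3 the same colors form three flat bands instead of a smooth blend
banded = LinearSegmentedColormap.from_list('custom_banded', colors, N=3)
print(banded.N)
```

A small N is a simple way to get a stepped look from the same color list, while a large N gives a visually smooth gradient.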

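5. Using Discrete Colors

Sometimes we want a handful of distinct colors rather than a continuous colormap, which is especially useful for categorical data. Here is an example: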

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)
y = np.random.rand(100)
categories = np.random.randint(0, 3, 100)

colors = ['red', 'green', 'blue']

plt.figure(figsize=(10, 8))
for i, color in enumerate(colors):
    mask = categories == i
    plt.scatter(x[mask], y[mask], c=color, label=f'Category {i}')

plt.title('Scatter Plot with Discrete Colors - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()

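In this example, each category gets a fixed color, and a loop draws the points of each category with a separate plt.scatter() call so that every category has its own legend entry.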
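
If a legend is not needed, the loop can be avoided. The following sketch, one possible alternative rather than the only way, builds a list with one color name per point and passes it to a single plt.scatter() call.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)
y = np.random.rand(100)
categories = np.random.randint(0, 3, 100)

colors = ['red', 'green', 'blue']

# Look up one color name per point from its category index
point_colors = [colors[i] for i in categories]

plt.figure(figsize=(10, 8))
plt.scatter(x, y, c=point_colors)
plt.title('Discrete Colors in One Call')
plt.show()
```

The trade-off is that a single call produces a single artist, so plt.legend() cannot list the categories separately without extra work.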

6. Setting the Value Range for Colors

Sometimes we want to control which range of values the colormap covers. Here is an example:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)
y = np.random.rand(100)
values = np.random.rand(100)

plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, c=values, cmap='coolwarm', vmin=0, vmax=1)
plt.colorbar(scatter)
plt.title('Scatter Plot with Color Range - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

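In this example, the vmin and vmax parameters fix the value range of the colormap, so it spans exactly the range we care about. Values below vmin or above vmax are drawn in the colors at the two ends of the colormap.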
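
The effect of vmin and vmax on out-of-range values can be checked directly. The sketch below assumes a narrower range of 0.2 to 0.8 (chosen only for illustration) and compares the colors of values outside the range with the colors at the limits.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

values = np.array([0.0, 0.2, 0.5, 0.8, 1.0])

# Same mapping that scatter applies for vmin=0.2, vmax=0.8
norm = Normalize(vmin=0.2, vmax=0.8)
cmap = plt.get_cmap('coolwarm')
rgba = cmap(norm(values))

# Out-of-range values share the color of the nearest limit
print(np.allclose(rgba[0], rgba[1]))
print(np.allclose(rgba[4], rgba[3]))

x = np.random.rand(100)
y = np.random.rand(100)
point_values = np.random.rand(100)
scatter = plt.scatter(x, y, c=point_values, cmap='coolwarm', vmin=0.2, vmax=0.8)
# extend='both' gives the colorbar pointed ends to signal out-of-range values
plt.colorbar(scatter, extend='both')
plt.show()
```

Narrowing the range like this spends the whole colormap on the interval of interest, at the cost of making everything outside it indistinguishable.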

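7. Using Transparency to Represent Values

Besides color, transparency can also encode a value. This is especially useful for dense data or when certain points should stand out: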

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(1000)
y = np.random.rand(1000)
values = np.random.rand(1000)

plt.figure(figsize=(10, 8))
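# A per-point alpha array requires Matplotlib 3.4 or later.
# No colorbar is added: every point has the same fixed color,
# so there is no value-to-color mapping for a colorbar to show.
plt.scatter(x, y, c='blue', alpha=values)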
plt.title('Scatter Plot with Transparency - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

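In this example, the alpha parameter sets the transparency of each point from the values array: alpha ranges from 0 (fully transparent) to 1 (fully opaque), so points with higher values appear more solid.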
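
An alternative that does not rely on array-valued alpha is to build an RGBA array by hand and put the values into its fourth (alpha) column. The following is a minimal sketch of that idea, using 200 random points as sample data.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

x = np.random.rand(200)
y = np.random.rand(200)
values = np.random.rand(200)

# Start from the same blue for every point, shape (200, 4)
rgba = np.tile(to_rgba('blue'), (len(values), 1))
# Overwrite the alpha channel with the per-point values
rgba[:, 3] = values

plt.figure(figsize=(10, 8))
plt.scatter(x, y, c=rgba)
plt.title('Per-point Transparency via RGBA')
plt.show()
```

Because c receives explicit RGBA colors here, no colormap is involved, and the same array could also carry a different base color for each point.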

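8. Combining Color and Size

We can use color and marker size together to represent two different values, so that a single plot carries more information: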

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)
y = np.random.rand(100)
colors = np.random.rand(100)
sizes = np.random.randint(20, 200, 100)

plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, c=colors, s=sizes, cmap='viridis', alpha=0.7)
plt.colorbar(scatter)
plt.title('Scatter Plot with Color and Size - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

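In this example, the c parameter sets the color and the s parameter sets the marker size (the marker area, in points squared). This lets one plot show two additional numeric dimensions at once.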
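
Real data rarely arrives in a range that works as marker sizes. The sketch below rescales a variable linearly into a chosen size range; the population variable and the limits of 20 and 200 are assumptions made up for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)
y = np.random.rand(100)
colors = np.random.rand(100)
# Hypothetical third variable on an arbitrary scale
population = np.random.uniform(1e3, 1e6, 100)

# Rescale linearly so the smallest marker has area 20 and the largest 200
s_min, s_max = 20, 200
scaled = (population - population.min()) / (population.max() - population.min())
sizes = s_min + scaled * (s_max - s_min)

plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, c=colors, s=sizes, cmap='viridis', alpha=0.7)
plt.colorbar(scatter)
plt.show()
```

Since s is an area, this scaling makes the marker area, not its diameter, proportional to the value.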

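9. Using the Color Cycle

For some kinds of data, a predefined color cycle is a better fit. Matplotlib has a default color cycle, which can be changed through the axes.prop_cycle setting, and it assigns the next color in the cycle to each new plt.scatter() call that does not specify a color: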

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(50)
y = np.random.rand(50)
categories = np.random.randint(0, 5, 50)

plt.figure(figsize=(10, 8))
for category in np.unique(categories):
    mask = categories == category
    plt.scatter(x[mask], y[mask], label=f'Category {category}')

plt.title('Scatter Plot with Color Cycle - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()

In this example, Matplotlib's default color cycle automatically assigns a different color to each category.
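
The colors of the current cycle can also be read from plt.rcParams, which is handy when the same category should keep the same color across several figures. The sketch below assumes the active style defines a color entry in axes.prop_cycle, as the default style does.

```python
import matplotlib.pyplot as plt
import numpy as np

# Colors of the current property cycle, as a list of color strings
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(cycle_colors)

x = np.random.rand(50)
y = np.random.rand(50)
categories = np.random.randint(0, 5, 50)

plt.figure(figsize=(10, 8))
for category in np.unique(categories):
    mask = categories == category
    # Pick the color explicitly; wrap around if there are more categories than colors
    color = cycle_colors[category % len(cycle_colors)]
    plt.scatter(x[mask], y[mask], color=color, label=f'Category {category}')
plt.legend()
plt.show()
```

Tying the color to the category value rather than to the order of the plt.scatter() calls keeps the assignment stable even if a category is missing from one dataset.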

10. Combining Marker Styles and Colors

Besides color, different marker styles can also distinguish groups of points:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(150)
y = np.random.rand(150)
categories = np.random.randint(0, 3, 150)

markers = ['o', 's', '^']
colors = ['red', 'green', 'blue']

plt.figure(figsize=(10, 8))
for category, marker, color in zip(range(3), markers, colors):
    mask = categories == category
    plt.scatter(x[mask], y[mask], marker=marker, c=color, label=f'Category {category}')

plt.title('Scatter Plot with Markers and Colors - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()

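In this example, each category gets its own marker style and color, which makes the groups easier to tell apart.

11. Using a Color Gradient

Sometimes we want a color gradient that follows the change in a value. Here is an example: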

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)
colors = x

plt.figure(figsize=(10, 6))
scatter = plt.scatter(x, y, c=colors, cmap='plasma')
plt.colorbar(scatter)
plt.title('Scatter Plot with Color Gradient - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

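In this example, the x values are used as the color values, which produces a gradient along the x-axis.

12. Using a Diverging Colormap

For some datasets, a diverging colormap does a better job of highlighting positive versus negative values or other contrasts: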

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100) * 2 - 1
y = np.random.rand(100) * 2 - 1
values = x * y

plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, c=values, cmap='RdYlBu', vmin=-1, vmax=1)
plt.colorbar(scatter)
plt.title('Scatter Plot with Diverging Colormap - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

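In this example, the 'RdYlBu' (red-yellow-blue) colormap highlights the sign of each value: negative values are drawn in red, values near zero in yellow, and positive values in blue. Setting vmin=-1 and vmax=1 keeps the range symmetric, so zero sits at the center of the colormap.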
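
When the data range is not symmetric around zero, symmetric limits waste part of the colormap. One option is matplotlib.colors.TwoSlopeNorm, which pins a chosen center value to the middle of the colormap. The sketch below uses made-up values clipped to the range -0.2 to 1.0 purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

x = np.random.rand(100) * 2 - 1
y = np.random.rand(100) * 2 - 1
# Hypothetical values with an asymmetric range around zero
values = np.clip(x * y + 0.4, -0.2, 1.0)

# Zero maps to the middle of the colormap; each side is scaled separately
norm = TwoSlopeNorm(vcenter=0.0, vmin=-0.2, vmax=1.0)
print(norm(np.array([-0.2, 0.0, 0.5, 1.0])))

plt.figure(figsize=(10, 8))
# Pass the norm instead of vmin/vmax; the two cannot be combined
scatter = plt.scatter(x, y, c=values, cmap='RdYlBu', norm=norm)
plt.colorbar(scatter)
plt.show()
```

With this normalization the negative and positive sides use different scales, so equal color steps no longer mean equal value steps on both sides of zero.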

13. Using a Discrete Colormap

Sometimes we want a colormap with a fixed set of discrete colors rather than a continuous one:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

x = np.random.rand(1000)
y = np.random.rand(1000)
values = np.random.randint(0, 5, 1000)

cmap = ListedColormap(['red', 'green', 'blue', 'yellow', 'purple'])
norm = BoundaryNorm(np.arange(-0.5, 5.5, 1), cmap.N)

plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, c=values, cmap=cmap, norm=norm)
plt.colorbar(scatter, ticks=np.arange(0, 5))
plt.title('Scatter Plot with Discrete Colormap - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

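In this example, we build a custom discrete colormap with ListedColormap and use BoundaryNorm to define the boundaries between colors. The boundaries are placed at half-integers, so each integer value from 0 to 4 falls in its own bin.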
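
The sketch below inspects the same boundaries and norm on their own, to show how each integer value ends up at one color index. It only prints the intermediate results and does not draw a plot.

```python
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

cmap = ListedColormap(['red', 'green', 'blue', 'yellow', 'purple'])

# Six boundaries at half-integers define five bins, one per color
bounds = np.arange(-0.5, 5.5, 1)
norm = BoundaryNorm(bounds, cmap.N)

values = np.array([0, 1, 2, 3, 4])
# Each value is mapped to the index of the bin it falls into
indices = norm(values)
print(bounds)
print(indices)
```

Because the number of bins equals the number of colors, the bin index is used directly as the position in the color list.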

14. Encoding Multiple Variables with Color

We can use the RGB color space to represent three different variables at once:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)
y = np.random.rand(100)
r = np.random.rand(100)
g = np.random.rand(100)
b = np.random.rand(100)

colors = np.array([r, g, b]).T

plt.figure(figsize=(10, 8))
plt.scatter(x, y, c=colors)
plt.title('Scatter Plot with RGB Color Encoding - how2matplotlib.com')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

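In this example, each point's color is determined by three independent values, one each for the red, green, and blue channels. The array passed to c has shape (100, 3), and every component must lie between 0 and 1.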
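
Variables measured on other scales have to be brought into the 0 to 1 range before they can serve as color channels. The following sketch does this with a simple min-max rescaling; the rescale helper and the three measurement names are hypothetical and exist only in this example.

```python
import matplotlib.pyplot as plt
import numpy as np

def rescale(v):
    """Map an array linearly onto [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

x = np.random.rand(100)
y = np.random.rand(100)
# Hypothetical measurements on very different scales
temperature = np.random.normal(20, 5, 100)
humidity = np.random.uniform(30, 90, 100)
pressure = np.random.normal(1013, 10, 100)

# One column per channel: red, green, blue
colors = np.column_stack([rescale(temperature), rescale(humidity), rescale(pressure)])

plt.figure(figsize=(10, 8))
plt.scatter(x, y, c=colors)
plt.show()
```

This encoding is compact but hard to read precisely, so it tends to work best for spotting broad patterns rather than for reading off individual values.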

15. Encoding a Time Series with Color

For time series data, color can show the passage of time:

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
t = np.linspace(0, 10, 100)
x = np.cumsum(np.random.randn(100))
y = np.cumsum(np.random.randn(100))

fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(x, y, c=t, cmap='viridis')
ax.set_title('Time Series Scatter Plot - how2matplotlib.com')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')

# The scatter object already carries the colormap and the range of t,
# so it can be passed to colorbar() directly; ax tells Matplotlib which
# axes the colorbar belongs to.
cbar = fig.colorbar(scatter, ax=ax)
cbar.set_label('Time')

plt.show()

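In this example, color shows the passage of time: with the viridis colormap, points run from dark purple (early) to bright yellow (late).

16. Encoding Data Density with Color

When many points overlap, color can represent the density of the data: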

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Generate sample data
np.random.seed(42)
x = np.random.normal(0, 1, 1000)
y = x * 0.5 + np.random.normal(0, 1, 1000)

# Estimate the density at each point
xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)

fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(x, y, c=z, s=50, alpha=0.5, cmap='viridis')
ax.set_title('Density-based Scatter Plot - how2matplotlib.com')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')

cbar = fig.colorbar(scatter)
cbar.set_label('Density')

plt.show()

In this example, scipy.stats.gaussian_kde estimates the density at each data point, and color represents that density. This helps reveal the distribution when a large number of points overlap.
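
A common refinement is to draw the densest points last so they are not hidden under sparse ones. The sketch below shows that ordering step. To stay NumPy-only it swaps in a coarse density estimate based on np.histogram2d instead of gaussian_kde, and the bin count of 30 is an arbitrary choice for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
x = np.random.normal(0, 1, 1000)
y = x * 0.5 + np.random.normal(0, 1, 1000)

# Coarse density: number of points in each point's 2D histogram bin
counts, x_edges, y_edges = np.histogram2d(x, y, bins=30)
ix = np.clip(np.digitize(x, x_edges) - 1, 0, counts.shape[0] - 1)
iy = np.clip(np.digitize(y, y_edges) - 1, 0, counts.shape[1] - 1)
z = counts[ix, iy]

# Sort so that points in dense regions are drawn last (on top)
order = np.argsort(z)
x_sorted, y_sorted, z_sorted = x[order], y[order], z[order]

fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(x_sorted, y_sorted, c=z_sorted, s=50, cmap='viridis')
cbar = fig.colorbar(scatter, ax=ax)
cbar.set_label('Points per bin')
plt.show()
```

The same argsort step applies unchanged to the z values returned by gaussian_kde.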

17. Encoding Clustering Results with Color

After clustering the data, color can show which cluster each point belongs to:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Generate sample data
np.random.seed(42)
x = np.concatenate([np.random.normal(0, 1, 300), np.random.normal(4, 1, 300)])
y = np.concatenate([np.random.normal(0, 1, 300), np.random.normal(4, 1, 300)])

# Run K-means clustering (n_init is set explicitly to avoid a
# FutureWarning in some scikit-learn versions)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(np.column_stack((x, y)))

fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(x, y, c=labels, cmap='viridis')
ax.set_title('Cluster-based Scatter Plot - how2matplotlib.com')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')

cbar = fig.colorbar(scatter, ticks=[0, 1])
cbar.set_label('Cluster')

plt.show()

In this example, the K-means algorithm clusters the data and color marks the cluster of each point, which makes the grouping easy to see.
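
For discrete labels, a legend is often easier to read than a continuous colorbar. The sketch below uses the legend_elements() method of the object returned by scatter to build one. To keep it free of scikit-learn, the labels come from a simple threshold that stands in for the K-means result; that stand-in is an assumption of this example only.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
x = np.concatenate([np.random.normal(0, 1, 300), np.random.normal(4, 1, 300)])
y = np.concatenate([np.random.normal(0, 1, 300), np.random.normal(4, 1, 300)])
# Stand-in labels (an assumption for this sketch); in practice use the KMeans labels
labels = (x + y > 4).astype(int)

fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(x, y, c=labels, cmap='viridis')

# One legend handle per distinct value in labels
handles, legend_labels = scatter.legend_elements()
ax.legend(handles, legend_labels, title='Cluster')
plt.show()
```

This keeps the single scatter call while still giving each cluster its own legend entry.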

18. Encoding Prediction Error with Color

When evaluating a machine learning model, color can show the size of the prediction error:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Generate sample data
np.random.seed(42)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + np.random.normal(0, 2, 100)

# Fit a linear regression model
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)
y_pred = model.predict(x.reshape(-1, 1))

# Compute the absolute errors
errors = np.abs(y - y_pred)

fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(x, y, c=errors, cmap='YlOrRd')
ax.plot(x, y_pred, color='blue', label='Regression Line')
ax.set_title('Error-based Scatter Plot - how2matplotlib.com')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.legend()

cbar = fig.colorbar(scatter)
cbar.set_label('Absolute Error')

plt.show()

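The code above colors points by absolute error, which hides whether the model over- or under-predicts. As a variation, the sketch below colors by the signed residual with a diverging colormap centered on zero. It fits the line with np.polyfit instead of LinearRegression only to avoid the scikit-learn dependency; that substitution is a choice made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + np.random.normal(0, 2, 100)

# Least-squares line via np.polyfit (stands in for LinearRegression here)
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# Signed residuals: positive above the line, negative below it
residuals = y - y_pred
limit = np.abs(residuals).max()

fig, ax = plt.subplots(figsize=(10, 8))
# Symmetric limits keep zero at the center of the diverging colormap
scatter = ax.scatter(x, y, c=residuals, cmap='coolwarm', vmin=-limit, vmax=limit)
ax.plot(x, y_pred, color='black', label='Regression Line')
ax.legend()
cbar = fig.colorbar(scatter, ax=ax)
cbar.set_label('Residual')
plt.show()
```

With the coolwarm colormap, points above the line appear red and points below it blue, so any systematic bias along the x-axis shows up as a run of one color.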