The to_categorical() Function in Keras: A Simple Way to One-Hot Encode Labels

Published: 2023-12-17 09:32:21

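In machine learning, we often need to one-hot encode labels. One-hot encoding turns each possible value of a discrete feature into a binary vector in which exactly one element is 1 and all the others are 0. In Keras, the to_categorical() function offers a simple way to one-hot encode labels.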
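To make the definition concrete, here is a minimal NumPy-only sketch of one-hot encoding applied to string categories. The label values are made up for illustration, and this is not how Keras implements it; it only shows the idea of mapping each distinct value to an index and then to a vector with a single 1.

```python
import numpy as np

# Hypothetical string labels, used only for illustration.
labels = np.array(["cat", "dog", "bird", "dog"])

# Map each distinct category to an integer index (categories come back sorted).
classes, indices = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(classes), dtype="float32")[indices]

print(classes)   # ['bird' 'cat' 'dog']
print(indices)   # [1 2 0 2]
print(one_hot)
```

Note that to_categorical() expects integer labels, so string categories need an index mapping like this one before they can be passed to it.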

to_categorical() is a Keras utility function that converts a vector of integer class labels into a binary class matrix. It can be used in two ways:

1. Using the default parameters

to_categorical(y, num_classes=None, dtype='float32')

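Parameters:

- y: the class labels to convert, given as an array-like of integers from 0 to num_classes - 1. The array may be of any length and is typically one-dimensional; float inputs are cast to integers, so they should hold whole numbers.

- num_classes: the total number of classes, which is the width of the output. If it is not given, it is inferred from the input as the largest label plus 1.

- dtype: the data type of the output, 'float32' by default. This is a Keras 2.x parameter; the Keras 3 signature is to_categorical(x, num_classes=None).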

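Returns:

- A matrix holding the one-hot encoded labels. For a one-dimensional input of n labels, this is a two-dimensional array of shape (n, num_classes).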

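With the default parameters, to_categorical() infers the output width from the integer vector y (the largest label plus 1) and then converts y into a binary matrix.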
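The behaviour described above can be sketched in plain NumPy. The helper name one_hot_sketch is hypothetical and this is an illustration of the documented behaviour, not the Keras source code; it assumes non-negative integer labels.

```python
import numpy as np

def one_hot_sketch(y, num_classes=None, dtype="float32"):
    """NumPy-only illustration of one-hot encoding; not the Keras implementation."""
    # Cast to integers and flatten to a 1-D label vector.
    y = np.asarray(y, dtype="int64").reshape(-1)
    # Infer the width from the data when it is not given: largest label + 1.
    if num_classes is None:
        num_classes = int(y.max()) + 1
    # Start from all zeros and set one entry per row to 1.
    out = np.zeros((y.shape[0], num_classes), dtype=dtype)
    out[np.arange(y.shape[0]), y] = 1
    return out

encoded = one_hot_sketch([0, 2, 1, 2])
print(encoded.shape)  # (4, 3)
print(encoded.dtype)  # float32
```

Because the width comes from the largest label actually present, the inferred value is only correct when the input contains the highest class index.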

Here is an example that uses the default parameters:

import numpy as np
from keras.utils import to_categorical

# Input data: integer class labels 0 through 9
y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# One-hot encode the labels with to_categorical()
one_hot = to_categorical(y)
print(one_hot)

# Output
# [[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
#  [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
#  [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
#  [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
#  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
#  [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
#  [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
#  [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
#  [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
#  [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]

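In this example, we first create a one-dimensional array holding the integers 0 through 9. We then one-hot encode it with to_categorical(), which turns it into a two-dimensional matrix with 10 rows and 10 columns. Each integer becomes one row of that matrix: a binary vector whose only 1 sits at the index equal to the label.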
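As a quick check of that row-by-row description, the following sketch rebuilds the same 10 x 10 matrix with NumPy (np.eye is used here as a stand-in for to_categorical(), which is an assumption of this sketch) and confirms that every row contains exactly one 1, located at the label's index.

```python
import numpy as np

# Labels 0..9, as in the example above.
y = np.arange(10)

# NumPy stand-in for to_categorical(y): row i of the identity matrix.
one_hot = np.eye(10, dtype="float32")[y]

# Every row sums to 1, so each row holds exactly one 1.
assert (one_hot.sum(axis=1) == 1).all()

# The position of the 1 in each row is the original label.
recovered = one_hot.argmax(axis=1)
print(recovered)  # [0 1 2 3 4 5 6 7 8 9]
```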

2. Specifying the output dimension

to_categorical(y, num_classes=10, dtype='float32')

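In some cases we need to set the width of the output matrix ourselves. For example, suppose the task has 20 classes but the labels being encoded only go up to 9. to_categorical() would then infer 10 columns from the data, so we need to pass num_classes=20 explicitly.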
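The difference between the inferred and the explicit width can be seen in a small NumPy-only sketch. The helper one_hot_sketch and the batch values are hypothetical; the helper imitates the inference rule (largest label plus 1) rather than calling Keras.

```python
import numpy as np

def one_hot_sketch(labels, num_classes=None):
    """NumPy-only illustration; not the Keras implementation."""
    labels = np.asarray(labels, dtype="int64").reshape(-1)
    if num_classes is None:
        # Width inferred from the data: largest label + 1.
        num_classes = int(labels.max()) + 1
    out = np.zeros((labels.shape[0], num_classes), dtype="float32")
    out[np.arange(labels.shape[0]), labels] = 1
    return out

# A batch from a 20-class problem that happens to contain no label above 9.
batch = np.array([3, 0, 9, 4])

inferred = one_hot_sketch(batch)
explicit = one_hot_sketch(batch, num_classes=20)

print(inferred.shape)  # (4, 10)
print(explicit.shape)  # (4, 20)
```

With the inferred width, batches that contain different maximum labels would produce matrices of different widths, which is why fixing num_classes is the safer choice whenever the full class count is known.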

Here is an example that sets the output dimension manually:

`python

import numpy as np
from keras.utils import to_categorical

# Input data: integer class labels 0 through 11
y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

# One-hot encode with to_categorical(), fixing the number of classes at 12
one_hot = to_categorical(y, num_classes=12)
print(one_hot)

# Output
# [[ 1. 0. 0. 0. 0. 0. 0. 0. 0.