A Guide to the keras.utils.np_utils API

Published: 2024-01-17 04:49:02

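keras.utils.np_utils is a built-in utility module in Keras 2.x that provides helper functions for working with NumPy arrays. Its two main functions are to_categorical, which encodes integer labels as one-hot vectors, and normalize, which rescales an array to unit norm along an axis. Both are also exposed directly as keras.utils.to_categorical and keras.utils.normalize, which is the import path to use on newer releases where np_utils is no longer importable. The module has no built-in decoder for one-hot vectors and no batch iterators; those tasks are handled with NumPy and with other parts of Keras.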

The sections below walk through the module's main functions and the related data-handling helpers, with a usage example for each.

## 1. to_categorical(y, num_classes=None, dtype='float32')

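This function encodes a vector of integer class labels as one-hot vectors. It takes a list or 1D array y and an optional scalar num_classes (the number of classes), and returns a 2D array of shape (len(y), num_classes) in which each row is the encoding of one sample.

Parameters:

- y: a 1D list or array of integer labels in the range 0 to num_classes - 1.

- num_classes: integer, the total number of classes. If omitted, it is set to max(y) + 1, that is, the largest label plus one rather than the count of distinct labels.

- dtype: data type of the returned array; defaults to 'float32'.

Usage example: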

from keras.utils import np_utils  # Keras 2.x import path

# 3 samples, each labeled with one integer
y = [0, 1, 2]

# Encode the labels as one-hot vectors
one_hot_y = np_utils.to_categorical(y, num_classes=3)
print(one_hot_y)
# Output: [[1. 0. 0.]
#          [0. 1. 0.]
#          [0. 0. 1.]]
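
To make the encoding concrete, here is a dependency-free sketch of what one-hot encoding and its inverse do. The helper names `one_hot` and `decode` are made up for this illustration and are not Keras functions. np_utils itself offers no decoder; with NumPy the usual inverse is `np.argmax(one_hot_y, axis=-1)`.

```python
def one_hot(labels, num_classes=None):
    """Encode non-negative integer labels as rows of 0.0/1.0 (illustrative helper)."""
    if num_classes is None:
        # Assumed to mirror to_categorical: the width is the largest label plus one.
        num_classes = max(labels) + 1
    return [[1.0 if j == label else 0.0 for j in range(num_classes)]
            for label in labels]


def decode(rows):
    """Recover each label as the index of the largest entry in its row."""
    return [row.index(max(row)) for row in rows]


encoded = one_hot([0, 1, 2], num_classes=3)
print(encoded)          # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(decode(encoded))  # [0, 1, 2]
```

Decoding by arg-max also works on model outputs such as softmax probabilities, where the row is not an exact one-hot vector.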

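## 2. to_categorical on a 2D label array

np_utils does not define a function named to_categorical_multi_label in the Keras 2.x source; the behavior described here is what to_categorical itself does when y has more than one dimension. Given a 2D integer array y of shape (n_samples, n_labels) and a scalar num_classes, it returns a 3D array of shape (n_samples, n_labels, num_classes) in which every entry of y is replaced by its one-hot vector.

Parameters:

- y: a 2D array of integer labels; each row holds the labels of one sample.

- num_classes: integer, the number of classes. If omitted, it is set to the largest value in y plus one.

- dtype: data type of the returned array; defaults to 'float32'.

Usage example: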

from keras.utils import np_utils
import numpy as np

# 3 samples, each carrying a vector of 3 binary labels
y = np.array([[0, 1, 1],
              [1, 0, 1],
              [0, 0, 1]])

# One-hot encode every entry of the label array with to_categorical
one_hot_y = np_utils.to_categorical(y, num_classes=2)
print(one_hot_y)
# Output: [[[1. 0.]
#            [0. 1.]
#            [0. 1.]]
#
#           [[0. 1.]
#            [1. 0.]
#            [0. 1.]]
#
#           [[1. 0.]
#            [1. 0.]
#            [0. 1.]]]

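## 3. to_categorical on a 1D (single-label) array

np_utils has no separate to_categorical_single_label function in the Keras 2.x source; encoding one integer label per sample is the standard use of to_categorical. It takes a 1D array y and a scalar num_classes (the number of classes) and returns a 2D array in which each row is the one-hot encoding of one sample's label.

Parameters:

- y: a 1D array holding one integer label per sample.

- num_classes: integer, the number of classes. If omitted, it is set to the largest label in y plus one, not to the number of distinct labels.

- dtype: data type of the returned array; defaults to 'float32'.

Usage example: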

from keras.utils import np_utils

# 3 samples, each labeled with one integer
y = [0, 1, 2]

# Encode the single-label vector as one-hot vectors with to_categorical
one_hot_y = np_utils.to_categorical(y, num_classes=3)
print(one_hot_y)
# Output: [[1. 0. 0.]
#          [0. 1. 0.]
#          [0. 0. 1.]]
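
One detail worth checking with a sketch is how many columns come out when num_classes is omitted. Assuming the width is the largest label plus one, label values with gaps produce unused columns. The helpers `infer_num_classes` and `remap_to_contiguous` below are hypothetical names for this illustration, not Keras functions.

```python
def infer_num_classes(labels):
    """Width used when num_classes is omitted: largest label plus one (assumed rule)."""
    return max(labels) + 1


def remap_to_contiguous(labels):
    """Map arbitrary label values onto 0..k-1, keeping their sorted order."""
    lookup = {value: index for index, value in enumerate(sorted(set(labels)))}
    return [lookup[value] for value in labels], lookup


raw = [0, 2, 5, 2]
print(infer_num_classes(raw))    # 6, although only 3 distinct labels occur

dense, lookup = remap_to_contiguous(raw)
print(dense)                     # [0, 1, 2, 1]
print(infer_num_classes(dense))  # 3
```

Remapping before encoding keeps the one-hot width equal to the number of classes; keeping `lookup` around allows predictions to be translated back to the original label values.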

## 4. normalize(x, axis=-1, order=2)

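This function normalizes an array by dividing each vector along the given axis by its norm. With the default order=2 this is the L2 (Euclidean) norm, so every vector ends up with unit length.

Parameters:

- x: the array to normalize.

- axis: the axis along which to normalize; defaults to -1, the last axis.

- order: the order of the norm; defaults to 2 (2 gives the L2 norm, 1 gives the L1 norm).

Usage example: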

from keras.utils import np_utils

# 3 samples with 4 features each
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

# L2-normalize each row
normalized_x = np_utils.normalize(x)
print(normalized_x)
# Output: [[0.18257419 0.36514837 0.54772256 0.73029674]
#          [0.37904902 0.45485883 0.53066863 0.60647843]
#          [0.42616235 0.47351372 0.5208651  0.56821647]]
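
The arithmetic behind this can be sketched without Keras. The function `lp_normalize_rows` is a hypothetical helper that handles only 2D input along the last axis: it divides each row by its norm of the given order and, as an assumption about sensible edge-case handling, treats a zero norm as 1 so that all-zero rows do not cause a division by zero.

```python
def lp_normalize_rows(rows, order=2):
    """Divide each row by its L-`order` norm (illustrative helper, 2D input only)."""
    result = []
    for row in rows:
        norm = sum(abs(v) ** order for v in row) ** (1.0 / order)
        if norm == 0:
            norm = 1.0  # leave all-zero rows unchanged instead of dividing by zero
        result.append([v / norm for v in row])
    return result


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

for row in lp_normalize_rows(x):
    # After L2 normalization the squared entries of every row sum to 1.
    print([round(v, 4) for v in row], round(sum(v * v for v in row), 6))

# With order=1 the absolute values of each row sum to 1 instead.
print(lp_normalize_rows([[1, 3]], order=1))  # [[0.25, 0.75]]
```

Checking that the squared entries of each row sum to 1 is a quick way to validate any L2-normalized output.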

## 5. iterators

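This section covers classes that generate data iterators for batch processing. The Keras 2.x np_utils module does not contain an iterators submodule; mini-batch iteration in Keras is normally built on keras.utils.Sequence or a plain Python generator, so the classes described here should be read as user-defined helpers.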

### 5.1. BatchGenerator

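BatchGenerator is a data iterator that yields mini-batches. Keras does not ship a class with this name, so the example defines a minimal stand-in: it takes the samples and labels, splits them into batches of batch_size, optionally shuffles the sample order, and starts a fresh pass over the data each time it is iterated (one pass per epoch). Data augmentation is left out of the stand-in; it would be applied to each batch before the batch is yielded.

Usage example:

import numpy as np


# Stand-in class written for this example; it is not part of Keras.
class BatchGenerator:
    def __init__(self, x, y, batch_size=32, shuffle=True, seed=None):
        self.x = np.asarray(x)
        self.y = np.asarray(y)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        # Every new iteration is one epoch over a freshly built index order.
        indices = np.arange(len(self.x))
        if self.shuffle:
            self.rng.shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            batch = indices[start:start + self.batch_size]
            yield self.x[batch], self.y[batch]


# 10 samples, each with 4 features and 1 label
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16],
     [17, 18, 19, 20],
     [21, 22, 23, 24],
     [25, 26, 27, 28],
     [29, 30, 31, 32],
     [33, 34, 35, 36],
     [37, 38, 39, 40]]
y = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]

batch_size = 3

# Create the BatchGenerator object
batch_generator = BatchGenerator(x, y, batch_size=batch_size, shuffle=True)

# Iterate over the mini-batches of one epoch
for i, (inputs, targets) in enumerate(batch_generator):
    print('Batch', i + 1)
    print('Inputs:', inputs)
    print('Targets:', targets)
    print()

    # Simulated training step
    # ...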
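
For comparison, batch iteration in Keras itself is usually built on keras.utils.Sequence, whose subclasses implement `__len__` (batches per epoch) and `__getitem__` (one batch by index) and may override `on_epoch_end`. The class below is a plain-Python sketch of that protocol. `ArrayBatches` and its parameters are hypothetical names, and the class deliberately does not inherit from Keras so that it runs without any dependency; to feed `model.fit`, the same methods would go on a keras.utils.Sequence subclass.

```python
import math
import random


class ArrayBatches:
    """Index-addressable mini-batches, shaped like the keras.utils.Sequence protocol."""

    def __init__(self, x, y, batch_size, shuffle=False, seed=None):
        if len(x) != len(y):
            raise ValueError("x and y must contain the same number of samples")
        self.x, self.y = list(x), list(y)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self._order = list(range(len(self.x)))
        self.on_epoch_end()  # set up the order for the first epoch

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        chosen = self._order[index * self.batch_size:(index + 1) * self.batch_size]
        return [self.x[i] for i in chosen], [self.y[i] for i in chosen]

    def on_epoch_end(self):
        # Reshuffle between epochs when requested; otherwise keep input order.
        if self.shuffle:
            self._rng.shuffle(self._order)


x = [[4 * i + 1, 4 * i + 2, 4 * i + 3, 4 * i + 4] for i in range(10)]
y = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]

batches = ArrayBatches(x, y, batch_size=3)
print(len(batches))   # 4 batches: 3 + 3 + 3 + 1 samples
print(batches[3])     # ([[37, 38, 39, 40]], [0])
```

Addressing batches by index rather than through a generator is what lets a training loop fetch batches out of order or in parallel, at the cost of keeping the whole index order in memory.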

### 5.2. OrderedBatchGenerator

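OrderedBatchGenerator is a data iterator that yields mini-batches in the original sample order: in every epoch it returns the input samples and labels sequentially, without shuffling. Keras 2.x does not ship a class by this name, so treat it as a user-defined iterator.

Usage example: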

```python
from keras.utils import np_utils
from keras.utils.np_utils import iterators

# 10 samples, each with 4 features and 1 label
x = [[1, 2, 3, 4