What object_detection.builders.post_processing_builder.build() does in Python

Published: 2023-12-25 12:17:08

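In the TensorFlow Object Detection API, object_detection.builders.post_processing_builder.build() builds the post-processing step that turns a detection model's raw predictions into final detections. It takes a single argument, a post_processing_pb2.PostProcessing protobuf config, and returns a tuple of two callables: non_max_suppressor_fn, which runs batched per-class non-maximum suppression (NMS), and score_converter_fn, which converts raw class scores.

This article looks at three post-processing functions: batch_non_max_suppression_fn, score_converter_fn and nms_fn. None of them is an argument of build(). The first two play the roles of the two callables that build() returns, and nms_fn is an additional class-agnostic variant. All three examples below are hand-written TensorFlow functions, not the library's own code.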
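build() follows a factory pattern: it reads the settings from a config once and hands back callables with those settings already bound (its NMS callable is a functools.partial). The toy sketch below imitates that pattern in plain Python so that it runs without TensorFlow. It is not the library's code: the plain dict standing in for the PostProcessing proto and the names toy_build and toy_nms are hypothetical, and toy_nms only thresholds and truncates instead of performing real NMS.

```python
import functools
import math


def toy_nms(boxes, scores, score_thresh, iou_thresh, max_size_per_class):
    # Hypothetical placeholder, NOT real NMS: keeps boxes whose score is at
    # least score_thresh, best first, capped at max_size_per_class.
    # (Real NMS would also drop boxes overlapping a kept box by > iou_thresh.)
    ranked = sorted(zip(scores, boxes), key=lambda pair: pair[0], reverse=True)
    return [(box, score) for score, box in ranked if score >= score_thresh][:max_size_per_class]


# Score conversions available in the toy config.
_CONVERTERS = {
    "IDENTITY": lambda x: x,
    "SIGMOID": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


def toy_build(config):
    """Toy factory: reads a dict config and returns (nms_fn, score_fn)."""
    nms_cfg = config["batch_non_max_suppression"]
    # Bind the thresholds once, the way build() binds them with functools.partial.
    nms_fn = functools.partial(
        toy_nms,
        score_thresh=nms_cfg["score_threshold"],
        iou_thresh=nms_cfg["iou_threshold"],
        max_size_per_class=nms_cfg["max_detections_per_class"],
    )
    convert = _CONVERTERS[config.get("score_converter", "IDENTITY")]
    logit_scale = config.get("logit_scale", 1.0)

    def score_fn(logits):
        # Scale the logits first, then convert each one.
        return [convert(x / logit_scale) for x in logits]

    return nms_fn, score_fn


config = {
    "batch_non_max_suppression": {
        "score_threshold": 0.5,
        "iou_threshold": 0.6,
        "max_detections_per_class": 2,
    },
    "score_converter": "SIGMOID",
}
nms_fn, score_fn = toy_build(config)
scores = score_fn([0.0, 2.0, -3.0])           # sigmoid of each logit
detections = nms_fn(["a", "b", "c"], scores)  # strings stand in for boxes
```

The caller never passes thresholds again: they live in the partial's bound keywords, which is also how the real builder's NMS callable carries score_thresh, iou_thresh and the detection limits.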

#### batch_non_max_suppression_fn

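batch_non_max_suppression_fn performs non-maximum suppression separately for each class on every image in a batch. It takes a float tensor boxes of shape [batch_size, num_boxes, 4], where each box is shared by all classes, and a float tensor scores of shape [batch_size, num_boxes, num_classes] (per-class NMS needs a class dimension), and returns the surviving detections padded to a fixed length. The expected shapes are:

```
inputs:
  boxes:  [batch_size, num_boxes, 4]
  scores: [batch_size, num_boxes, num_classes]
outputs:
  nms_boxes:   [batch_size, max_num_detections, 4]
  nms_scores:  [batch_size, max_num_detections]
  nms_classes: [batch_size, max_num_detections]
```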

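An example function built on tf.image.combined_non_max_suppression:

```python
import tensorflow as tf

NUM_DETECTIONS = 100     # example value: detections kept per image
NMS_IOU_THRESHOLD = 0.5  # example value: IoU above which boxes are suppressed

def batch_non_max_suppression_fn(boxes, scores):
    # boxes: [batch_size, num_boxes, 4]; scores: [batch_size, num_boxes, num_classes].
    # combined_non_max_suppression expects boxes as [batch_size, num_boxes, q, 4];
    # q = 1 means every class shares the same box. clip_boxes defaults to True,
    # so the output boxes are clipped to [0, 1].
    nmsed_boxes, nmsed_scores, nmsed_classes, _ = tf.image.combined_non_max_suppression(
        tf.expand_dims(boxes, axis=2),
        scores,
        max_output_size_per_class=NUM_DETECTIONS,
        max_total_size=NUM_DETECTIONS,
        iou_threshold=NMS_IOU_THRESHOLD,
    )
    return nmsed_boxes, nmsed_scores, nmsed_classes
```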
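To show what per-class NMS computes inside each image of the batch, here is a simplified pure-Python sketch of greedy per-class NMS for a single image. It illustrates the technique and is not the implementation of tf.image.combined_non_max_suppression: there is no score threshold, clipping or padding, boxes are (ymin, xmin, ymax, xmax) tuples, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_class_nms(boxes, scores, iou_threshold, max_per_class, max_total):
    """Greedy per-class NMS for one image.

    boxes:  N boxes shared by all classes.
    scores: N rows, each holding num_classes scores.
    Returns (box_index, class_index, score) triples, highest score first.
    """
    detections = []
    for c in range(len(scores[0])):
        # Visit boxes from the highest to the lowest class-c score.
        order = sorted(range(len(boxes)), key=lambda i: scores[i][c], reverse=True)
        kept = []
        for i in order:
            if len(kept) == max_per_class:
                break
            # Keep box i unless it overlaps an already kept class-c box too much.
            if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
                kept.append(i)
        detections.extend((i, c, scores[i][c]) for i in kept)
    # Merge all classes and keep the overall top max_total detections.
    detections.sort(key=lambda d: d[2], reverse=True)
    return detections[:max_total]


boxes = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 0.9), (2.0, 2.0, 3.0, 3.0)]
scores = [[0.9, 0.1], [0.8, 0.7], [0.3, 0.6]]
print(per_class_nms(boxes, scores, iou_threshold=0.5, max_per_class=10, max_total=3))
# [(0, 0, 0.9), (1, 1, 0.7), (2, 1, 0.6)]
```

Box 1 overlaps box 0 with an IoU of about 0.9, so it is suppressed in class 0, yet it survives as a class-1 detection because it has the highest class-1 score. Each class is suppressed independently.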

#### score_converter_fn

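score_converter_fn converts the raw box scores into new scores. In the example here it turns them into binary keep/discard flags: it takes a float tensor raw_scores of shape [batch_size, num_boxes, num_classes] and returns a 0/1 tensor of the same shape. The score_converter_fn returned by build() works differently: depending on the score_converter field of the config, it applies tf.identity, tf.sigmoid or tf.nn.softmax to the logits after dividing them by logit_scale. An example thresholding function:

```python
import tensorflow as tf

SCORE_THRESHOLD = 0.5  # example value

def score_converter_fn(raw_scores):
    # 1.0 where the raw score reaches SCORE_THRESHOLD, 0.0 elsewhere;
    # the output has the same shape and dtype as raw_scores.
    return tf.where(tf.greater_equal(raw_scores, SCORE_THRESHOLD),
                    tf.ones_like(raw_scores), tf.zeros_like(raw_scores))
```

Here every element whose raw score is greater than or equal to SCORE_THRESHOLD is set to 1 and every other element to 0. Because all surviving boxes then share the score 1.0, this output is better suited to filtering than to feeding NMS, which ranks boxes by score.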
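For comparison, the score converter that build() returns does not threshold: it divides the logits by logit_scale and applies IDENTITY, SIGMOID or SOFTMAX, as selected by the config's score_converter field. The plain-Python sketch below reproduces those three options for one box's per-class logits. convert_scores is a hypothetical name, and this is an illustration, not the TensorFlow code.

```python
import math


def convert_scores(logits, converter="SIGMOID", logit_scale=1.0):
    """Turn one box's per-class logits into scores (sketch, not the TF code)."""
    scaled = [x / logit_scale for x in logits]
    if converter == "IDENTITY":
        return scaled
    if converter == "SIGMOID":
        # Each class gets an independent probability.
        return [1.0 / (1.0 + math.exp(-x)) for x in scaled]
    if converter == "SOFTMAX":
        # Classes compete; subtracting the max keeps exp() from overflowing.
        m = max(scaled)
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError(f"unknown score converter: {converter}")


print(convert_scores([1.0, 1.0], "SOFTMAX"))                     # [0.5, 0.5]
print(convert_scores([4.0, -2.0], "IDENTITY", logit_scale=2.0))  # [2.0, -1.0]
```

Unlike 0/1 flags, these scores keep the ranking that NMS relies on; SIGMOID suits models that score each class independently, SOFTMAX models whose classes are mutually exclusive.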

#### nms_fn

nms_fn performs non-maximum suppression on boxes labelled with their predicted class. It takes a float tensor boxes holding the raw predicted boxes ([batch_size, num_boxes, 4]), a float tensor class_scores holding each box's per-class scores ([batch_size, num_boxes, num_classes]) and a float tensor class_agnostic_boxes ([batch_size, num_boxes, 4]), and returns the boxes that survive suppression together with their scores and classes. The expected shapes are:

```
inputs:
  boxes:                [batch_size, num_boxes, 4]
  class_scores:         [batch_size, num_boxes, num_classes]
  class_agnostic_boxes: [batch_size, num_boxes, 4]
outputs:
  nmsed_boxes:   [batch_size, max_num_detections, 4]
  nmsed_scores:  [batch_size, max_num_detections]
  nmsed_classes: [batch_size, max_num_detections]
```

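An example function:

```python
import tensorflow as tf

NUM_DETECTIONS = 100     # example value: detections kept per image
NMS_IOU_THRESHOLD = 0.5  # example value: IoU above which boxes are suppressed

def nms_fn(boxes, class_scores, class_agnostic_boxes):
    # Give each box its best class: the highest score and that class's index.
    scores = tf.reduce_max(class_scores, axis=2)                    # [B, N]
    classes = tf.cast(tf.argmax(class_scores, axis=2), tf.float32)  # [B, N]

    def single_image(inputs):
        img_boxes, img_nms_boxes, img_scores, img_classes = inputs
        # Class-agnostic NMS for one image; overlap is measured on class_agnostic_boxes.
        keep = tf.image.non_max_suppression(
            img_nms_boxes, img_scores,
            max_output_size=NUM_DETECTIONS, iou_threshold=NMS_IOU_THRESHOLD)
        # Gather the surviving detections and zero-pad them to NUM_DETECTIONS rows.
        pad = NUM_DETECTIONS - tf.shape(keep)[0]
        return (tf.pad(tf.gather(img_boxes, keep), [[0, pad], [0, 0]]),
                tf.pad(tf.gather(img_scores, keep), [[0, pad]]),
                tf.pad(tf.gather(img_classes, keep), [[0, pad]]))

    # Returns (nmsed_boxes, nmsed_scores, nmsed_classes), stacked over the batch.
    return tf.map_fn(
        single_image,
        (boxes, class_agnostic_boxes, scores, classes),
        fn_output_signature=(tf.float32, tf.float32, tf.float32))
```

In this example, tf.reduce_max and tf.argmax first give each box its highest class score and the index of that class, so every box takes part in suppression only once, under its best class (class-agnostic NMS). tf.map_fn then processes each image in turn: tf.image.non_max_suppression selects the surviving boxes, measuring overlap on class_agnostic_boxes (treating class_agnostic_boxes as the boxes used for suppression is an assumption of this example), and tf.gather collects each survivor's box, score and class index. Finally, tf.pad zero-pads every image's results to NUM_DETECTIONS rows so that the batch can be stacked into fixed-shape tensors.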
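The practical difference between class-agnostic NMS (nms_fn) and per-class NMS (batch_non_max_suppression_fn) is that, in the class-agnostic case, a box can be suppressed by an overlapping box of a different class. The plain-Python sketch below shows this on one image. It assumes boxes are (ymin, xmin, ymax, xmax) tuples, applies no score threshold, and uses hypothetical function names; it is not TensorFlow's implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(boxes, class_scores, iou_threshold, max_detections):
    """Greedy class-agnostic NMS for one image.

    Each box competes once, under its best class.
    Returns (box_index, class_index, score) triples, highest score first.
    """
    # Best class per box; on a tie the lowest class index wins.
    best = [max(range(len(row)), key=row.__getitem__) for row in class_scores]
    order = sorted(range(len(boxes)), key=lambda i: class_scores[i][best[i]], reverse=True)
    kept = []
    for i in order:
        if len(kept) == max_detections:
            break
        # Any kept box can suppress box i, whatever its class.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return [(i, best[i], class_scores[i][best[i]]) for i in kept]


boxes = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 0.9), (2.0, 2.0, 3.0, 3.0)]
class_scores = [[0.9, 0.1], [0.8, 0.7], [0.3, 0.6]]
print(class_agnostic_nms(boxes, class_scores, iou_threshold=0.5, max_detections=10))
# [(0, 0, 0.9), (2, 1, 0.6)]
```

Box 1's best class is 0, so it competes directly with box 0 (IoU about 0.9) and is suppressed; a per-class NMS on the same input would still keep box 1 as a class-1 detection.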

In the TF Object Detection API, however, build() does not accept functions like these as arguments. It reads the post-processing settings from a PostProcessing config and constructs the NMS and score-conversion callables itself. A complete example:

```python
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import post_processing_builder
from object_detection.protos import post_processing_pb2

NUM_CLASSES = 10

# Post-processing settings in protobuf text format
post_processing_text_proto = """
  batch_non_max_suppression {
    score_threshold: 0.5
    iou_threshold: 0.5
    max_detections_per_class: 100
    max_total_detections: 100
  }
  score_converter: SIGMOID
"""
post_processing_config = post_processing_pb2.PostProcessing()
text_format.Merge(post_processing_text_proto, post_processing_config)

# Build the post-processing callables
non_max_suppressor_fn, score_converter_fn = post_processing_builder.build(
    post_processing_config)

# Dummy model outputs: 2 images, 8 anchors, NUM_CLASSES classes.
# Boxes are [ymin, xmin, ymax, xmax]; the third dimension (q = 1) means
# every class shares the same box.
box_corners = tf.random.uniform([2, 8, 1, 2], maxval=0.5)
boxes = tf.concat([box_corners, box_corners + 0.3], axis=-1)
class_logits = tf.random.normal([2, 8, NUM_CLASSES])

scores = score_converter_fn(class_logits)  # sigmoid, applied per class
nmsed_boxes, nmsed_scores, nmsed_classes = non_max_suppressor_fn(boxes, scores)[:3]
```

The code writes the NMS thresholds and the score converter into a PostProcessing config, passes it to post_processing_builder.build(), and gets back non_max_suppressor_fn and score_converter_fn, which are then applied to (dummy) model outputs as the post-processing step of a detection task.

That covers what object_detection.builders.post_processing_builder.build() does, with a usage example. To customize post-processing, change the fields of the PostProcessing config (thresholds, detection limits, score converter); behavior the config cannot express, such as the hand-written functions above, can be applied directly to the model outputs instead.