7.13. Non-max suppression
A detection network typically produces several overlapping candidate boxes around the same real-world object: each anchor near the object fires with a similar score, and the post-processor sees them all. Non-max suppression (NMS) is the step that turns that cluster into one box.
The algorithm is short: sort the candidate boxes by score, take
the highest-scoring one, suppress every other box that overlaps
it by more than a chosen threshold, then take the next highest
from what remains, and repeat. The overlap metric is
intersection-over-union (IoU) – the area shared by two boxes
divided by the area of their union, a value between 0 (no
overlap) and 1 (identical boxes). The nms_threshold constructor
argument on every shipped post-processor is the cutoff above which
boxes are treated as duplicates of an already-kept box.
Figure: NMS collapses a cluster of overlapping detections to the highest-scoring one.
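The sort, keep, suppress loop described above can be sketched in plain Python. This is a minimal illustration of classic (hard) NMS, not the shipped implementation; iou(), classic_nms(), and the sample boxes are hypothetical names and data chosen for the example.

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classic_nms(candidates, iou_threshold):
    """candidates is a list of (box, score); returns the kept pairs."""
    # Highest score first.
    remaining = sorted(candidates, key=lambda c: c[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop everything that overlaps the kept box past the threshold.
        remaining = [c for c in remaining
                     if iou(best[0], c[0]) <= iou_threshold]
    return kept


# Hypothetical sample data: two near-duplicates and one separate box.
candidates = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.8),
    ((100, 100, 140, 140), 0.7),
]
kept = classic_nms(candidates, iou_threshold=0.5)
```

With these sample boxes the second candidate overlaps the first with an IoU of about 0.82, so it is suppressed, and the first and third boxes are kept.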
7.13.1. Soft-NMS
The shipped ml.utils.NMS class implements Soft-NMS, a
refinement that decays an overlapping box’s score by an amount
that depends on how much it overlaps, rather than dropping the
box outright. If the lowered score falls below the threshold the
box is dropped; otherwise it survives at the reduced score and
competes in the next round.
The nms_sigma parameter controls how aggressive the decay is.
With a small nms_sigma (the shipped default of 0.1) the
decay is steep: a heavily overlapping box gets its score driven
to nearly zero, and Soft-NMS behaves much like classic NMS. With a
larger nms_sigma the decay is gentle and overlapping boxes of
different objects survive more often, which matters when
real-world objects of the same class genuinely overlap (a crowd
of faces, a cluster of palms).
Setting nms_sigma to zero or a negative value disables the decay
entirely: overlapping boxes pass through with their original
scores, and only the score threshold filters them.
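To get a feel for how nms_sigma shapes the decay, the sketch below uses the Gaussian penalty from the original Soft-NMS formulation, score * exp(-IoU² / sigma). Whether ml.utils.NMS uses exactly this expression is an assumption, and decayed_score() is a hypothetical helper, not part of the library.

```python
import math


def decayed_score(score, iou, sigma):
    """Gaussian Soft-NMS penalty (assumed form, for illustration only)."""
    if sigma <= 0:
        # Decay disabled: the score passes through unchanged.
        return score
    return score * math.exp(-(iou * iou) / sigma)


# A box scoring 0.8 that overlaps an already-kept box with IoU 0.8.
steep = decayed_score(0.8, 0.8, sigma=0.1)   # small sigma: score collapses
gentle = decayed_score(0.8, 0.8, sigma=0.5)  # larger sigma: score survives
off = decayed_score(0.8, 0.8, sigma=0)       # decay disabled
```

Under this assumed formula the small sigma pushes the score well below 0.01, while the larger sigma leaves it above 0.2, which is the difference between the box being dropped and surviving at a typical score threshold.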
7.13.2. Building one directly
Every shipped post-processor builds a fresh
NMS per inference, adds each candidate to it,
and calls get_bounding_boxes() at the end. A
custom post-processor follows the same pattern:
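from ml.utils import NMS

# Runs inside a post-processor: model, inputs, candidates, nms_threshold
# and nms_sigma come from the surrounding code.
iw = model.input_shape[0][2]  # network input width in pixels
ih = model.input_shape[0][1]  # network input height in pixels
nms = NMS(iw, ih, inputs[0].roi)

# Box coordinates are in network-input pixel space.
for box, score, class_idx in candidates:
    nms.add_bounding_box(box.xmin, box.ymin,
                         box.xmax, box.ymax,
                         score, class_idx)

# Suppress duplicates and remap the survivors into image coordinates.
result = nms.get_bounding_boxes(threshold=nms_threshold,
                                sigma=nms_sigma)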
The constructor takes the network’s input width and height in
pixels and the ROI of the original image the model ran against;
add_bounding_box() takes box coordinates in
that network-input pixel space, and
get_bounding_boxes() remaps the survivors back
into image coordinates using the ROI. The remap accounts for the
Normalization stretch
automatically – the same ROI the predictor saw is used to project
boxes back – so the returned boxes are ready to draw onto the
captured frame.
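The projection back into image coordinates can be pictured with a small stand-alone sketch. It assumes the ROI is an (x, y, w, h) tuple and that the remap is a plain per-axis stretch from the network input onto that ROI; both are assumptions made for illustration, and remap_box() is a hypothetical helper rather than the library's code.

```python
def remap_box(box, iw, ih, roi):
    """Map (xmin, ymin, xmax, ymax) from network-input space to image space.

    Assumes roi is (x, y, w, h) and a simple per-axis scale.
    """
    rx, ry, rw, rh = roi
    sx = rw / iw  # horizontal scale from network input to ROI
    sy = rh / ih  # vertical scale from network input to ROI
    xmin, ymin, xmax, ymax = box
    return (rx + xmin * sx, ry + ymin * sy,
            rx + xmax * sx, ry + ymax * sy)


# Hypothetical numbers: a 96x96 network input fed from a 240x240 ROI
# that starts 40 pixels in from the left edge of the frame.
full = remap_box((0, 0, 96, 96), 96, 96, (40, 0, 240, 240))
part = remap_box((48, 24, 96, 96), 96, 96, (40, 0, 240, 240))
```

Here a box covering the whole network input maps back onto the whole ROI, which is the behaviour the paragraph above describes.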
The return shape is a list of per-class lists, indexed by the
label_index passed to add_bounding_box. Empty class lists
are preserved so the index matches the model’s class index;
enumerate(result) walks the classes alongside their detections.
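A minimal sketch of walking that structure follows. The labels list and the placeholder detection entries are hypothetical; the layout of each individual detection is not described in this section, so the entries are treated as opaque values here.

```python
def count_by_label(result, labels):
    """Count detections per class, skipping classes with no detections."""
    counts = {}
    for class_idx, detections in enumerate(result):
        if detections:
            counts[labels[class_idx]] = len(detections)
    return counts


# Hypothetical three-class model; the empty list at index 0 keeps the
# remaining indices aligned with the model's class indices.
labels = ["background", "face", "palm"]
result = [[], ["det0", "det1"], ["det2"]]
counts = count_by_label(result, labels)
```

Because empty class lists are preserved, labels[class_idx] lines up with the detections without any extra bookkeeping.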