7.25. Finding blobs

Thresholding turned the captured frame into a binary mask: every pixel either passes the threshold test or it doesn’t. That answers which colours the application cares about appear in the scene, but not where – the mask is just a sea of 1s and 0s. The next step is blob detection: walking the mask, finding contiguous regions of passing pixels, and returning each one as an object with a position, a size, an orientation, and the other properties an application can act on.
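The idea of walking the mask and grouping contiguous passing pixels can be sketched in plain Python. This is an illustration only, not how find_blobs() is implemented: it assumes the mask is a list of rows of 0/1 values, treats regions as 4-connected, and uses a hypothetical function name, label_regions.

```python
from collections import deque

def label_regions(mask):
    """Group 4-connected passing pixels; return one dict per region."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from this unvisited passing pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Summarise the region: bounding box, pixel count, float centroid.
            x, y = min(xs), min(ys)
            w, h = max(xs) - x + 1, max(ys) - y + 1
            regions.append({
                "rect": (x, y, w, h),
                "pixels": len(xs),
                "cxf": sum(xs) / len(xs),
                "cyf": sum(ys) / len(ys),
            })
    return regions

mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
regions = label_regions(mask)
```

The sample mask holds two separate regions, so the sketch returns two entries, each with its own bounding box and centroid.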

find_blobs() is the workhorse method for that step, and it is the most common entry point into the image module’s result-object world. Tracking a coloured ball, following a line painted on the floor, counting how many bright spots a thermal sensor sees, deciding whether a blue LED is on or off – the same call covers all of them. The inputs change (the thresholds, the region searched, the filters applied to the result), but the call pattern is the same.

7.25.1. The basic call

find_blobs takes a list of thresholds and returns a list of blob result objects:

thresholds = [(30, 100, 15, 127, 15, 127)]  # LAB threshold for red
blobs = img.find_blobs(thresholds)

for b in blobs:
    img.draw_rectangle(b.rect, color=(255, 0, 0))
    img.draw_cross(b.cx, b.cy, color=(255, 0, 0))

Each threshold tuple has the same form as the thresholds passed to binary() – six entries (l_lo, l_hi, a_lo, a_hi, b_lo, b_hi) for an RGB565 image (the bounds are in LAB), two entries (lo, hi) for a grayscale image. Up to 32 thresholds can be supplied in a single call, which is what makes find_blobs() so flexible: red, green, and blue beacons can be tracked simultaneously, each contributing its own blobs to the returned list, and each blob’s code property identifies which threshold it matched.

The draw_rectangle() and draw_cross() calls above annotate the captured frame for the IDE preview. The blob result already carries b.rect (the bounding box as a 4-tuple) and b.cx / b.cy (the integer centroid), so drawing the detection back into the frame is two method calls.

7.25.2. What the result contains

Each Blob is an attribute-tuple that packs together everything the detector measured about the region. The properties divide into four groups.

The bounding-box and centroid group – x, y, w, h, rect, cx, cy, cxf, cyf – describes the position of the blob. rect is the (x, y, w, h) 4-tuple that drawing methods expect; cx and cy are the centroid in integer pixel coordinates; cxf and cyf are the centroid in sub-pixel float coordinates, useful when a downstream calibration stage cares about fractional positions.

The form descriptors – pixels, area, density, perimeter, roundness, elongation, compactness, rotation – describe what the blob looks like. pixels is the count of passing pixels; area is the area of the axis-aligned bounding box (w * h); density is the ratio of the two, which approaches 1.0 for a solid rectangle and drops toward 0.0 for a thin diagonal stroke. roundness and compactness both score how round the blob is, from different geometric viewpoints (roundness from the second-order moments, compactness from the perimeter-to-area ratio); elongation is 1.0 - roundness for convenience. rotation is the orientation of the major axis in radians, which is most accurate on elongated blobs and becomes noisy on nearly-round ones (an ambiguous axis has no well-defined direction).
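As a worked example of the density figure, the plain-Python sketch below computes pixels / (w * h) for two small masks. The density() helper and the list-of-rows mask format are assumptions made for illustration; a real Blob already reports the value.

```python
def density(mask):
    """pixels / (w * h) for the passing pixels in a small 0/1 mask."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    w = max(xs) - min(xs) + 1   # bounding-box width
    h = max(ys) - min(ys) + 1   # bounding-box height
    return len(coords) / (w * h)

# A solid 4 x 3 rectangle fills its bounding box completely.
solid = [[1] * 4 for _ in range(3)]
# A one-pixel-wide diagonal covers 8 of the 64 pixels in its 8 x 8 box.
diagonal = [[1 if x == y else 0 for x in range(8)] for y in range(8)]

solid_density = density(solid)
diagonal_density = density(diagonal)
```

The solid rectangle scores 12 / 12 and the diagonal 8 / 64, which is why density is a cheap way to separate filled targets from thin strokes.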

The threshold and merge metadata – code, count – identify which threshold matched and how many source blobs were merged into the returned one. code is a 32-bit bitmap with one bit set per matching threshold (single threshold gives code == 1; a merged multi-colour blob can have several bits set); count is 1 unless merge=True combined several detections into one.
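Decoding the code bitmap is a few lines of bit testing. The sketch below assumes that bit i corresponds to the i-th entry of the thresholds list (consistent with a single threshold giving code == 1, but still an assumption), and the matched_thresholds() helper and colour names are hypothetical.

```python
def matched_thresholds(code, names):
    """Return the names whose bit is set in a blob's code bitmap."""
    return [name for i, name in enumerate(names) if code & (1 << i)]

# Names listed in the same order as the thresholds passed to find_blobs().
names = ["red", "green", "blue"]

single = matched_thresholds(0b001, names)   # blob matched one threshold
merged = matched_thresholds(0b101, names)   # merged blob with two bits set
```

With these inputs, single holds only "red" while merged reports both "red" and "blue".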

The corners group – corners, min_corners – gives the rotated geometry of the blob. corners is the 4-tuple of (x, y) extremes pulled from the blob’s contour, sorted clockwise from the top-left; min_corners is the 4-tuple of corners for the minimum-area rotated rectangle that encloses the blob. The min-area rectangle is the tight fit; the axis-aligned rect is the loose fit aligned with the pixel grid. Both are useful depending on whether a downstream stage needs an oriented box or a plain one.

Figure: A blob detection illustrated against a binary threshold mask. The left panel shows a tilted oval mask of passing pixels. The right panel shows the same mask annotated with the axis-aligned bounding box drawn around it, the centroid marked with a cross in the middle, a dashed minimum-area rotated rectangle hugging the oval at its true angle, and the major-axis line through the centroid pointing along the oval's long direction.

A blob carries the axis-aligned bounding box (rect, x, y, w, h), the centroid (cx, cy or sub-pixel cxf, cyf), the minimum-area rotated rectangle (min_corners plus rotation), and the optional major / minor axis lines computed by the module-level helpers below.

7.25.4. Merging overlapping blobs

merge=True post-processes the result list to combine blobs whose bounding rectangles overlap. The natural use is detecting a target whose colour the camera sees as multiple thresholded regions because of specular highlights, shadow lines, or mismatched lighting across the object: a single red ball might come back as three or four small red blobs that, taken together, trace the ball. With merge=True, the three blobs become one large blob, the rect covers the union, the code is the bitwise OR of the merged blobs’ codes (so a multi-colour merge identifies which colours contributed), and count reports how many source blobs were combined.

margin grows or shrinks the bounding rectangles before the overlap test. With margin=2, blobs whose bounding rectangles come within 2 pixels of each other still merge; with margin=-2, only blobs whose bounding rectangles overlap by at least 2 pixels merge. The natural tuning: positive margin to handle blobs that the threshold broke into adjacent pieces; negative margin to keep tightly-grouped distinct objects separate.
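The effect of margin on the overlap test can be sketched with plain (x, y, w, h) tuples. The helper names are hypothetical, and whether rectangles exactly margin pixels apart count as overlapping is a guess made to match the wording above; the module's own rule may differ at that boundary.

```python
def span_gap(a_start, a_len, b_start, b_len):
    """Empty pixels between two 1-D spans; negative means they overlap."""
    return max(b_start - (a_start + a_len), a_start - (b_start + b_len))

def should_merge(a, b, margin=0):
    """Overlap test on (x, y, w, h) rects with a margin tolerance."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Both axes must be within the margin for the rectangles to "overlap".
    return (span_gap(ax, aw, bx, bw) <= margin and
            span_gap(ay, ah, by, bh) <= margin)

left = (0, 0, 10, 10)
near = (12, 0, 10, 10)        # two empty columns between left and near
overlapping = (8, 0, 10, 10)  # shares two columns with left

near_default = should_merge(left, near)                 # gap too wide
near_grown = should_merge(left, near, margin=2)         # margin bridges it
overlap_shrunk = should_merge(left, overlapping, margin=-2)
overlap_strict = should_merge(left, overlapping, margin=-3)
```

A positive margin bridges the two-pixel gap, while a margin of -3 rejects a pair that overlaps by only two pixels.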

merge_cb runs on each candidate pair before the merge happens. The callback receives the two blobs and returns True to allow the merge or False to prevent it. This is the right tool for cross-checking merges that the geometric rule misses – for instance, refusing to merge two blobs whose rotation angles disagree by more than a threshold, or refusing to merge a small blob into a much larger one if the small one is just speckle.
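A callback that applies both of those checks might look like the sketch below. The tolerance values are hypothetical, the function name allow_merge is made up for illustration, and SimpleNamespace objects stand in for real blobs so the logic can be exercised off the camera; only the rotation and pixels properties are used.

```python
import math
from types import SimpleNamespace

MAX_ANGLE_DIFF = math.radians(20)   # hypothetical orientation tolerance
MIN_SIZE_RATIO = 0.05               # hypothetical speckle cut-off

def allow_merge(a, b):
    """Return True to allow merging blobs a and b, False to prevent it."""
    # An axis has no direction, so angles that differ by pi are equivalent.
    diff = abs(a.rotation - b.rotation) % math.pi
    diff = min(diff, math.pi - diff)
    if diff > MAX_ANGLE_DIFF:
        return False
    # Refuse to absorb a blob that is tiny compared with its partner.
    small, large = sorted((a.pixels, b.pixels))
    return small / large >= MIN_SIZE_RATIO

# Stand-in blobs for off-camera testing.
ball_top = SimpleNamespace(rotation=0.10, pixels=400)
ball_bottom = SimpleNamespace(rotation=0.20, pixels=350)
crossing = SimpleNamespace(rotation=1.20, pixels=380)
speckle = SimpleNamespace(rotation=0.10, pixels=5)

# On the camera the callback would be passed as:
#   img.find_blobs(thresholds, merge=True, merge_cb=allow_merge)
```

Because rotation is noisy on nearly-round blobs, an angle check of this kind is best reserved for elongated targets.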

7.25.5. Projection histograms

x_hist_bins_max and y_hist_bins_max attach optional projection histograms to each blob. A projection histogram is the count of passing pixels along one axis: the X-axis histogram totals passing pixels per column inside the blob’s bounding box, and the Y-axis histogram totals per row. Both default to zero – the histograms are not computed unless a non-zero max is supplied, since they would otherwise add work to every detection.

When they are computed, the histograms provide a cheap 1-D signal that an application can run further analysis on: detecting the position of a vertical stripe inside the blob, finding the breakpoint of a two-coloured target, counting how many gaps appear along the long axis. They are populated as the x_hist_bins and y_hist_bins properties on each Blob.
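To make the projection idea concrete, the sketch below builds both histograms from a small 0/1 mask and counts gaps along the X axis. It is plain Python with hypothetical helper names, it ignores any binning that x_hist_bins_max or y_hist_bins_max may apply, and the min_count cut-off for what counts as an empty bin is an assumption.

```python
def projections(mask):
    """Return (x_hist, y_hist): passing pixels per column and per row."""
    x_hist = [sum(col) for col in zip(*mask)]
    y_hist = [sum(row) for row in mask]
    return x_hist, y_hist

def count_gaps(hist, min_count=1):
    """Count runs of sparse bins that sit between well-filled bins."""
    filled = [v >= min_count for v in hist]
    if True not in filled:
        return 0
    first = filled.index(True)
    last = len(filled) - 1 - filled[::-1].index(True)
    inner = filled[first:last + 1]
    # A gap starts wherever a filled bin is followed by a sparse one.
    return sum(1 for prev, cur in zip(inner, inner[1:]) if prev and not cur)

# A comb-shaped region: a solid top row with three teeth hanging below it.
mask = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 1],
]
x_hist, y_hist = projections(mask)
gaps = count_gaps(x_hist, min_count=2)
```

The column totals dip wherever the comb has no tooth, so a simple scan of x_hist recovers the two gaps without touching the 2-D mask again.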

7.25.6. Extra geometric helpers

A handful of further geometric measures live as module-level functions that take a blob and return the requested measurement:

  • image.get_solidity() returns the blob’s solidity – pixels divided by the area of the convex hull. A solid filled region is close to 1.0; a blob with concavities (a horseshoe, a hand with fingers spread) drops well below.

  • image.get_convexity() returns the convexity – the convex-hull perimeter divided by the blob’s perimeter. A perfectly convex blob is 1.0; jagged or notched blobs are lower.

  • image.get_major_axis_line() and image.get_minor_axis_line() return Line objects along the major and minor axes of the blob, derived from the rotated minimum-area rectangle.

  • image.get_enclosing_circle() returns a Circle that encloses the blob – useful when a downstream stage wants a circle to draw or test against.

  • image.get_enclosed_ellipse() returns the 5-tuple (cx, cy, rx, ry, rotation) for an ellipse inscribed in the blob’s minimum-area rectangle. The values feed directly into draw_ellipse().
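As a worked illustration of solidity, the sketch below computes pixels divided by convex-hull area for a small 0/1 mask in plain Python. It does not call the module: the hull is built from pixel corners with Andrew's monotone chain, which is an assumption of this sketch, and the helper names are hypothetical. The module-level function may derive its hull differently and give slightly different numbers.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula for a simple polygon."""
    n = len(poly)
    total = sum(poly[i][0] * poly[(i + 1) % n][1]
                - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))
    return abs(total) / 2

def solidity(mask):
    """Passing-pixel count divided by the area of the convex hull."""
    corners, pixels = [], 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                pixels += 1
                # Use the four corners of each pixel so a filled square
                # has a hull area equal to its pixel count.
                corners += [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    return pixels / polygon_area(convex_hull(corners))

square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
horseshoe = [[1, 0, 1], [1, 0, 1], [1, 1, 1]]
square_solidity = solidity(square)
horseshoe_solidity = solidity(horseshoe)
```

The filled square fills its hull exactly, while the horseshoe keeps the same hull but only 7 of its 9 pixels, so its solidity drops to 7 / 9.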

7.25.7. Auto-learning a threshold

A blob detector is only as good as the thresholds it is run with, and the work of finding the right threshold for a target colour is its own problem. Two common patterns reduce that work.

The first is interactive selection in the IDE: capture a frame, drag a rectangle around an example of the target colour, and let the IDE’s threshold editor report the LAB bounds it sees. Those bounds drop into the script as the find_blobs() thresholds and the detector is ready.

The second is programmatic auto-learn: a calibration routine running on the camera captures a frame, takes a histogram of a known patch where the target is (get_histogram() with roi=), and reads the patch’s value range off the histogram with get_percentile(). The 5th percentile sets each channel’s low bound and the 95th its high bound, ignoring stray outlier pixels at both ends. On an RGB565 image one percentile call reports all three LAB channels at once, so the two calls produce the six numbers find_blobs() expects:

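# patch is the ROI rectangle known to contain the target colour
h = img.get_histogram(roi=patch)
lo = h.get_percentile(0.05)   # 5th percentile: low bound per channel
hi = h.get_percentile(0.95)   # 95th percentile: high bound per channel
# Interleave into (l_lo, l_hi, a_lo, a_hi, b_lo, b_hi)
threshold = (lo.l_value, hi.l_value,
             lo.a_value, hi.a_value,
             lo.b_value, hi.b_value)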
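Why the 5th and 95th percentiles shrug off stray pixels can be seen on a single channel in plain Python. The percentile() helper below is a simple nearest-rank definition chosen for illustration, and the sample values are made up; get_percentile() may compute its result differently.

```python
import math

def percentile(samples, p):
    """Smallest sample v such that at least a fraction p of samples are <= v."""
    ordered = sorted(samples)
    index = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[index]

# Hypothetical L-channel samples from a patch: 100 values spread over
# 50..69, plus one dark and one bright outlier pixel.
l_samples = list(range(50, 70)) * 5 + [3, 98]

l_lo = percentile(l_samples, 0.05)
l_hi = percentile(l_samples, 0.95)
```

Taking the raw minimum and maximum would stretch the bounds to 3 and 98, whereas the percentile bounds stay at 50 and 69, the range the target actually occupies.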