7.28. QR codes and AprilTags¶

The detectors so far – blobs, lines, circles, rectangles – find geometric features: positions and outlines that a downstream stage interprets. The remaining detectors find symbolic features: printed patterns whose visual structure exists specifically to encode a payload. The camera locates them, the decoder reads the bits, and what comes back is not only a position but also a string (or an ID) that whoever printed the symbol chose deliberately.

Two such families dominate small-camera applications. QR codes carry arbitrary text, URLs, contact cards, or binary payloads – the consumer-facing 2D codes that appear on posters, packaging, and boarding passes. AprilTags carry a single numeric ID from a small fixed set, decode quickly even from a long distance, and (when the lens intrinsics are supplied) report a 6-DoF pose in the camera frame – the robotics-facing 2D codes that mark drones, calibration targets, and fiducials. Both detectors return result objects with the same bounding-box vocabulary the blob and rect detectors use, but the payload makes them genuinely different from anything covered so far.

7.28.1. QR codes¶

find_qrcodes() scans the frame for QR codes and returns a list of QRCode result objects:

codes = img.find_qrcodes()

for c in codes:
    img.draw_rectangle(c.rect, color=(0, 255, 0))
    for corner in c.corners:
        img.draw_circle((corner[0], corner[1], 4),
                        color=(0, 255, 0))
    print(c.payload)

The detector takes a single optional roi to restrict the search. It needs grayscale input – a colour frame is converted internally before decoding.

Each detection carries the bounding box (x, y, w, h, rect), the four detected corners (corners, the projective quadrilateral the QR code’s finder patterns trace out), and the decoded payload as a string. The corners are the right thing to draw when annotating the detection – a QR code viewed off-axis is not axis-aligned and the bounding box gives only a loose outline.
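To see how loose the bounding box can be, the short Python sketch below computes the axis-aligned box around four corner points and compares it with the area of the quadrilateral itself (shoelace formula). It is plain Python, not camera code: the corner list is made-up data standing in for c.corners, and the box uses max minus min extents, which may differ by a pixel from how the detector computes w and h.

```python
def bbox_from_corners(corners):
    """Axis-aligned (x, y, w, h) enclosing four (x, y) corner points.

    Uses max - min extents; the detector's own rect may differ by a pixel.
    """
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


def quad_area(corners):
    """Area of a simple polygon given its vertices in order (shoelace formula)."""
    total = 0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


# Made-up corners of a code rotated 45 degrees in the image.
corners = [(50, 0), (100, 50), (50, 100), (0, 50)]
x, y, w, h = bbox_from_corners(corners)
print((x, y, w, h), quad_area(corners), w * h)  # (0, 0, 100, 100) 5000.0 10000
```

For a code rotated by 45 degrees the symbol covers only half of its own bounding box, which is why the corners make the better annotation.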

The decoder metadata covers everything the QR decoder learned along the way. version is the QR-code version, 1 – 40, which sets the module grid size (a version-1 code is 21 modules wide, a version-40 code is 177). ecc_level is the error-correction level (0 – 3 for L / M / Q / H); higher levels reserve more codewords for error correction and survive more damage at the cost of less payload room. mask is the mask pattern (0 – 7) the encoder applied to break up patterns, such as large uniform areas or finder-like shapes, that would confuse a decoder. data_type is the encoding the decoder reported (numeric, alphanumeric, binary, or Kanji), and the is_numeric / is_alphanumeric / is_binary / is_kanji flags expose the same value as friendlier booleans.
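The version-to-size rule and the meaning of ecc_level fit in a few lines of Python. This is a sketch: the describe() helper is hypothetical, the recovery percentages are the nominal figures from the QR specification, and the 0 to 3 index assumes the L / M / Q / H ordering given above.

```python
# Nominal share of damaged codewords each QR error-correction level recovers.
# The 0-3 index follows the L/M/Q/H ordering described in the text; check
# the numbering your firmware actually reports before relying on it.
ECC_LEVELS = [("L", 7), ("M", 15), ("Q", 25), ("H", 30)]


def qr_modules(version):
    """Modules per side for QR versions 1-40: 21 for v1, plus 4 per version."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 17 + 4 * version


def describe(version, ecc_level, mask):
    """Hypothetical helper turning the decoder metadata into a readable line."""
    letter, percent = ECC_LEVELS[ecc_level]
    side = qr_modules(version)
    return "v%d (%dx%d modules), ECC %s (~%d%% recoverable), mask %d" % (
        version, side, side, letter, percent, mask)


print(describe(1, 3, 5))  # v1 (21x21 modules), ECC H (~30% recoverable), mask 5
```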

eci is the Extended Channel Interpretation value, which identifies the text encoding of the payload bytes (UTF-8, ISO-8859-1, and so on). A QR code from arbitrary printed material is not guaranteed to be UTF-8, so an application that needs the text decoded correctly checks eci and decodes accordingly. Kanji needs particular care: MicroPython does not parse Kanji encoding, so an is_kanji payload has to be treated as a byte array and decoded by the application.
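Since the heavy decoding usually happens on the host, here is a minimal host-side sketch (CPython, which ships the codecs MicroPython lacks). It assumes the application has the raw payload bytes and the eci integer; the decode_payload() name is hypothetical, and representing "no ECI reported" as None is an assumption about how the value reaches the host. The ECI numbers are the standard AIM assignments for these encodings.

```python
# Text encodings for a few common ECI designators (AIM ECI assignments).
ECI_CODECS = {
    3: "latin-1",     # ISO-8859-1
    20: "shift_jis",  # Shift JIS, the usual encoding for Kanji text
    26: "utf-8",
}


def decode_payload(raw, eci=None):
    """Decode raw QR payload bytes, using the ECI value when it is known.

    For a missing or unrecognised ECI, try UTF-8 first and fall back to
    ISO-8859-1, which accepts any byte sequence.
    """
    codec = ECI_CODECS.get(eci)
    if codec is not None:
        return raw.decode(codec)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


print(decode_payload("日本".encode("shift_jis"), 20))  # 日本
print(decode_payload(b"\xe9"))  # é (not valid UTF-8, so Latin-1 is used)
```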

A typical use: a camera reads QR codes off a conveyor and reports the decoded payload to a host. The camera runs find_qrcodes() once per frame, iterates over the returned list, picks the codes whose data_type matches what the application expects, and forwards c.payload over UART or USB. The bounding-box and corner data are useful for the IDE preview but are not what the host cares about.
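The filtering and framing step of that loop can be sketched in plain Python. FakeQR is a hypothetical stand-in carrying only the two attributes the sketch reads, the is_numeric flag plays the role of the data_type check, and newline-terminated lines are an assumed host protocol, not something the firmware prescribes.

```python
from collections import namedtuple

# Hypothetical stand-in for the QRCode result object.
FakeQR = namedtuple("FakeQR", ["payload", "is_numeric"])


def frame_payloads(codes, want_numeric=True):
    """Return newline-terminated bytes for the codes whose type matches.

    Each accepted payload becomes one line, ready to write to a UART or
    USB link (the newline framing is an assumed host protocol).
    """
    out = b""
    for c in codes:
        if c.is_numeric == want_numeric:
            out += c.payload.encode() + b"\n"
    return out


# On the camera this would run once per frame, roughly:
#     uart.write(frame_payloads(img.find_qrcodes()))   # uart is hypothetical
codes = [FakeQR("0123456789", True), FakeQR("https://example.com", False)]
print(frame_payloads(codes))  # b'0123456789\n'
```

Newline framing keeps the host parser trivial; a payload that might itself contain newlines would need a length prefix instead.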

7.28.2. AprilTags¶

find_apriltags() scans the frame for AprilTags and returns a list of AprilTag result objects:

import image

tags = img.find_apriltags(families=image.TAG36H11)

for t in tags:
    img.draw_rectangle(t.rect, color=(0, 255, 0))
    img.draw_cross(t.cx, t.cy, color=(0, 255, 0))
    print(t.id, t.decision_margin)

AprilTags differ from QR codes in their design goals. A QR code is built to encode arbitrary data in a single dense symbol the user reads once at close range. An AprilTag is built to encode a small ID in a sparse symbol the camera reads continuously from a distance, with as much error tolerance as the Hamming distance of its family allows. The trade-off shows up in both directions: a QR code can carry hundreds of bytes (up to 2,953 in binary mode at version 40, level L) but needs to be read up close; an AprilTag carries only a single ID (one of 587 in TAG36H11) but reads reliably from metres away.

The families keyword takes a bitmask of the tag families to decode. The available families are image.TAG16H5, image.TAG25H9, image.TAG36H10, image.TAG36H11, image.TAGCIRCLE21H7, image.TAGCIRCLE49H12, image.TAGCUSTOM48H12, image.TAGSTANDARD41H12, and image.TAGSTANDARD52H13. Each family trades off ID count against robustness. The H number in the name is the minimum Hamming distance between any two codes in the family, that is, how many bits must flip before one valid code turns into another. TAG16H5 has 30 IDs at distance 5, TAG25H9 has 35 IDs at distance 9, and TAG36H11 (the default and the most common) has 587 IDs at distance 11. The detector corrects up to two bit errors regardless of family, so the distance decides how risky that correction is: a random pattern in a noisy frame only has to land within two bits of a valid code to decode as a false detection, and the higher-distance families spread their codes far more sparsely, so such collisions become rare. That is why TAG36H11 is the recommended choice. Detection time scales with the number of enabled families, so an application enables only the families it actually prints. When one call needs several families, the bitmask is the bitwise OR of their constants, for example image.TAG16H5 | image.TAG36H11.
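The collision argument can be checked with a little arithmetic. The sketch below is illustrative only: decode() is a toy nearest-code decoder over a made-up 9-bit codebook, and false_hit_fraction() counts how many random bit patterns fall within two bits of some code. It ignores the four rotations the real detector tries (which roughly quadruples the rates) and the quad-detection stage that rejects most non-tags before decoding, so it is not the detector's actual false-positive rate.

```python
def hamming(a, b):
    """Number of differing bits between two integer code words."""
    return bin(a ^ b).count("1")


def min_distance(codebook):
    """Smallest pairwise Hamming distance in a codebook."""
    return min(hamming(a, b)
               for i, a in enumerate(codebook)
               for b in codebook[i + 1:])


def decode(word, codebook, max_errors=2):
    """Index of the first code within max_errors bits of word, else None.

    The match is unique when min_distance(codebook) > 2 * max_errors.
    """
    for tag_id, code in enumerate(codebook):
        if hamming(word, code) <= max_errors:
            return tag_id
    return None


def n_choose_k(n, k):
    """Binomial coefficient using exact integer arithmetic."""
    result = 1
    for i in range(k):
        result = result * (n - i) // (i + 1)
    return result


def false_hit_fraction(bits, n_codes, max_errors=2):
    """Fraction of random bit patterns within max_errors bits of some code.

    Exact when the Hamming balls do not overlap (minimum distance greater
    than 2 * max_errors) and rotations are ignored.
    """
    ball = sum(n_choose_k(bits, k) for k in range(max_errors + 1))
    return n_codes * ball / 2 ** bits


toy = [0b000000000, 0b111110000, 0b000011111]  # made-up 9-bit codes
print(min_distance(toy))         # 5
print(decode(0b000000011, toy))  # 0: two flipped bits, still corrected
print(decode(0b110000011, toy))  # None: too far from every code
for name, bits, n in [("TAG16H5", 16, 30), ("TAG25H9", 25, 35),
                      ("TAG36H11", 36, 587)]:
    print(name, false_hit_fraction(bits, n))
```

Under these simplifications about 6% of random 16-bit patterns decode as some TAG16H5 ID, against roughly six in a million 36-bit patterns for TAG36H11.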

Each detection carries the bounding-box vocabulary – x, y, w, h, rect, area, integer and sub-pixel centroids (cx, cy, cxf, cyf) – and the four detected corners (corners). The identification fields follow: id is the numeric ID within the family (0 – 586 for TAG36H11), family is the numeric family constant, and name is the family name as a string.

The match-quality fields are what an application uses to filter detections. decision_margin is a 0.0 – 1.0 confidence score; higher is better, and keeping only detections with decision_margin > 0.1 removes most spurious hits. hamming counts the bit errors the decoder accepted for this tag: lower is better, and 0 means a perfect decode. goodness is a historical image-quality metric the current decoder no longer computes; it is always 0.0 and can be ignored.
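A filter over the detection list might look like the sketch below. FakeTag is a hypothetical stand-in for the AprilTag result object, the 0.1 margin comes from the text, and the max_hamming=1 default is an illustrative choice rather than a recommended value.

```python
from collections import namedtuple

# Hypothetical stand-in for the AprilTag result object.
FakeTag = namedtuple("FakeTag", ["id", "decision_margin", "hamming"])


def trusted(tags, min_margin=0.1, max_hamming=1):
    """Keep detections with a usable margin and few corrected bits."""
    return [t for t in tags
            if t.decision_margin > min_margin and t.hamming <= max_hamming]


tags = [FakeTag(3, 0.62, 0), FakeTag(17, 0.05, 0), FakeTag(42, 0.40, 2)]
print([t.id for t in trusted(tags)])  # [3]
```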

7.28.3. Pose from intrinsics¶

The feature that makes AprilTags the robotics fiducial of choice is that find_apriltags() can recover the tag's 6-DoF pose in the camera frame directly from the detected corners and a small set of calibration intrinsics. The intrinsics are the camera's X and Y focal lengths in pixels (fx, fy) and the optical centre in pixels (cx, cy); the application measures all four once with a calibration procedure and hard-codes them thereafter. These cx and cy are the optical centre, not the tag centroid fields of the same name.
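Before a proper calibration is available, a rough starting point can be derived from the lens and sensor datasheets with the pinhole model: focal length in pixels equals focal length in millimetres times image width in pixels divided by sensor width in millimetres. The sketch below assumes the image uses the full sensor area (no cropping or windowing) and places the optical centre at the image centre; the numbers are made up, and approx_intrinsics() is a hypothetical helper. Calibration replaces all four values.

```python
def approx_intrinsics(focal_mm, sensor_w_mm, sensor_h_mm, width_px, height_px):
    """Pinhole-model estimate of (fx, fy, cx, cy) from lens and sensor specs.

    Assumes the image covers the full sensor and the optical centre sits
    at the image centre.
    """
    fx = focal_mm * width_px / sensor_w_mm
    fy = focal_mm * height_px / sensor_h_mm
    return fx, fy, width_px / 2, height_px / 2


fx, fy, cx, cy = approx_intrinsics(4.0, 4.0, 3.0, 320, 240)
print(fx, fy, cx, cy)  # 320.0 320.0 160.0 120.0
# The values would then be passed to the detector, e.g.
#     img.find_apriltags(fx=fx, fy=fy, cx=cx, cy=cy)
```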

When the intrinsics are supplied, the returned AprilTag populates its x_translation, y_translation, z_translation fields with the tag’s position relative to the camera, and x_rotation, y_rotation, z_rotation (and the duplicate rotation for symmetry) with the tag’s orientation. Without intrinsics, all six fields are 0.0 and the application is responsible for any pose estimation it needs.

The translation fields are reported in tag widths: the decoder treats the tag as 1 unit wide, so the application multiplies each translation by the physical width of the printed tag to get metric distances. A tag printed at 100 mm across and reporting z_translation = 8.3 is 830 mm away from the camera; the same tag printed at 50 mm across at the same distance would report z_translation = 16.6. The rotation fields are in radians and need no scaling.
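The scaling can be written as a small helper that follows the tag-width convention described above. The tag_position_mm() name is hypothetical; it multiplies each translation by the printed tag width and adds the straight-line range to the tag.

```python
import math


def tag_position_mm(x_t, y_t, z_t, tag_width_mm):
    """Scale tag-width translations to millimetres and add straight-line range."""
    x, y, z = (v * tag_width_mm for v in (x_t, y_t, z_t))
    return x, y, z, math.sqrt(x * x + y * y + z * z)


# The worked example from the text: a 100 mm tag reporting z_translation = 8.3.
x, y, z, rng = tag_position_mm(0.0, 0.0, 8.3, 100.0)
print(round(z, 1), round(rng, 1))  # 830.0 830.0
```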

The pose estimate is the basis for a wide range of robotics applications: docking a robot to a charging station marked with a tag, following a printed waypoint trail, recovering the camera's own pose from multiple known tags in the environment. A camera that knows the intrinsics, sees a tag, and has a real-world position and orientation for the tag has, by inverting the measured pose, a real-world position for itself.
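The inversion is a standard rigid-transform step. In the sketch below, a pose is a rotation matrix R and translation t with p_cam = R·p_tag + t, so the camera origin in tag coordinates is −Rᵀt. Building R from x_rotation, y_rotation and z_rotation depends on an Euler-angle order and sign convention the text does not specify, so the sketch takes R as an input; the function names are hypothetical and the translations are assumed to be already scaled to metric units.

```python
def transpose(m):
    """Transpose of a 3x3 matrix given as a list of rows."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def mat_vec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def camera_in_tag_frame(R, t):
    """Camera origin in tag coordinates, given p_cam = R @ p_tag + t."""
    return [-c for c in mat_vec(transpose(R), t)]


def camera_in_world(R, t, R_world, t_world):
    """Camera origin in world coordinates for a tag whose known world pose
    is p_world = R_world @ p_tag + t_world."""
    p_tag = camera_in_tag_frame(R, t)
    return [a + b for a, b in zip(mat_vec(R_world, p_tag), t_world)]


I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Tag 5 units straight ahead, mounted at world position (1, 2, 3).
print(camera_in_world(I, [0, 0, 5], I, [1, 2, 3]))  # [1, 2, -2]
```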

7.28.4. When to pick which¶

QR codes and AprilTags solve different problems. The choice between them comes down to what the printed symbol carries.

When the application needs to carry arbitrary data through the printed symbol (a URL, a serial number string, a contact record), the QR code is the right choice. Hundreds of bytes fit in a modestly sized code, the encoding is public and supported on every smartphone, and the decoder copes with rotation, moderate damage, and oblique angles.

When the application needs a small ID read continuously from a distance with optional pose (a fiducial on a moving robot, a calibration target in a room, a docking marker on a charging station), the AprilTag is the right choice. Hundreds of IDs are plenty for the use case, the coarse grid stays decodable at distances where a dense QR code's modules blur together, the family's Hamming distance lets the decoder correct bit errors without inviting false detections, and the pose estimate comes free once the intrinsics are calibrated.

Some applications use both: an AprilTag marks a known location and an associated QR code (printed alongside) carries the metadata about what that location means. The two detectors run independently on the same frame and the application correlates their bounding boxes to match each tag to its companion code.
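One simple way to correlate the two lists is to pair each tag with the nearest QR code centre, as in the sketch below. Det is a hypothetical stand-in that only carries a name and an (x, y, w, h) rect, and the 200-pixel cut-off is an arbitrary illustrative value. The matching is greedy, so it does not stop two tags from claiming the same code; a layout with tightly packed pairs would need a proper assignment step.

```python
from collections import namedtuple

# Hypothetical stand-in: only the rect attribute (x, y, w, h) is used.
Det = namedtuple("Det", ["name", "rect"])


def centre(rect):
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def pair_tags_with_codes(tags, codes, max_dist=200):
    """Match each tag to the nearest QR code centre within max_dist pixels."""
    pairs = []
    for t in tags:
        tx, ty = centre(t.rect)
        best, best_d2 = None, max_dist * max_dist
        for c in codes:
            cx, cy = centre(c.rect)
            d2 = (cx - tx) ** 2 + (cy - ty) ** 2
            if d2 <= best_d2:
                best, best_d2 = c, d2
        pairs.append((t, best))
    return pairs


tags = [Det("tag0", (10, 10, 40, 40)), Det("tag1", (300, 10, 40, 40))]
codes = [Det("qrA", (60, 10, 40, 40)), Det("qrB", (350, 10, 40, 40))]
for t, c in pair_tags_with_codes(tags, codes):
    print(t.name, c.name if c else None)  # tag0 qrA, then tag1 qrB
```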