7.28. QR codes and AprilTags
The detectors so far – blobs, lines, circles, rectangles – find geometric features: positions and outlines that a downstream stage interprets. The remaining detectors find symbolic features: printed patterns whose visual structure exists specifically to encode a payload. The camera locates them, the decoder reads the bits, and what comes back is not a position but a string (or an ID) the printer of the symbol chose deliberately.
Two such families dominate small-camera applications. QR codes carry arbitrary text, URLs, contact cards, or binary payloads – the consumer-facing 2D codes that appear on posters, packaging, and boarding passes. AprilTags carry a single numeric ID from a small fixed set, decode quickly even from a long distance, and (when the lens intrinsics are supplied) report a 6-DoF pose in the camera frame – the robotics-facing 2D codes that mark drones, calibration targets, and fiducials. Both detectors return result objects with the same bounding-box vocabulary the blob and rect detectors use, but the payload makes them genuinely different from anything covered so far.
7.28.1. QR codes
find_qrcodes() scans the
frame for QR codes and returns a list of
QRCode result
objects:
codes = img.find_qrcodes()
for c in codes:
    img.draw_rectangle(c.rect, color=(0, 255, 0))
    for corner in c.corners:
        img.draw_circle((corner[0], corner[1], 4),
                        color=(0, 255, 0))
    print(c.payload)
The detector takes a single optional roi,
an (x, y, w, h) rectangle, to restrict the
search. Decoding works on grayscale data, so
a colour frame is converted internally first.
Each detection carries the bounding box
(x, y, w, h, rect), the
four detected corners (corners, the
projective quadrilateral the QR code’s
finder patterns trace out), and the decoded
payload as a string. The corners are the
right thing to draw when annotating the
detection – a QR code viewed off-axis is
not axis-aligned and the bounding box gives
only a loose outline.
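Outlining the quadrilateral means joining each corner to the next and closing the loop. A minimal sketch, assuming corners holds four (x, y) points in perimeter order; quad_edges is a hypothetical helper, not part of the image module:

```python
def quad_edges(corners):
    """Return the edges of a closed polygon as (x0, y0, x1, y1) tuples.

    Assumes `corners` lists the (x, y) points in perimeter order.
    """
    edges = []
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i][0], corners[i][1]
        # Wrap around so the last corner connects back to the first.
        x1, y1 = corners[(i + 1) % n][0], corners[(i + 1) % n][1]
        edges.append((x0, y0, x1, y1))
    return edges
```

Inside the detection loop, each edge can then be handed to img.draw_line(edge, color=(0, 255, 0)).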
The decoder metadata covers everything the
QR decoder learned along the way.
version is the QR-code version, 1 – 40,
which sets the module grid size (a version-1
code is 21 modules wide, a version-40 code
is 177). ecc_level is the error-
correction level (0 – 3 for L / M / Q / H);
higher levels reserve more codewords for
error correction and survive more damage at
the cost of less payload room. mask is
the mask pattern (0 – 7) the encoder picked
to minimise decoder confusion. data_type
is the encoding the decoder reported –
numeric, alphanumeric, binary, or Kanji –
and the is_numeric / is_alphanumeric
/ is_binary / is_kanji flags expose
the same value as friendlier booleans.
eci is the Extended Channel
Interpretation value, which identifies the
text encoding the bytes are in (UTF-8,
ISO-8859-1, and so on). A QR code from
arbitrary printed material is not guaranteed
to be UTF-8; an application that needs to
decode the bytes correctly checks eci and
decodes accordingly. Kanji is the notable
case: MicroPython does not parse Kanji
encoding, so an is_kanji payload has to be
treated as a byte array and decoded by the
application.
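The relationship between version and grid size quoted above (21 modules at version 1, 177 at version 40) follows the standard QR rule of four extra modules per version. A small sketch; both helper names are hypothetical:

```python
def modules_per_side(version):
    """Module grid width for a QR version (1 to 40): 17 + 4 * version."""
    if not 1 <= version <= 40:
        raise ValueError("QR version must be in 1..40")
    return 17 + 4 * version


def pixels_per_module(width_px, version):
    """Rough on-screen size of one module, given the detection's width."""
    return width_px / modules_per_side(version)
```

Dividing a detection's w by the grid width gives a rough pixels-per-module figure, which can serve as a sanity check on whether a code is large enough in the frame to decode reliably.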
A typical use: a camera reads QR codes off a
conveyor and reports the decoded payload to
a host. The camera runs
find_qrcodes() once per
frame, iterates the returned list, picks the
codes whose data_type matches what the
application expects, and forwards
c.payload over UART or USB. The
bounding-box and corner data are useful for
the IDE preview but are not what the host
cares about.
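The filtering step of that loop can be sketched as a plain function. It assumes only that each detection exposes the is_* flags and payload described above; the function name and the default set of accepted encodings are illustrative choices, not part of the API:

```python
def payloads_to_forward(codes, accept=("is_numeric", "is_alphanumeric")):
    """Collect payloads from detections whose encoding flag is accepted.

    `codes` is any iterable of objects exposing the is_* flags and a
    `payload` attribute. `accept` names the flags the application wants
    (the default here is an assumption made for the example).
    """
    selected = []
    for c in codes:
        # A detection qualifies if any of the accepted flags is set.
        if any(getattr(c, flag, False) for flag in accept):
            selected.append(c.payload)
    return selected
```

On the camera, the returned strings would be written to whatever UART or USB serial object the application has already opened, one payload per line.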
7.28.2. AprilTags
find_apriltags() scans
the frame for AprilTags and returns a list
of AprilTag
result objects:
tags = img.find_apriltags(families=image.TAG36H11)
for t in tags:
    img.draw_rectangle(t.rect, color=(0, 255, 0))
    img.draw_cross(t.cx, t.cy, color=(0, 255, 0))
    print(t.id, t.decision_margin)
AprilTags differ from QR codes in their design goals. A QR code is built to encode arbitrary data in a single dense symbol the user reads once at close range. An AprilTag is built to encode a small ID in a sparse symbol the camera reads continuously from a distance, with as much error tolerance as the minimum Hamming distance of its family allows. The trade-off shows up in both directions: a QR code can carry hundreds of bytes but needs to be read up close; an AprilTag carries only a few hundred unique IDs but reads reliably from metres away.
The families keyword takes a bitmask of
the tag families to decode. The available
families are image.TAG16H5,
image.TAG25H9,
image.TAG36H10,
image.TAG36H11,
image.TAGCIRCLE21H7,
image.TAGCIRCLE49H12,
image.TAGCUSTOM48H12,
image.TAGSTANDARD41H12, and
image.TAGSTANDARD52H13. Each family
trades off ID count against robustness. The
number after the H in the name is the
minimum Hamming distance between any two
codes in the family, that is, how many bits
must flip before one valid code turns into
another. TAG16H5 has 30 IDs at distance 5,
TAG25H9 has 35 IDs at distance 9, and
TAG36H11 (the default and the most common)
has 587 IDs at distance 11. The detector
corrects up to two bit errors no matter the
family, so the distance decides how risky
that correction is: a random pattern in a
noisy frame only has to land within two bits
of a valid code to decode as a false
detection. The higher-distance families
spread their codes much more sparsely, so
such collisions become rare, which is why
TAG36H11 is the recommended choice.
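The collision risk can be put into rough numbers. The sketch below counts how many bit patterns lie within two flips of a valid code and divides by the number of possible patterns. It is a back-of-envelope model, not the decoder's own logic: it assumes uniformly random bits, ignores the four possible tag rotations, and takes the bit count from the family name (16, 25, 36). Both function names are hypothetical.

```python
def neighbourhood_size(n_bits, max_errors=2):
    """Count n_bits-long patterns within max_errors bit flips of one code."""
    total = 0
    term = 1  # C(n_bits, 0)
    for k in range(max_errors + 1):
        total += term
        # Step from C(n, k) to C(n, k + 1) using integer arithmetic.
        term = term * (n_bits - k) // (k + 1)
    return total


def false_match_fraction(n_bits, n_codes, max_errors=2):
    """Fraction of random n_bits patterns that would decode as some ID.

    Assumes the correction neighbourhoods of different codes do not
    overlap, which holds when the minimum distance exceeds 2 * max_errors.
    """
    return n_codes * neighbourhood_size(n_bits, max_errors) / (1 << n_bits)


for name, bits, ids in (("TAG16H5", 16, 30), ("TAG25H9", 25, 35), ("TAG36H11", 36, 587)):
    print(name, false_match_fraction(bits, ids))
```

Under these assumptions TAG16H5 accepts roughly 6 % of random patterns, while TAG36H11 accepts fewer than one in 100 000, despite offering far more IDs.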
Detection time scales with the number of
enabled families, so an application enables
only what it actually prints. The bitmask
is the bitwise OR of the family constants
when multiple families are needed in one
call.
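Combining families is ordinary bitwise arithmetic. A minimal sketch, with family_mask as a hypothetical convenience helper:

```python
def family_mask(*families):
    """Bitwise-OR any number of family constants into one bitmask."""
    mask = 0
    for f in families:
        mask |= f
    return mask
```

Calling img.find_apriltags(families=family_mask(image.TAG16H5, image.TAG36H11)) is equivalent to passing image.TAG16H5 | image.TAG36H11 directly; the helper only pays off when the list of families is built at run time.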
Each detection carries the bounding-box
vocabulary – x, y, w, h,
rect, area, integer and sub-pixel
centroids (cx, cy, cxf, cyf)
– and the four detected corners
(corners). The identification fields
follow: id is the numeric ID within the
family (0 – 586 for TAG36H11),
family is the numeric family constant,
and name is the family name as a
string.
The match-quality fields are what an
application uses to filter detections.
decision_margin is a 0.0 – 1.0
confidence score; higher is better, and
keeping only detections with
decision_margin > 0.1 removes most
spurious hits at negligible cost. hamming
counts the bit errors the decoder accepted
for this tag – lower is better, 0
meaning a perfect decode. goodness is a
historical image-quality metric the current
decoder no longer computes; it is always
0.0 and can be ignored.
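A filter built on these two fields might look like the sketch below. The 0.1 margin comes from the text above; the cap of one corrected bit is an assumption chosen for illustration, and reliable_tags is a hypothetical helper name:

```python
def reliable_tags(tags, min_margin=0.1, max_hamming=1):
    """Keep detections with enough margin and few corrected bit errors.

    `tags` is any iterable of objects exposing decision_margin and
    hamming. max_hamming=1 is an illustrative default, not a documented
    recommendation.
    """
    return [t for t in tags
            if t.decision_margin > min_margin and t.hamming <= max_hamming]
```

Tightening max_hamming to 0 accepts only perfect decodes, which trades some detection range for fewer false IDs.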
7.28.3. Pose from intrinsics
The transformative feature of
find_apriltags(), the one
that justifies AprilTags as the robotics
fiducial of choice, is that the method can
recover the tag’s 6-DoF pose in the camera
frame directly from the detected corners
and a small set of calibration intrinsics.
The intrinsics are the camera’s X and Y
focal lengths in pixels (fx, fy) and
the optical centre in pixels (cx,
cy), all four of which the application
measures once with a calibration procedure
and hard-codes thereafter.
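Until a calibration has been run, a first estimate of the four values can be derived from the ideal pinhole model: the focal length in pixels is the lens focal length divided by the sensor dimension and multiplied by the image dimension, and the optical centre is taken to be the image centre. The sketch below assumes that idealised model and ignores lens distortion; the function name is hypothetical, and the lens and sensor figures must come from the datasheets of the hardware actually in use.

```python
def estimate_intrinsics(focal_mm, sensor_w_mm, sensor_h_mm, img_w, img_h):
    """Pinhole-model estimate of (fx, fy, cx, cy) in pixels.

    focal_mm       lens focal length in millimetres
    sensor_w_mm    active sensor width in millimetres
    sensor_h_mm    active sensor height in millimetres
    img_w, img_h   resolution of the frames being processed
    """
    fx = (focal_mm / sensor_w_mm) * img_w
    fy = (focal_mm / sensor_h_mm) * img_h
    # Assume the optical axis passes through the image centre.
    cx = img_w * 0.5
    cy = img_h * 0.5
    return fx, fy, cx, cy
```

The four results would be passed to find_apriltags() alongside families, assuming the keyword names match the intrinsics named above (fx, fy, cx, cy). A measured calibration should replace the estimate wherever pose accuracy matters.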
When the intrinsics are supplied, the
returned AprilTag
populates its x_translation,
y_translation, z_translation fields
with the tag’s position relative to the
camera, and x_rotation, y_rotation,
z_rotation (and the duplicate
rotation for symmetry) with the tag’s
orientation. Without intrinsics, all six
fields are 0.0 and the application is
responsible for any pose estimation it
needs.
The translation fields are reported in
tag widths: the decoder treats the tag as
1 unit wide, so the application multiplies
each translation by the physical width of
the printed tag to get metric distances. A
tag printed at 100 mm across and reporting
z_translation = 8.3 is 830 mm away from
the camera; the same tag printed at 50 mm
across at the same distance would report
z_translation = 16.6. The rotation
fields are in radians and need no scaling.
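The scaling is a single multiplication per axis. A minimal sketch, assuming a detection object that exposes the translation and rotation fields named above; the three helper names are hypothetical:

```python
import math


def tag_position_mm(tag, tag_width_mm):
    """Scale the tag-width translation by the printed tag width."""
    return (tag.x_translation * tag_width_mm,
            tag.y_translation * tag_width_mm,
            tag.z_translation * tag_width_mm)


def tag_range_mm(tag, tag_width_mm):
    """Straight-line distance from the camera to the tag centre."""
    x, y, z = tag_position_mm(tag, tag_width_mm)
    return math.sqrt(x * x + y * y + z * z)


def tag_rotation_deg(tag):
    """Convert the three rotation fields from radians to degrees."""
    return (math.degrees(tag.x_rotation),
            math.degrees(tag.y_rotation),
            math.degrees(tag.z_rotation))
```

With the example above, a tag reporting z_translation = 8.3 and printed 100 mm wide yields a z position of about 830 mm. The tag width must be the measured width of the print, since any printer scaling carries straight through to the distance.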
The pose estimate is the basis for a wide range of robotics applications: docking a robot to a charging station marked with a tag, following a printed waypoint trail, recovering the camera’s own pose from multiple known tags in the environment. A camera that knows the intrinsics, sees a tag, and has a real-world position for the tag has, by the same arithmetic, a real-world position for itself.
7.28.4. When to pick which
QR codes and AprilTags solve different problems. The choice between them comes down to what the printed symbol carries.
When the application needs to carry arbitrary data through the printed symbol – a URL, a serial number string, a contact record – the QR code is the right choice. Hundreds of bytes fit in a modestly sized code, the encoding is public and supported on every smartphone, and the decoder copes with rotation, moderate damage, and oblique angles.
When the application needs a small ID read continuously from a distance with optional pose – a fiducial on a moving robot, a calibration target in a room, a docking marker on a charging station – the AprilTag is the right choice. Hundreds of IDs are plenty for the use case, the large Hamming distance between codes lets the decoder recover from bit errors, and the pose estimate is free once the intrinsics are calibrated.
Some applications use both: an AprilTag marks a known location and an associated QR code (printed alongside) carries the metadata about what that location means. The two detectors run independently on the same frame and the application correlates their bounding boxes to match each tag to its companion code.
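One simple way to do the correlation is nearest-centre matching on the bounding boxes. The sketch below assumes each detection exposes rect as an (x, y, w, h) tuple; the helper names and the optional max_dist cut-off are illustrative choices, not part of either detector:

```python
def rect_centre(rect):
    """Centre point of an (x, y, w, h) rectangle."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def pair_tags_with_codes(tags, codes, max_dist=None):
    """Match each tag to the QR code whose box centre is nearest.

    Returns a list of (tag, code) pairs. code is None when there are no
    QR codes, or when the nearest one is farther than max_dist pixels.
    """
    pairs = []
    for t in tags:
        tx, ty = rect_centre(t.rect)
        best, best_d2 = None, None
        for c in codes:
            cx, cy = rect_centre(c.rect)
            d2 = (cx - tx) ** 2 + (cy - ty) ** 2
            if best_d2 is None or d2 < best_d2:
                best, best_d2 = c, d2
        # Reject a match that lies beyond the allowed distance.
        if best is not None and max_dist is not None and best_d2 > max_dist ** 2:
            best = None
        pairs.append((t, best))
    return pairs
```

This greedy scheme lets two tags claim the same QR code; a layout that prints each code directly beside its tag and sets max_dist to roughly one tag width avoids that in practice.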