7.31. Displacement matching
Template matching answers where is this patch inside the frame; similarity scoring answers how alike are these two images overall. A different question sits between them: the two frames show the same scene, but the camera (or the scene) moved between them – by how much? That is the displacement problem, and the image module solves it with a single phase-correlation method.
7.31.1. Phase-correlation displacement
find_displacement()
estimates the rigid alignment between two
same-sized images using phase
correlation – a frequency-domain method
that runs a fast Fourier transform (FFT) on
each image, cross-correlates their phases,
and locates the peak in the result. The peak
position is the translation that aligns
the two images:
d = img.find_displacement(template)
print("shift:", d.x_translation, d.y_translation,
      " response:", d.response)
The returned
Displacement
carries x_translation and
y_translation – the pixel shift in
each axis – plus response, a
confidence score from 0.0 to 1.0
where 1.0 is a perfect peak. Keeping only
results with response > 0.3
discards spurious results in which the
phase correlation never found a clean
peak.
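A small gate makes that threshold explicit. The sketch below is illustrative only: accepted_shift() and MIN_RESPONSE are hypothetical names, and the SimpleNamespace objects are stand-ins with made-up values so the snippet runs off-camera. On the camera, the object returned by find_displacement() would take their place.

```python
from types import SimpleNamespace

MIN_RESPONSE = 0.3  # threshold quoted in the text; tune it per application


def accepted_shift(d, min_response=MIN_RESPONSE):
    """Return the (x, y) shift if the match is trustworthy, else None."""
    if d.response > min_response:
        return (d.x_translation, d.y_translation)
    return None


# Stand-ins for Displacement results (hypothetical values, not real output).
good = SimpleNamespace(x_translation=2.5, y_translation=-1.0, response=0.82)
weak = SimpleNamespace(x_translation=17.0, y_translation=9.0, response=0.12)

print(accepted_shift(good))  # (2.5, -1.0)
print(accepted_shift(weak))  # None
```

Returning None rather than a zero shift lets the caller tell "no motion" apart from "no reliable measurement".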
The rotation and scale fields are 0.0
and 1.0 respectively in the default mode;
they take real values only when
logpolar=True (see below).
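The frequency-domain idea described at the start of this section can be sketched in plain Python. This is a teaching sketch, not the module's implementation: it uses a naive DFT in place of an FFT, works on a tiny 8-by-8 list-of-lists, assumes a circular shift, and its sign convention and peak value are not guaranteed to match what find_displacement() reports. The names dft2() and phase_correlate() are hypothetical.

```python
import cmath
import random


def dft2(img, inverse=False):
    """Naive 2-D DFT of a square list-of-lists (tiny demos only)."""
    n = len(img)
    sign = 1 if inverse else -1
    w = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]

    def dft1(v):
        return [sum(v[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]

    rows = [dft1(r) for r in img]
    cols = [dft1([rows[y][x] for y in range(n)]) for x in range(n)]
    scale = n * n if inverse else 1
    return [[cols[x][y] / scale for x in range(n)] for y in range(n)]


def phase_correlate(a, b):
    """Return (dx, dy, peak) where b is a circularly shifted by (dx, dy)."""
    n = len(a)
    fa, fb = dft2(a), dft2(b)
    cross = []
    for y in range(n):
        row = []
        for x in range(n):
            c = fb[y][x] * fa[y][x].conjugate()
            mag = abs(c)
            # Keep only the phase of the cross-power spectrum.
            row.append(c / mag if mag > 1e-12 else 0)
        cross.append(row)
    corr = dft2(cross, inverse=True)
    peak, py, px = max((corr[y][x].real, y, x)
                       for y in range(n) for x in range(n))
    # Peaks past the midpoint represent negative shifts (wrap-around).
    dx = px - n if px > n // 2 else px
    dy = py - n if py > n // 2 else py
    return dx, dy, peak


n = 8
rng = random.Random(0)
a = [[rng.random() for _ in range(n)] for _ in range(n)]
# Shift a by 3 pixels in x and -2 pixels in y, wrapping at the edges.
b = [[a[(y + 2) % n][(x - 3) % n] for x in range(n)] for y in range(n)]

dx, dy, peak = phase_correlate(a, b)
print(dx, dy)  # 3 -2
```

For a clean circular shift the correlation surface is close to a single spike, which is why the height of the peak is a natural confidence measure.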
The method carries two practical
constraints. The first is power-of-two
dimensions: the FFT at the heart of
phase correlation is fastest – and on
the camera, only fully supported – at
sizes like 32-by-32, 64-by-64, and
128-by-128. The cleanest setup is to
capture at one of those sizes directly,
by passing the resolution to
framesize() as a tuple:
csi0.framesize((64, 64))
An application that needs displacement from a larger frame instead crops a power-of-two patch out of the region it cares about and runs the matcher on that.
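Choosing that patch is simple arithmetic. The helper below is a hypothetical sketch: it assumes regions are described as (x, y, w, h) tuples and picks a square patch, in line with the sizes listed above. It finds the largest power-of-two side that fits and centres it in the region of interest.

```python
def pow2_floor(n):
    """Largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def centered_pow2_roi(x, y, w, h):
    """Square power-of-two (x, y, w, h) patch centred inside the region."""
    side = pow2_floor(min(w, h))
    return (x + (w - side) // 2, y + (h - side) // 2, side, side)


print(centered_pow2_roi(0, 0, 320, 240))  # (96, 56, 128, 128)
```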
The second is same-size inputs:
roi and template_roi must select
identical widths and heights, or the
matcher refuses the call. Two captures
from the same camera at the same
configuration satisfy this
automatically; a captured frame compared
against a loaded reference needs both
cropped to matching power-of-two patches
first.
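When the two images differ in size, both regions can be derived from the smaller one. The sketch below is an assumption-laden illustration: matched_rois() is a hypothetical helper, regions are assumed to be (x, y, w, h) tuples, and the patch is centred in each image, which may not be the right placement for every application.

```python
def matched_rois(size_a, size_b):
    """Return two same-sized power-of-two ROIs, one centred in each image."""
    (wa, ha), (wb, hb) = size_a, size_b
    limit = min(wa, ha, wb, hb)
    side = 1 << (limit.bit_length() - 1)  # largest power of two that fits both

    def centred(w, h):
        return ((w - side) // 2, (h - side) // 2, side, side)

    return centred(wa, ha), centred(wb, hb)


# A 160x120 capture compared against a 100x80 loaded reference.
roi, template_roi = matched_rois((160, 120), (100, 80))
print(roi, template_roi)  # (48, 28, 64, 64) (18, 8, 64, 64)
```

The two tuples would then be handed to the matcher as roi and template_roi, so both selections share the same 64-by-64 size.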
7.31.2. Rotation and scale via log-polar
The default mode finds translation only. When the two frames also differ in rotation or scale about a chosen centre, running the phase correlation on the log-polar re-projection of each image turns those differences into translations along the log-polar axes, which the same phase-correlation matcher can recover:
d = img.find_displacement(template, logpolar=True)
print("rotation rad:", d.rotation,
      " scale:", d.scale,
      " response:", d.response)
With logpolar=True, the method runs
the same matching pipeline against the
log-polar-projected images instead of
the originals. The rotation and
scale fields of the result come back
filled in: rotation is the angle in
radians between the two frames,
scale is the scale factor between
them. x_translation and
y_translation become meaningless in
this mode (the translation along the
log-polar axes does not correspond to a
linear translation in the source).
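Why rotation and scale turn into shifts can be checked on individual points. The sketch below is a geometric illustration only (to_log_polar() and rotate_scale() are hypothetical helpers, and the centre is taken to be the origin): every point that is rotated and scaled about the centre moves by the same offset in (log radius, angle) coordinates.

```python
import math


def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (log radius, angle) about (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)


def rotate_scale(x, y, angle, scale):
    """Rotate by `angle` radians and scale by `scale` about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)


angle, scale = math.radians(20), 1.5
offsets = []
for p in [(10.0, 0.0), (3.0, 4.0), (-2.0, 7.0)]:
    rho0, th0 = to_log_polar(*p)
    rho1, th1 = to_log_polar(*rotate_scale(*p, angle, scale))
    # The offset is the same for every point: (log(scale), angle).
    offsets.append((rho1 - rho0, th1 - th0))

for d_rho, d_theta in offsets:
    print(round(d_rho, 6), round(d_theta, 6))
```

A uniform offset across all points is exactly what a translation-only matcher can recover, which is why the log-polar pass can reuse the same phase correlation.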
The fix_rotation_scale=True keyword
covers the in-between case: the two
images differ in both translation and
rotation/scale, and the application needs
translation only after correcting for
the rotation and scale. The matcher
runs the log-polar pass first to recover
the rotation and scale, applies the
inverse to one of the images, then runs
the translation pass to recover the
remaining shift. The flag is meaningful
only when logpolar=False – it asks
the translation-mode matcher to first
strip the rotation/scale.
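The reason the two-pass order works can be shown on points. The sketch below is a geometric illustration under stated assumptions, not the module's internals: it assumes the rotation and scale have already been recovered exactly and that both act about the origin. The helper names are hypothetical. Once the rotation and scale are undone, every point is left with the same residual offset, which is a pure translation.

```python
import math


def transform(p, angle, scale, shift):
    """Rotate and scale p about the origin, then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * p[0] - s * p[1]) + shift[0],
            scale * (s * p[0] + c * p[1]) + shift[1])


def undo_rotation_scale(p, angle, scale):
    """Apply the inverse rotation and scale about the origin."""
    c, s = math.cos(-angle), math.sin(-angle)
    return ((c * p[0] - s * p[1]) / scale, (s * p[0] + c * p[1]) / scale)


angle, scale, shift = math.radians(15), 1.2, (4.0, -3.0)
points = [(1.0, 0.0), (0.0, 5.0), (-6.0, 2.0)]

residuals = []
for p in points:
    q = undo_rotation_scale(transform(p, angle, scale, shift), angle, scale)
    residuals.append((q[0] - p[0], q[1] - p[1]))

# All residuals agree, so a translation-only match can recover them.
for r in residuals:
    print(round(r[0], 4), round(r[1], 4))
```

Note that the residual is the original shift expressed in the corrected frame (rotated back and divided by the scale), not the raw shift itself.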
The pattern from Polar transforms –
Cartesian → polar → match – is what
find_displacement()
with logpolar=True does in one call.
The application stores a reference
patch at startup and captures each live
frame; the method log-polar-projects both
and recovers the
rotation-and-scale difference between
them. For applications that need a
rotation- and scale-invariant tracker –
a docking robot whose camera tilts and
zooms as it approaches a target, a
stabilised gimbal that needs to know
how the image is rotating relative to a
reference – this is the standard
construction.
7.31.3. The classical use
The most common use of
find_displacement() is
frame-to-frame motion estimation in a
pipeline that processes frames from a moving camera.
The camera captures a small power-of-two
patch at frame N, captures the same-sized
patch at frame N+1, runs
find_displacement()
on the two, and reads off the pixel
shift between them. The shift is the
estimated motion of the camera (or of
the scene, depending on whose frame of
reference matters) between the two
captures, useful for:
Optical-flow-style sensing – a hovering drone with a downward-pointing camera uses the per-frame displacement to estimate its lateral motion and feed it back into the flight controller.
Image stabilisation – the displacement between consecutive frames is subtracted out of the captured image before it is recorded or transmitted, producing a smoother video stream.
Inspection alignment – a scanning camera moving along a conveyor uses the per-frame displacement to register each frame against the next and build a stitched view of the whole part.
Each of those applications takes the same form: capture, estimate the displacement, accumulate it into a running estimate, capture again.
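The accumulation step of that loop can be sketched off-camera. The example below is a minimal sketch under stated assumptions: accumulate() is a hypothetical helper, the measurements are made-up (x_translation, y_translation, response) triples standing in for successive find_displacement() results, and holding the previous position on a weak match is one possible policy among several.

```python
def accumulate(measurements, min_response=0.3):
    """Sum per-frame shifts into a running position, skipping weak matches."""
    x = y = 0.0
    track = []
    for dx, dy, response in measurements:
        if response > min_response:
            x += dx
            y += dy
        # On a weak match the previous position is held unchanged.
        track.append((x, y))
    return track


# Hypothetical per-frame results: (x_translation, y_translation, response).
frames = [(1.5, 0.0, 0.9), (2.0, -0.5, 0.8), (30.0, 12.0, 0.1), (1.0, 0.5, 0.7)]
print(accumulate(frames)[-1])  # (4.5, 0.0)
```

Because each estimate is added to the last, small per-frame errors build up over time; applications that run for long periods usually pair this with an occasional absolute reference.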