7.16. Custom convolution kernels¶
The neighbourhood filters covered so far each
had a built-in statistic the filter applied to
the window at every position – the mean, the
Gaussian-weighted average, the median.
morph() is the one filter
that lets the application supply the statistic
itself, in the form of a kernel: a small
matrix of weights that describes how the filter
should combine the neighbourhood pixels into a
single output value.
The mechanism is the classical convolution
operation. At every output position, every
neighbourhood pixel is multiplied by the
matching weight in the kernel, the products are
summed, the result is optionally scaled and
offset, and the value is written to the output
pixel. Different kernels produce different
results from the same input. A kernel with all
equal positive weights reproduces the
mean() filter; a
bell-shaped one reproduces
gaussian(). Patterns
beyond those produce edge responses, embosses,
gradients, sharpening, motion blur, and a long
catalogue of other effects – everything
classical image processing has ever wanted to
do with a single linear pass.
7.16.1. The morph method¶
The signature looks like the other neighbourhood filters with one extra argument:
img.morph(size, kernel, mul=1.0, add=0.0)
size is the radius the same way as
everywhere else, so the kernel must be exactly
(2 * size + 1) rows by (2 * size + 1)
columns. The kernel itself is a flat Python
list of those many numbers, in row-major
order – the first (2 * size + 1) entries
are the top row, the next (2 * size + 1)
are the second row, and so on, down to the
bottom row. mul scales the sum-of-products
before it is written into the output pixel,
and add adds a constant. The default
mul=1.0 and add=0.0 leave the
convolution output unchanged.
One detail worth being explicit about: the
method automatically divides the
sum-of-products by the sum of the kernel
entries before writing the output. That
auto-division means an averaging kernel whose
entries sum to nine – a 3-by-3 box blur, for
example – comes out at one-ninth scale with
no extra effort, and a Gaussian-approximation
kernel that sums to sixteen comes out at
one-sixteenth scale, both without the
application having to compute the division
itself. The application sets mul only when
it wants a further scale on top of the
auto-normalisation – or, more commonly, when
the kernel sums to zero (an edge-response
kernel) and the auto-division would be a
division by nothing. The framework treats the
sum as one in that case, and mul becomes
the only knob for keeping the unscaled
sum-of-products in range.
The threshold=True / offset=N pair
from the adaptive-threshold section also
works on morph(), so the
same custom-kernel framework can produce a
binary threshold whose cutoff is computed by
a custom statistic.
7.16.2. The kernel layout¶
A 3-by-3 kernel (size=1) is a flat list of
nine numbers laid out left to right, top to
bottom. The convention reads naturally if the
list is broken across three Python lines:
sobel_x = [-1, 0, 1,
-2, 0, 2,
-1, 0, 1]
This is the Sobel-x gradient operator – the
first standard kernel any application is going
to want and a useful one to walk through end
to end. The pattern is straightforward:
negative weights on the left column, positive
weights on the right column, with the centre
column zero. The row weights -1, -2, -1 (or
1, 2, 1 on the right) are higher in the
middle than at the corners, which gives the
centre row more influence over the result than
the corner rows.
When the kernel sweeps across a vertical edge – a column of pixels that goes from dark on the left to bright on the right – the negative weights pick up the dark side and the positive weights pick up the bright side. The sum of products is a large positive number, which the filter writes as a bright output pixel. A horizontal patch of uniform brightness produces zero, because every positive weight is matched by a negative weight of the same magnitude on a pixel with the same value.
Running the kernel:
img.morph(1, sobel_x, mul=0.25)
The Sobel kernel sums to zero – every
negative weight on the left side is matched by
an equal positive weight on the right – so
the auto-division does not divide by anything,
and mul is the only scale on the
sum-of-products. mul=0.25 keeps the
response in range: the largest absolute sum
the Sobel-x can produce from a 3-by-3 patch
is roughly 4 * 255 = 1020 (eight bright
pixels weighted up to 2), and dividing
that down by four lands the extreme cases at
255, where the format clips them cleanly.
The matching Sobel-y kernel detects horizontal edges by rotating the same weight pattern 90 degrees:
sobel_y = [-1, -2, -1,
0, 0, 0,
1, 2, 1]
Applications that want to detect any edge, regardless of direction, typically run both Sobels and combine the responses.
7.16.3. Offsetting the output¶
add is the other half of the scaling
story. A zero-sum kernel’s response is
signed – positive on one side of an edge,
negative on the other – and the negative half
clips to zero when written into an unsigned
pixel. add=128 shifts the response to be
centred at mid-grey, so negative responses
survive as values below 128 and positive
ones land above it: an edge response or an
emboss becomes visible in both directions, at
the cost of half the range in each.
Which combination of mul and add a
kernel expects is part of the kernel’s design;
the standard kernel catalogue lists the right settings
for each common kernel.
7.16.4. Larger kernels¶
Everything on this page has been described
with 3-by-3 kernels (size=1), because that
is the size the standard catalogue uses and
because the row-major layout is easy to write
out by hand at that size. Nothing in the
mechanism restricts the kernel to 3-by-3,
though. size=2 runs a 5-by-5 kernel, with
twenty-five entries in the flat list;
size=3 runs a 7-by-7 with forty-nine; and
so on, up to whatever radius the application
is willing to pay for. The framework handles
either flat-list or nested-row layouts at any
odd size.
The reason to reach for a larger kernel is the same reason to reach for a larger neighbourhood on any of the built-in filters: more averaging, broader feature detection, less sensitivity to single-pixel noise. The cost grows as the square of the radius – a 5-by-5 does roughly 2.8 times the per-pixel work of a 3-by-3, a 7-by-7 about 5.4 times – and that multiplier comes straight out of the frame rate.
The practical pattern is to stay at size=1
for the standard catalogue and reach for
larger sizes only when the algorithm needs
the larger neighbourhood. Edge detectors
rarely benefit beyond 3-by-3; smoothing
filters sometimes do; the right size depends
on the scale of the features the application
is trying to emphasise or suppress.
7.16.5. When to reach for morph¶
For everyday smoothing,
mean(),
gaussian(), and
bilateral() are faster and
cleaner. For edge detection,
laplacian() and
find_edges() are
purpose-built. The case for reaching into
morph() directly is when
the application needs a specific convolution
that the built-in filters do not expose – a
directional Sobel, a custom edge template, a
kernel tuned to a particular texture the rest
of the pipeline is going to look for, or any
of the standard catalogue of useful kernels
that classical image processing has built up
over the decades. The full flexibility of
arbitrary kernels is available; the price is
that the application is responsible for
choosing the kernel values that produce the
result it wants.