Reading and writing pixels
==========================

Most operations on an image hide their per-pixel
work inside a single method call, where the loops
that touch every pixel happen at native speed.
There are cases, though, where application code
wants to touch one specific pixel directly: to
read what is at a particular position, to write a
new value into one, to sample a single point for a
calibration step, or to debug a value at a known
location. The image module exposes that level of
access through two addressing forms, each fitting
a different way of thinking about where a pixel
lives.

Addressing by coordinate
------------------------

The most natural form uses the vocabulary already
developed under Coordinates: name a pixel
by its Cartesian ``(x, y)``.
:meth:`~image.Image.get_pixel` takes ``(x, y)`` and
returns the value at that position;
:meth:`~image.Image.set_pixel` takes the same
``(x, y)`` along with a value and writes it.

What those calls return or accept depends on the
image's format. Grayscale, binary, and Bayer
images carry a single value per pixel -- a
brightness for grayscale, a ``0`` or ``1`` for
binary, a single colour-channel sample for Bayer
-- so :meth:`~image.Image.get_pixel` returns a
single integer. RGB565 carries three colour
channels packed into 16 bits, and ``get_pixel``
unpacks them into an ``(r, g, b)`` tuple by
default, with each channel mapped into the
``0`` -- ``255`` range.

The default behaviour can be overridden in either
direction. Passing ``rgbtuple=False`` to ``get_pixel``
on an RGB565 image falls back to the raw 16-bit
packed word -- the same form the linear index
returns, and the efficient form when the
application is going to write the same packed
value straight back. Passing ``rgbtuple=True`` on
a single-channel image does the opposite: the
stored value is converted into an RGB888 tuple
before returning, with Bayer images going through
an on-the-spot debayer step. The argument exists
so that calling code can ask for pixels in a
uniform colour space regardless of how the
underlying image stores them.
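
The relationship between the packed word and the
``(r, g, b)`` tuple can be sketched in plain Python.
The helper names ``unpack_rgb565`` and ``pack_rgb565``
are hypothetical, and the bit-replication step used
to stretch each field to ``0`` -- ``255`` is an
assumption about the mapping; the firmware's exact
rounding and byte order may differ.

```python
def unpack_rgb565(word):
    """Split a 16-bit RGB565 word into an (r, g, b) tuple of 0..255 values."""
    r5 = (word >> 11) & 0x1F   # top 5 bits
    g6 = (word >> 5) & 0x3F    # middle 6 bits
    b5 = word & 0x1F           # bottom 5 bits
    # Bit replication (assumed mapping): copy the high bits into the low
    # bits so that 0 stays 0 and the largest field value becomes 255.
    r = (r5 << 3) | (r5 >> 2)
    g = (g6 << 2) | (g6 >> 4)
    b = (b5 << 3) | (b5 >> 2)
    return (r, g, b)


def pack_rgb565(r, g, b):
    """Pack 0..255 channels into a 16-bit RGB565 word, dropping low bits."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


red_word = pack_rgb565(255, 0, 0)
print(hex(red_word))             # 0xf800
print(unpack_rgb565(red_word))   # (255, 0, 0)
```

Packing discards the low bits of each channel, so an
arbitrary RGB888 colour does not generally survive a
round trip. A raw word read with ``rgbtuple=False``
can be written back without that loss.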

Compressed images -- JPEG and PNG -- are not
supported by ``get_pixel`` or ``set_pixel``.
Their bytes do not represent pixels at known
positions, and the methods raise an error rather
than return a value that would not mean anything.

In practice the patterns look like:

::

    v = img.get_pixel(40, 30)            # grayscale: int 0..255
    img.set_pixel(40, 30, 255)           # write white

    r, g, b = img.get_pixel(40, 30)      # RGB565: defaults to (r, g, b) tuple
    img.set_pixel(40, 30, (255, 0, 0))   # write red

If the requested ``(x, y)`` is outside the image,
``get_pixel`` returns :data:`None` and
``set_pixel`` does nothing. That is forgiving by
design: many algorithms walk close to the edges of
an image and briefly index out-of-range positions,
and a quiet no-op is less disruptive than an
exception every time it happens.
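
The calling side of that contract might look like the
sketch below. ``FakeGrayImage`` is a hypothetical
stand-in written only to mimic the edge behaviour
described above so the example runs off-device, and
``neighbour_mean`` is an illustrative helper, not part
of the image module.

```python
class FakeGrayImage:
    """Hypothetical stand-in that mimics the documented edge behaviour."""

    def __init__(self, width, height, fill=0):
        self.width = width
        self.height = height
        self.data = bytearray([fill]) * (width * height)

    def _inside(self, x, y):
        return 0 <= x < self.width and 0 <= y < self.height

    def get_pixel(self, x, y):
        # Out-of-range reads return None instead of raising.
        if not self._inside(x, y):
            return None
        return self.data[y * self.width + x]

    def set_pixel(self, x, y, value):
        # Out-of-range writes are silently ignored.
        if self._inside(x, y):
            self.data[y * self.width + x] = value


def neighbour_mean(img, x, y):
    """Average the 3x3 neighbourhood, skipping positions outside the image."""
    samples = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            v = img.get_pixel(x + dx, y + dy)
            if v is not None:          # None marks an out-of-range position
                samples.append(v)
    return sum(samples) / len(samples)


img = FakeGrayImage(4, 3, fill=10)
img.set_pixel(0, 0, 100)
img.set_pixel(-1, 0, 255)              # outside the image: no effect
corner = neighbour_mean(img, 0, 0)     # only 4 of the 9 positions exist
print(corner)                          # 32.5
```

The ``is not None`` check matters most on RGB565
images, where writing ``r, g, b = img.get_pixel(x, y)``
for an out-of-range position would fail while trying
to unpack ``None``.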

Addressing by linear index
--------------------------

The other form is to address pixels by their
position in the underlying buffer. Recall the
buffer's layout: pixels are stored row by row,
all of the top row's pixels first, then all of the
next row's, and so on down to the bottom. That
arrangement means every pixel has a single integer
index counting from ``0`` at the top-left and
incrementing along each row in turn. The pixel at
coordinate ``(x, y)`` has linear index
``y * width + x``.
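
As a small worked example, the conversion and its
inverse can be written as two helpers. The names
``to_index`` and ``to_xy`` are illustrative only, and
the width of ``4`` is chosen to keep the numbers small.

```python
def to_index(x, y, width):
    """Linear index of the pixel at (x, y) in a row-major buffer."""
    return y * width + x


def to_xy(index, width):
    """Inverse mapping: recover (x, y) from a linear index."""
    y, x = divmod(index, width)   # quotient is the row, remainder the column
    return (x, y)


WIDTH = 4                          # a 4-by-3 image, for illustration
print(to_index(2, 1, WIDTH))       # 6
print(to_xy(6, WIDTH))             # (2, 1)
print(to_index(3, 2, WIDTH))       # 11, the bottom-right pixel
```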

.. figure:: ../figures/pixel-indexing.svg
   :alt: A 4-by-3 grid of cells. Each cell carries
         a large linear index from 0 in the
         top-left through 11 in the bottom-right,
         plus a small (x, y) tuple underneath.
         Columns are labelled x equals 0, 1, 2, 3
         across the top; rows are labelled y equals
         0, 1, 2 along the left edge. A caption
         underneath gives the relation: linear
         index equals y times width plus x.

   Pixels are addressed both by Cartesian
   ``(x, y)`` and by a linear index that walks the
   buffer row by row, left to right.

The image module exposes that index through
ordinary Python subscript notation: ``img[i]``
reads the pixel at linear index ``i``, and
``img[i] = value`` writes it. What the index form
returns is the *raw stored value* for the format,
not the unpacked tuple :meth:`~image.Image.get_pixel`
returns by default. That distinction matters
because the format chosen earlier decides what the
raw value looks like:

* Grayscale and Bayer pixels come back as 8-bit
  integers.
* RGB565 and YUV422 pixels come back as 16-bit
  integers -- the packed word.
* Binary pixels come back as ``0`` or ``1``.
* JPEG and PNG pixels come back as 8-bit integers,
  one byte at a time of the compressed stream.
  Those values are opaque -- they are pieces of a
  compressed encoding rather than pixels in any
  ordinary sense.

The index form fits code that is already thinking
in terms of buffer offsets: a loop that walks
every pixel once, an algorithm that needs to jump
by a row at a time, or a piece of code translating
between buffer layouts. Code that is thinking in
terms of x and y coordinates is better served by
``get_pixel`` and ``set_pixel``; the two forms
address the same pixels through different mental
models.
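
The offset arithmetic behind those patterns can be
sketched on a plain list standing in for the image
buffer. The list ``buf`` is an assumption made so the
example runs anywhere; on a real image each ``buf[i]``
would be ``img[i]``, and the width would come from the
image itself.

```python
WIDTH, HEIGHT = 4, 3
buf = list(range(WIDTH * HEIGHT))   # stand-in buffer: value == linear index

# One row: a contiguous run of WIDTH entries starting at y * WIDTH.
y = 1
row = [buf[i] for i in range(y * WIDTH, (y + 1) * WIDTH)]

# One column: start at x and step by WIDTH to drop down a row each time.
x = 2
column = [buf[i] for i in range(x, WIDTH * HEIGHT, WIDTH)]

# The pixel directly below index i is WIDTH entries further on.
i = 5
below = buf[i + WIDTH]

print(row)      # [4, 5, 6, 7]
print(column)   # [2, 6, 10]
print(below)    # 9
```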

An :class:`Image` is also iterable. ``for v in
img:`` walks the buffer in the same row-major
order, yielding the raw values one pixel at a
time, and ``len(img)`` is the pixel count for
uncompressed formats or the byte count for
compressed streams.
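
The shape of such a loop, sketched with a
``bytearray`` standing in for a 4-by-3 grayscale image
(an assumption so the example runs off-device), with
an arbitrary brightness cut-off of ``128``:

```python
# Stand-in for a 4-by-3 grayscale image: one byte per pixel, row-major.
pixels = bytearray([12, 200, 34, 255,
                    90, 180, 7, 220,
                    128, 64, 250, 3])

bright = 0
for v in pixels:                  # same shape as `for v in img:`
    if v >= 128:
        bright += 1

fraction = bright / len(pixels)   # len() plays the role of the pixel count
print(bright, fraction)           # 6 0.5
```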

Why per-pixel Python is the slow path
-------------------------------------

One practical point deserves candour: walking an
image one pixel at a time from Python
is *slow*. A 320 × 240 grayscale image holds
76,800 pixels; calling
:meth:`~image.Image.get_pixel` on each of them in
a ``for`` loop runs millions of MicroPython
bytecode instructions to do work that an
equivalent native method could finish in a few
hundred microseconds. That is not a small factor.
It is the difference between a script that
processes frames in real time and one that crawls
along well below the camera's frame rate.

Almost every method on the :class:`Image` surface
exists because there is a faster, native version of
a common per-pixel pattern. A loop that adds two
images together becomes a single native call. A loop
that smooths each pixel by averaging it with its
neighbours becomes another. A loop that
classifies each pixel against a threshold becomes
a third. The application's job, most of the time,
is to recognise which whole-image method matches
the work the loop would have done, and reach for
that instead of writing the loop by hand.
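
The same trade-off can be illustrated in desktop
Python, as an analogy rather than the image module's
own API. The sketch thresholds a synthetic
320 × 240 buffer twice: once with an explicit
per-pixel loop, and once with a single built-in call
(``bytes.translate``) that applies a lookup table to
every byte in native code. The buffer contents and the
threshold of ``128`` are made up for the example.

```python
THRESHOLD = 128

# Synthetic 320 x 240 grayscale frame, one byte per pixel, row-major.
frame = bytes((x * 7 + y * 13) % 256 for y in range(240) for x in range(320))

# Slow path: the interpreter runs the comparison once per pixel.
looped = bytearray(len(frame))
for i in range(len(frame)):
    looped[i] = 255 if frame[i] >= THRESHOLD else 0

# Fast-path analogue: build a 256-entry lookup table once, then let a
# single built-in call apply it to every byte.
table = bytes(255 if v >= THRESHOLD else 0 for v in range(256))
whole = frame.translate(table)

print(bytes(looped) == whole)   # True
```

Both paths produce the same buffer; only the number of
interpreted steps differs. On the device, the
equivalent move is to replace the loop with whichever
whole-image method performs the classification.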

Pixel-level read and write are still the right
tool when nothing else fits -- patching a
specific measurement back into the buffer,
sampling one position for a calibration step,
debugging a value at a known location. They remain
the slow path, though: reach for them when no
whole-image method has the form the application
needs, not as the default way to operate on
pixels.