Memory pools
============

A camera that holds three full-resolution frames in a
framebuffer pool, runs a separate preview buffer alongside,
and still has room for a Python script and its objects is
juggling more memory than a single block of RAM on the MCU
could provide. MicroPython fits everything in by spreading
it across the **several distinct kinds of memory** the board
offers, and by routing each kind of allocation to the
kind of memory it actually needs.

Kinds of memory
---------------

A modern OpenMV Cam has up to four distinct kinds of
memory. The first is invisible to the application; the other
three are pools that allocations can come from.

* **The CPU's data cache** -- a small, very fast region of
  memory that sits between the CPU and the rest of RAM.
  When the CPU reads a value from main memory or writes one
  to it, the cache automatically keeps a copy, so repeated
  accesses to the same data are served from the cache and
  do not pay the cost of going out to slower memory. The
  cache is *not* a pool that allocations come from. It is
  transparent to the application: it makes the rest of RAM
  feel faster in practice than its raw latency would
  suggest, up to the point where a working set stops
  fitting in it.
* **Tightly-coupled processor memory** -- a small block of
  RAM wired directly to the CPU with no bus in between.
  Single-cycle access, never misses, never waits.
  Allocations that genuinely need the fastest possible
  memory -- where every cycle of latency matters -- come
  out of this pool.
* **Fast on-chip memory** -- a few hundred kilobytes up to
  about a megabyte of RAM, built into the MCU package. Low
  latency, high bandwidth, but limited in size. The
  MicroPython heap lives here so Python object accesses stay
  quick; smaller working buffers that the CPU touches a lot
  share the pool.
* **Slower bulk memory** -- on boards that pair the MCU with
  an external memory die, tens of megabytes of off-chip
  RAM reached over the external bus. Much larger, but each
  access takes longer than on-chip memory; the data cache
  hides much of that cost for working sets it can hold, and
  the gap shows up on operations that sweep across data
  too big to cache. This pool serves allocations that must
  be large and can tolerate slower access -- most
  importantly, the framebuffer pool.
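
The point where a working set stops fitting in the cache can
be a cliff rather than a slope, which a toy model makes
visible. The sketch below is a minimal simulation, assuming a
fully associative cache with least-recently-used (LRU)
replacement and a workload that sweeps the same data from
start to end several times. Real MCU data caches are
set-associative and may use other replacement policies, so
only the shape of the result carries over; ``lru_hit_rate``
is a hypothetical helper.

.. code-block:: python

    from collections import OrderedDict

    def lru_hit_rate(capacity_lines, working_set_lines, passes=4):
        """Hit rate of a toy fully associative LRU cache.

        The workload touches cache lines 0..working_set_lines-1 in
        order and repeats that sweep `passes` times in total.
        """
        cache = OrderedDict()  # line number -> True, least recent first
        hits = accesses = 0
        for _ in range(passes):
            for line in range(working_set_lines):
                accesses += 1
                if line in cache:
                    hits += 1
                    cache.move_to_end(line)  # now the most recently used
                else:
                    cache[line] = True
                    if len(cache) > capacity_lines:
                        cache.popitem(last=False)  # evict least recently used
        return hits / accesses

    fits = lru_hit_rate(capacity_lines=512, working_set_lines=256)
    spills = lru_hit_rate(capacity_lines=512, working_set_lines=1024)

With room for 512 lines, the 256-line working set misses only
on its first pass, giving a 0.75 hit rate over four passes.
The 1024-line working set never hits: under LRU, every line is
evicted just before the sweep comes back around to it.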

Boards in the family fall on a spectrum: some have only
on-chip RAM; some pair on-chip RAM with a much larger
external block. Each of the three allocatable kinds is
treated as a **memory pool** -- a chunk that allocations
come out of -- and labelled so each request can ask for the
kind of memory it actually needs.
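
Routing by label can be sketched in a few lines. This is an
illustrative model rather than the firmware's allocator: the
``Pool`` class, the ``alloc_from`` helper, the label strings,
the 32-byte alignment, and the addresses and sizes are all
hypothetical, and each pool hands out memory with a simple
bump pointer that never frees.

.. code-block:: python

    class Pool:
        """Hypothetical labelled pool that hands out memory with a bump pointer."""

        def __init__(self, label, base, size):
            self.label = label  # the kind of memory, e.g. "onchip"
            self.base = base    # start address of the pool
            self.size = size    # total bytes in the pool
            self.used = 0       # bytes handed out so far

        def alloc(self, nbytes, align=32):
            # Round the current offset up to the next alignment boundary.
            offset = (self.used + align - 1) // align * align
            if offset + nbytes > self.size:
                raise MemoryError(f"{self.label}: no room for {nbytes} bytes")
            self.used = offset + nbytes
            return self.base + offset


    def alloc_from(pools, label, nbytes):
        """Send a request to the pool carrying the label it asked for."""
        for pool in pools:
            if pool.label == label:
                return pool.alloc(nbytes)
        raise MemoryError(f"this board has no {label!r} pool")


    # A board with on-chip memory only (illustrative addresses and sizes).
    pools = [
        Pool("tcm", 0x2000_0000, 64 * 1024),
        Pool("onchip", 0x2400_0000, 1024 * 1024),
    ]

    buf_a = alloc_from(pools, "onchip", 4096)  # 0x24000000
    buf_b = alloc_from(pools, "onchip", 100)   # 0x24001000, already aligned

In this sketch a request labelled ``"external"`` on a board
with only on-chip pools raises ``MemoryError`` instead of
quietly landing in a pool with different speed or size.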

The primary framebuffer
-----------------------

The framebuffer that backs :meth:`~csi.CSI.snapshot` does
not ask for fast memory. It asks for *enough* memory --
nothing more. That puts it in whichever pool is largest, so
on a board with both on-chip and external memory the
framebuffer lands in the external block.

A full-resolution, triple-buffered framebuffer is far too
big to fit in the fast on-chip pool on most parts; the
larger pool is the only one that can hold it at all. The
CPU's data cache hides much of the per-access cost when the
application processes the image, and the DMA engine that
fills the framebuffer from the sensor keeps up with the
sensor's data rate either way.
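
The placement rule (take the largest pool, then confirm the
buffer fits) reduces to a couple of lines. The pool sizes
below are made up for illustration, ``framebuffer_pool`` and
``pools_that_fit`` are hypothetical helpers, and the
framebuffer size counts pixel data only: three 640x480 frames
at 2 bytes per pixel, which is what RGB565 uses.

.. code-block:: python

    # Made-up pool sizes in bytes for a board with external RAM.
    POOLS = {
        "tcm": 64 * 1024,
        "onchip": 1024 * 1024,
        "external": 32 * 1024 * 1024,
    }

    def framebuffer_pool(pools):
        """The framebuffer asks only for room, so take the largest pool."""
        return max(pools, key=pools.get)

    def pools_that_fit(pools, nbytes):
        """Labels of every pool big enough to hold `nbytes` in one piece."""
        return [label for label, size in pools.items() if size >= nbytes]

    # Three 640x480 RGB565 frames (2 bytes per pixel), pixel data only.
    TRIPLE_VGA = 640 * 480 * 2 * 3  # 1,843,200 bytes

Here ``pools_that_fit(POOLS, TRIPLE_VGA)`` returns only
``["external"]``: at 1,843,200 bytes the buffers overflow a
1 MB on-chip pool, so the largest pool is also the only one
that can hold them.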

The framebuffer's exact size is determined by the
current :meth:`~csi.CSI.pixformat`,
:meth:`~csi.CSI.framesize`, and
:meth:`~csi.CSI.framebuffers` count; it grows or shrinks
each time any of those changes.
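
Because the size follows directly from those settings, it can
be worked out by hand. A minimal sketch, assuming uncompressed
formats only and ignoring any header, alignment padding, or
bookkeeping the firmware may add per frame. The format and
size names mirror common camera terminology, but the two
tables and ``framebuffer_bytes`` are hypothetical, not the
``csi`` module's constants.

.. code-block:: python

    # Bytes per pixel for two uncompressed formats. JPEG and other
    # compressed formats are variable-size and left out of this model.
    BYTES_PER_PIXEL = {"GRAYSCALE": 1, "RGB565": 2}

    # A few standard frame sizes as (width, height).
    FRAME_SIZES = {"QQVGA": (160, 120), "QVGA": (320, 240), "VGA": (640, 480)}

    def framebuffer_bytes(pixformat, framesize, framebuffers):
        """Pixel-data bytes for `framebuffers` frames of one format and size."""
        width, height = FRAME_SIZES[framesize]
        return width * height * BYTES_PER_PIXEL[pixformat] * framebuffers

    triple_rgb = framebuffer_bytes("RGB565", "VGA", 3)
    triple_gray = framebuffer_bytes("GRAYSCALE", "VGA", 3)
    single_qvga = framebuffer_bytes("RGB565", "QVGA", 1)

Switching three VGA buffers from RGB565 (1,843,200 bytes) to
grayscale halves the requirement to 921,600 bytes, and a
single RGB565 QVGA buffer needs just 153,600.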

Secondary sensor framebuffers
-----------------------------

A second :class:`~csi.CSI` instance gets its own
framebuffer, allocated from the same pool the primary uses.
The pool is shared; the buffers are independent. Secondary
sensors typically run at much lower resolutions than the
primary, so the second framebuffer usually adds only a
small fraction of the primary's footprint.
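
A worked example shows how small that fraction can be. The
settings are hypothetical: a primary running 640x480 RGB565
with three buffers and a secondary running 160x120 RGB565
with one, counting pixel data only.

.. code-block:: python

    # Hypothetical settings; pixel data only (2 bytes per RGB565 pixel).
    primary = 640 * 480 * 2 * 3    # three VGA buffers: 1,843,200 bytes
    secondary = 160 * 120 * 2 * 1  # one QQVGA buffer: 38,400 bytes

    # Both buffers come out of the same pool, so it must hold the sum.
    shared_total = primary + secondary
    # Extra memory the second sensor costs, relative to the primary.
    fraction = secondary / primary

    print(f"pool must hold {shared_total:,} bytes; secondary adds {fraction:.1%}")

This prints ``pool must hold 1,881,600 bytes; secondary adds
2.1%``.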

The stream framebuffer
----------------------

The :doc:`image preview <../csi/ide-preview>` buffer is the
exception. It is not allocated from any of the pools at
runtime; it is a *fixed region* reserved at build time,
with a known address and a known size. That keeps the
preview path out of the way of every other allocation --
the region exists from boot and never moves.
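
Because the preview region's address and size are fixed
before the firmware runs, the layout can be checked once,
ahead of time, for overlap with the runtime pools. A sketch of
such a check, assuming a memory map whose names, addresses,
and sizes are made up for illustration; ``overlaps`` and
``check_layout`` are hypothetical helpers.

.. code-block:: python

    # Made-up memory map entries: (name, start address, size in bytes).
    PREVIEW = ("preview", 0x2400_0000, 512 * 1024)
    POOLS = [
        ("onchip", 0x2408_0000, 512 * 1024),
        ("external", 0xC000_0000, 32 * 1024 * 1024),
    ]

    def overlaps(a, b):
        """True if two (name, start, size) regions share at least one byte."""
        _, a_start, a_size = a
        _, b_start, b_size = b
        return a_start < b_start + b_size and b_start < a_start + a_size

    def check_layout(fixed, pools):
        """Raise ValueError if any pool overlaps the fixed region."""
        for pool in pools:
            if overlaps(fixed, pool):
                raise ValueError(f"{pool[0]} overlaps {fixed[0]}")

    check_layout(PREVIEW, POOLS)  # passes: the preview ends where onchip begins

Treating each region as a half-open range ``[start,
start + size)`` means two regions that merely touch, like the
preview and ``onchip`` here, do not count as overlapping.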

The MicroPython heap
--------------------

Python objects -- variables, lists, dictionaries, class
instances, the :class:`~image.Image` wrapper an
:meth:`~csi.CSI.snapshot` call returns, every string and
tuple the application creates -- live on the **MicroPython
garbage-collected heap**, which is *separate* from the
camera's memory pools. The garbage-collected (GC) heap is
a region of memory MicroPython manages itself: Python code
allocates from it implicitly every time an object is
created, and when the heap fills up (or when the script
calls ``gc.collect()``) MicroPython scans the heap and
reclaims the space taken by objects the application no
longer references, so the application never has to free
anything by hand.
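
The scan-and-reclaim step is a tracing collection: anything
reachable from the program's roots is kept, and everything
else is freed. MicroPython's own collector follows a
mark-and-sweep design over raw memory blocks; the toy below
models only the reachability idea, using a dict that maps each
hypothetical object name to the names it references.
``collect`` is a hypothetical helper, not the ``gc`` module's
function.

.. code-block:: python

    def collect(heap, roots):
        """Toy mark-and-sweep: free every object unreachable from `roots`.

        `heap` maps object names to the names each object references;
        `roots` are the names the program can reach directly.
        Returns the set of names that were reclaimed.
        """
        # Mark: walk everything reachable from the roots.
        marked = set()
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if obj not in marked:
                marked.add(obj)
                stack.extend(heap[obj])
        # Sweep: anything left unmarked is garbage; drop it from the heap.
        garbage = set(heap) - marked
        for obj in garbage:
            del heap[obj]
        return garbage

    heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
    freed = collect(heap, roots=["a"])

Here ``"c"`` and ``"d"`` reference each other, but nothing
reachable references them, so both are reclaimed; a tracing
collector frees such cycles without any help from the
application.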

A dedicated region is set aside for the GC heap at boot,
typically placed in fast on-chip memory so Python access
stays quick, with an optional overflow into the larger
external block on boards that need more headroom for big
data structures.
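
One way to picture a heap with an overflow area is an
allocator that tries the fast area first and falls back to
the external one only when the fast area is full. This is a
hypothetical toy: the ``SplitHeap`` class and the area sizes
are illustrative, and a real GC heap allocates in fixed-size
blocks with its own placement rules.

.. code-block:: python

    class SplitHeap:
        """Toy allocator over several heap areas, tried fastest first."""

        def __init__(self, areas):
            self.areas = list(areas)  # (name, size) pairs, fastest first
            self.used = {name: 0 for name, _ in self.areas}

        def alloc(self, nbytes):
            # Take the first area that still has room for the request.
            for name, size in self.areas:
                if self.used[name] + nbytes <= size:
                    self.used[name] += nbytes
                    return name
            raise MemoryError(f"no heap area has {nbytes} bytes free")

    heap = SplitHeap([("onchip", 256 * 1024), ("external", 4 * 1024 * 1024)])
    first = heap.alloc(200 * 1024)   # lands on-chip
    second = heap.alloc(100 * 1024)  # too big for what is left on-chip
    third = heap.alloc(10 * 1024)    # small enough to fit on-chip again

After the 200 KB object fills most of the on-chip area, the
100 KB request spills into external memory, while the later
10 KB request still fits on-chip. Without an overflow area the
100 KB request would have failed.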

The :class:`~image.Image` returned by
:meth:`~csi.CSI.snapshot` is a small wrapper object on the
GC heap; the underlying pixel data lives in the framebuffer
in one of the camera's pools. The two never compete for the
same memory.
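
The split between a small wrapper and the pixel data it
describes can be modelled in plain Python with a
``memoryview``, which refers to another object's bytes without
copying them. ``ImageWrapper`` is a hypothetical stand-in, not
OpenMV's :class:`~image.Image`, and the ``bytearray`` plays
the role of the framebuffer.

.. code-block:: python

    class ImageWrapper:
        """Hypothetical small object describing pixel data it does not own."""

        def __init__(self, framebuffer, width, height, bytes_per_pixel):
            self.width = width
            self.height = height
            # A view into the framebuffer's bytes: no pixel data is copied.
            nbytes = width * height * bytes_per_pixel
            self.pixels = memoryview(framebuffer)[:nbytes]

    # The "framebuffer" is allocated once, outside the wrapper.
    framebuffer = bytearray(320 * 240 * 2)
    img = ImageWrapper(framebuffer, 320, 240, 2)

    # A write through the wrapper lands in the framebuffer,
    # because both name the same bytes.
    img.pixels[0] = 0xFF

The wrapper itself holds only a few attributes and a view;
the 153,600 bytes of pixel data stay in the buffer it points
at.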

Putting it together
-------------------

Steering each kind of allocation to the right pool -- big
buffers to the larger pool where they fit, latency-sensitive
data to the faster pools, the Python heap to its own region,
the preview to its reserved slot -- is what makes it
possible to run a full-resolution capture pipeline, a
preview channel, and a non-trivial Python script side by
side on parts with only about a megabyte of fast on-chip
memory.