Shape and strides
=================

The data inside an :class:`~numpy.ndarray` is one
packed block of numbers. The descriptor in front of that
block decides how the flat data is read out as a
tensor.

What the descriptor records
---------------------------

Five values describe how to read the data block as a
tensor::

    import numpy as np

    a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)

    a.ndim       # 2      - number of dimensions
    a.shape      # (2, 3) - length along each dimension
    a.itemsize   # 1      - bytes per element (from dtype)
    a.size       # 6      - total number of elements
    a.strides    # (3, 1) - byte step along each dimension

The :func:`~numpy.ndinfo` helper prints the shape,
strides, item size and element type, plus the location
of the underlying buffer, in one call. Two arrays that
report the same buffer location are sharing memory::

    np.ndinfo(a)
    # class: ndarray
    # shape: (2, 3)
    # strides: (3, 1)
    # itemsize: 1
    # data pointer: 0x...
    # type: uint8

Strides explained
-----------------

A *stride* is how many bytes to step in the data block
to move one element along a given axis. For the 2x3
``uint8`` array above, the strides are ``(3, 1)``:
moving down by one row jumps 3 bytes, moving right by
one column jumps 1 byte. That is the same as saying the
rows are stored back to back, left to right::

    memory: [ 1 ][ 2 ][ 3 ][ 4 ][ 5 ][ 6 ]
              ^ row 0        ^ row 1
              |<-- 3 bytes ->|

To read ``a[i, j]``, :mod:`numpy` computes the byte
offset ``i * strides[0] + j * strides[1]`` from the
start of the data block and reads ``itemsize`` bytes
from there. The same formula extends to any number of
dimensions, with one term per axis.
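
The lookup can be reproduced by hand as a check. The
sketch below assumes ``tobytes()`` is available to
expose the raw buffer (it is in desktop ``numpy``; treat
it as an assumption on the camera) and uses a ``uint8``
array so that one element is exactly one byte::

    import numpy as np

    a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)
    raw = a.tobytes()          # the flat data block, row after row

    i, j = 1, 2
    offset = i * a.strides[0] + j * a.strides[1]   # 1*3 + 2*1 = 5

    # itemsize is 1 here, so the element is the single byte at offset
    value = raw[offset]        # 6, the same as a[1, 2]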

This layout -- rows stored end to end, with the last
axis varying fastest along memory -- is called
*row-major* order. Every array :mod:`numpy` allocates
on the camera uses this layout.
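
For a row-major array the strides follow directly from
``shape`` and ``itemsize``. The helper below is
hypothetical (``row_major_strides`` is not part of
:mod:`numpy`) and only sketches the rule: the last axis
steps by one element, and each earlier axis steps over
everything to its right::

    import numpy as np

    def row_major_strides(shape, itemsize):
        # Hypothetical helper, for illustration only.
        strides = []
        step = itemsize
        for length in reversed(shape):
            strides.insert(0, step)   # stride for this axis
            step *= length            # next axis skips this whole axis
        return tuple(strides)

    a = np.zeros((2, 3), dtype=np.uint8)
    expected = row_major_strides(a.shape, a.itemsize)   # (3, 1)
    matches = expected == a.strides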

Row-major has consequences
--------------------------

Two things fall out of "rows stored back to back" that
matter when shaping a buffer on the camera.

**The last axis is contiguous.** Walking ``a[0, 0]`` to
``a[0, 1]`` touches the next byte over. Walking
``a[0, 0]`` to ``a[1, 0]`` jumps across a whole row.

**The last axis is the fast axis for whole-array math.**
:mod:`numpy` on the camera always walks the last axis
innermost, regardless of which axis happens to be
longer. The desktop ``numpy`` library can reorder and
merge its loops to suit the memory layout; the camera
does not, so a layout choice that desktop ``numpy``
would have papered over still costs time here.
``np.sum(m, axis=1)`` collapses the last axis and runs
in the contiguous direction; ``np.sum(m, axis=0)``
collapses the first axis and steps a whole row at a
time. When the application has a choice about how to
lay out a buffer, put the long axis last so operations
along it stay in the inner loop.
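
As a reminder of which axis each call collapses, here
is a small sketch. The values are kept small on purpose,
and it shows results only, not timing::

    import numpy as np

    m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)

    row_totals = np.sum(m, axis=1)   # collapses the last axis:  [6, 15]
    col_totals = np.sum(m, axis=0)   # collapses the first axis: [5, 7, 9]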

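If the long axis ended up first,
:meth:`~numpy.ndarray.transpose` (or the ``.T``
shortcut) moves it to the last position without copying
the data -- it just swaps the shape and strides. The
bytes themselves do not move, so the new last axis runs
in the inner loop but is stepped with the larger
stride::

    a = b.T            # long axis now last; same buffer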

:doc:`../performance` has the full performance
discussion.
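
What ``.T`` changes can be checked directly. The sketch
below is written against desktop ``numpy``; it assumes
the camera build reports the same shapes and strides::

    import numpy as np

    b = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)
    a = b.T

    b.shape, b.strides    # (2, 3), (3, 1)
    a.shape, a.strides    # (3, 2), (1, 3)

    # Same element, reached through swapped indices
    same = a[2, 1] == b[1, 2]      # True, both are 6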

Reshape, transpose, slicing -- descriptor edits
-----------------------------------------------

Any operation that only rewrites the descriptor is free:
no element is read or written. ``reshape`` installs a
new ``shape`` and ``strides`` over the same data block.
``transpose`` reverses the order of both ``shape`` and
``strides``. ``a[::2]`` doubles a stride and halves the
length along that axis (rounding up). Each returns a
*view* of the same underlying buffer.
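
A short sketch of two such views over one buffer. It is
written against desktop ``numpy`` and assumes the camera
build returns views in the same cases::

    import numpy as np

    base = np.array([0, 1, 2, 3, 4, 5], dtype=np.uint8)

    grid = base.reshape((2, 3))   # strides (3, 1) over the same 6 bytes
    evens = base[::2]             # strides (2,): every other element

    base[4] = 99                  # write through the original...
    grid[1, 1]                    # ...99 shows up in the reshaped view
    evens[2]                      # ...and in the sliced view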

Anything that has to walk the data and write a new
buffer is a copy, and its cost grows with the size of
the array. The rule of thumb is that descriptor
edits are free and data walks are not.
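
The difference is easy to observe by writing to the
original afterwards. This sketch uses ``copy()`` as the
example of a data walk and ``.T`` as the example of a
descriptor edit::

    import numpy as np

    a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)

    view = a.T            # descriptor edit: shares a's buffer
    dup = a.copy()        # data walk: new buffer, same values

    a[0, 0] = 42
    view[0, 0]            # 42, the view follows the original
    dup[0, 0]             # 1, the copy does not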

A note about ndim
-----------------

:mod:`numpy` on the camera is built with a maximum
supported ``ndim`` of 4. Operations that would produce
a higher-rank array raise :exc:`ValueError`. The vast
majority of camera-side work is 1-D or 2-D, so the
limit is rarely an issue.