6.5. Shape and strides

The data inside an ndarray is one packed block of numbers. The descriptor in front of that block decides how that flat block is read out as a tensor.

6.5.1. What the descriptor records

Five values describe how to read the data block as a tensor:

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)

a.ndim       # 2      - number of dimensions
a.shape      # (2, 3) - length along each dimension
a.itemsize   # 1      - bytes per element (from dtype)
a.size       # 6      - total number of elements
a.strides    # (3, 1) - step pattern through the buffer
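The five values are not independent of each other. As a quick check, the sketch below rebuilds the same array and confirms that ndim is the length of shape, that size is the product of the shape entries, and that the packed block holds size * itemsize bytes. The names count and total_bytes are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)

# ndim is simply the number of entries in shape
assert a.ndim == len(a.shape)

# size is the product of the lengths along every axis
count = 1
for n in a.shape:
    count *= n
assert a.size == count

# the packed data block holds size * itemsize bytes
total_bytes = a.size * a.itemsize   # 6 elements * 1 byte each
```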

The ndinfo() helper prints the shape, strides, item size, and element type, plus the location of the underlying buffer, in one call. Two arrays whose buffer locations match share memory:

np.ndinfo(a)
# class: ndarray
# shape: (2, 3)
# strides: (3, 1)
# itemsize: 1
# data pointer: 0x...
# type: uint8

6.5.2. Strides explained

A stride is how many bytes to step in the data block to move one element along a given axis. For the 2x3 uint8 array above, the strides are (3, 1): moving down by one row jumps 3 bytes, moving right by one column jumps 1 byte. That is the same as saying the rows are stored back to back, left to right:

memory: [ 1 ][ 2 ][ 3 ][ 4 ][ 5 ][ 6 ]
          ^ row 0          ^ row 1
          <------- 3 bytes ---->

To read a[i, j], numpy computes i * strides[0] + j * strides[1] from the start of the data block and reads itemsize bytes from there. The same formula extends to any number of dimensions.
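The offset formula can be checked by hand. The sketch below is only an illustration: byte_offset is a hypothetical helper written for this example, not part of numpy, and tobytes() is used to get a copy of the packed data block so that it can be indexed byte by byte.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)
raw = a.tobytes()   # copy of the packed data block, in row-major order

def byte_offset(index, strides):
    # hypothetical helper: sum of index[k] * strides[k] over all axes
    offset = 0
    for i, s in zip(index, strides):
        offset += i * s
    return offset

off = byte_offset((1, 2), a.strides)   # 1 * 3 + 2 * 1
value = raw[off]                       # itemsize is 1, so one byte is the whole element
```

Here off comes out as 5 and value as 6, which matches a[1, 2]. The helper loops over however many axes it is given, so the same code covers higher-rank arrays.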

This layout – rows stored end to end, with the last axis varying fastest along memory – is called row-major order. Every array numpy allocates on the camera uses this layout.
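For a freshly allocated row-major array, the strides follow directly from the shape and the item size. The sketch below shows one way to compute them; row_major_strides is a hypothetical helper for illustration, and the comparison assumes the arrays were allocated directly rather than produced by slicing or transposing.

```python
import numpy as np

def row_major_strides(shape, itemsize):
    # hypothetical helper: the last axis steps by itemsize, and each
    # earlier axis steps over everything stored to its right
    strides = []
    step = itemsize
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))

a = np.zeros((2, 3), dtype=np.uint8)       # 1-byte elements
b = np.zeros((2, 3, 4), dtype=np.uint16)   # 2-byte elements

strides_a = row_major_strides(a.shape, a.itemsize)
strides_b = row_major_strides(b.shape, b.itemsize)
```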

6.5.3. Row-major has consequences

Two things fall out of “rows stored back to back” that matter when shaping a buffer on the camera.

The last axis is contiguous. Walking a[0, 0] to a[0, 1] touches the next byte over. Walking a[0, 0] to a[1, 0] jumps across a whole row.

The last axis is the fast axis for whole-array math. numpy on the camera always walks the last axis innermost, regardless of which axis happens to be longer. The desktop numpy library silently reorders its loops to put the longest axis innermost; the camera does not, so a layout choice that desktop numpy would have papered over still costs time here. For a 2-D array m, np.sum(m, axis=1) collapses the last axis and runs in the contiguous direction; np.sum(m, axis=0) does not. When the application has a choice about how to lay out a buffer, put the long axis last so operations along it stay in the inner loop.
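To make the two reductions concrete, here is a small sketch with a 3x4 array m chosen for this example. It shows only which axis each call collapses; it makes no attempt to measure the speed difference.

```python
import numpy as np

m = np.arange(12, dtype=np.uint8).reshape((3, 4))
# m is:
#   0  1  2  3
#   4  5  6  7
#   8  9 10 11

row_totals = np.sum(m, axis=1)   # collapses the last axis: one total per row
col_totals = np.sum(m, axis=0)   # collapses the first axis: one total per column
```

row_totals holds 6, 22, 38 and col_totals holds 12, 15, 18, 21.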

If the layout starts out wrong, transpose() (or the .T shortcut) fixes it without copying the data – it just swaps the strides:

a = b.T            # long axis is now last, so it runs in the inner loop
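A slightly fuller sketch of the same idea, with an assumed 1000x4 uint8 buffer b standing in for a layout that has the long axis first. The shapes and values are made up for illustration.

```python
import numpy as np

b = np.zeros((1000, 4), dtype=np.uint8)   # long axis first
a = b.T                                   # long axis last

# b.shape == (1000, 4), b.strides == (4, 1)
# a.shape == (4, 1000), a.strides == (1, 4)

a[2, 7] = 9        # write through the transposed view
shared = b[7, 2]   # the same byte, seen through the original array
```

The write through a shows up in b because no data was copied; only the shape and strides changed places.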

See Performance for the full discussion.

6.5.4. Reshape, transpose, slicing – descriptor edits

Any operation that only rewrites the descriptor is free. reshape puts a new shape and strides over the same data block. transpose reverses the order of the shape and strides. a[::2] doubles a stride. Each returns a view of the same underlying buffer.

Anything that has to walk the data and write a new buffer is a copy. The rule for now is that descriptor edits are free and data walks are not.
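One way to see the difference is to write into the original array and check who notices. In this sketch, with names chosen for illustration, r and s are descriptor edits over the buffer of a, while c comes from copy() and owns a new buffer.

```python
import numpy as np

a = np.arange(6, dtype=np.uint8)   # values 0..5, strides (1,)

r = a.reshape((2, 3))   # descriptor edit: shape (2, 3), strides (3, 1)
s = a[::2]              # descriptor edit: shape (3,), strides (2,)
c = a.copy()            # data walk: a new buffer with the same values

a[0] = 100              # write into the original buffer

# r[0, 0] and s[0] now read 100, because both are views of a's buffer;
# c[0] still reads 0, because the copy has its own buffer
```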

6.5.5. A note about ndim

numpy on the camera is built with a maximum supported ndim of 4. Operations that would produce a higher-rank array raise ValueError. The vast majority of camera-side work is 1-D or 2-D, so the limit is rarely an issue.
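Code that is developed on the desktop and later moved to the camera can check the rank up front. The sketch below is one possible guard: checked_reshape and MAX_NDIM are hypothetical names for this example, and the value 4 is taken from the limit stated above.

```python
import numpy as np

MAX_NDIM = 4   # assumed from the limit described in this section

def checked_reshape(arr, shape):
    # hypothetical helper: refuse target shapes above the rank limit,
    # otherwise hand the work to the ordinary reshape
    if len(shape) > MAX_NDIM:
        raise ValueError("target rank %d exceeds %d" % (len(shape), MAX_NDIM))
    return arr.reshape(shape)

ok = checked_reshape(np.zeros(16), (2, 2, 2, 2))   # rank 4 is allowed

try:
    checked_reshape(np.zeros(32), (2, 2, 2, 2, 2))  # rank 5 is rejected
    rejected = False
except ValueError:
    rejected = True
```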