9.2. The ndarray

The ndarray is the type that holds numerical data in numpy. It is two things in one: a single packed block of data and a small descriptor in front of that block that says how to read it.

9.2.1. Inside the box

The data block holds every element of the array end to end, with nothing extra between them. Each element occupies the same number of bytes – one for an array of uint8 values, two for uint16, four (or eight) for float. A 256-element uint8 array is exactly 256 bytes of data; the same 256 numbers held in a Python list would be over two kilobytes of separate integer objects with pointers between them.

The descriptor records what the block means. Five values are enough to describe any rectangular array, no matter how many dimensions:

  • dtype – the element type. Decides how many bytes each element takes and what range of values it can hold (covered on Dtypes).

  • itemsize – the byte width of one element, derived from the dtype.

  • ndim – the number of dimensions (1 for a vector, 2 for a matrix, 3 for a volume).

  • shape – the size along each dimension as a tuple.

  • strides – how to step through the data block to walk each axis. Covered on Shape and strides.

That is it. Every fast path in numpy – arithmetic, reductions, broadcasting, slicing – works directly from those five values plus the data pointer, with no per-element Python overhead.

9.2.2. What you get from the design

Three properties fall out of “packed block + small descriptor” and define how the rest of the section behaves.

Element-wise math runs as a single call. a + b between two arrays of matching shape adds the two buffers and writes a third, all inside one library call. np.sin(a) does the same for the sine of every element. The arithmetic, comparison, and bit-wise operators all work this way; the universal functions (sin, exp, log, …) do too. See Operators and Universal functions.

Reshape, transpose, and slicing are free. None of the three actually moves the data. Each one returns a new descriptor pointing at the same data block. The result is called a view: a second window onto the same underlying buffer. Writing through a view writes to the source. See Views and copies.

Mixed shapes still work. A shorter array against a longer one, a row against a matrix, a column against a row – numpy lines them up by broadcasting, a small set of rules that decide which short axis stretches to match the long one. The stretch is virtual; no data is duplicated. See Broadcasting.

9.2.3. What you give up in exchange

Two restrictions follow from the same design.

Every element has the same type. A list can hold an int next to a str next to a list of three more ints; an ndarray cannot. The dtype is fixed at construction time. The Dtypes page covers the small set of types numpy supports and the rules that come out of fixing one.

Growing an array is not free. A list keeps spare slots at the end and lets you .append cheaply. An ndarray is exactly the size it needs to be; appending would mean allocating a new, larger buffer and copying the old contents into it. There is no append() method, on purpose. The right shape on the camera is to pre-allocate the destination at its final size and fill it; Performance covers the pattern.

9.2.4. Where to look next

Making arrays is the constructor catalogue – every way to actually get an ndarray from a literal, a sensor buffer, or a generated sequence. Dtypes covers the element-type choice that decides how much RAM each element costs. After those two, Shape and strides opens the descriptor back up to explain how multi-dimensional indexing falls out of it.