9.2. The ndarray¶
The ndarray is the type that holds
numerical data in numpy. It is two things in one:
a single packed block of data and a small descriptor in
front of that block that says how to read it.
9.2.1. Inside the box¶
The data block holds every element of the array end to
end, with nothing extra between them. Each element
occupies the same number of bytes – one for an array of
uint8 values, two for uint16, four (or eight) for
float. A 256-element uint8 array is exactly
256 bytes of data; the same 256 numbers held in a Python
list would be over two kilobytes of separate
integer objects with pointers between them.
The descriptor records what the block means. Five values are enough to describe any rectangular array, no matter how many dimensions:
dtype– the element type. Decides how many bytes each element takes and what range of values it can hold (covered on Dtypes).itemsize– the byte width of one element, derived from the dtype.ndim– the number of dimensions (1 for a vector, 2 for a matrix, 3 for a volume).shape– the size along each dimension as a tuple.strides– how to step through the data block to walk each axis. Covered on Shape and strides.
That is it. Every fast path in numpy – arithmetic,
reductions, broadcasting, slicing – works directly from
those five values plus the data pointer, with no
per-element Python overhead.
9.2.2. What you get from the design¶
Three properties fall out of “packed block + small descriptor” and define how the rest of the section behaves.
Element-wise math runs as a single call. a + b
between two arrays of matching shape adds the two
buffers and writes a third, all inside one library
call. np.sin(a) does the same for the sine of every
element. The arithmetic, comparison, and bit-wise
operators all work this way; the universal functions
(sin, exp, log, …) do too. See
Operators and
Universal functions.
Reshape, transpose, and slicing are free. None of the three actually moves the data. Each one returns a new descriptor pointing at the same data block. The result is called a view: a second window onto the same underlying buffer. Writing through a view writes to the source. See Views and copies.
Mixed shapes still work. A shorter array against a
longer one, a row against a matrix, a column against a
row – numpy lines them up by broadcasting, a
small set of rules that decide which short axis stretches
to match the long one. The stretch is virtual; no data
is duplicated. See Broadcasting.
9.2.3. What you give up in exchange¶
Two restrictions follow from the same design.
Every element has the same type. A list can hold an
int next to a str next to a list of three more
ints; an ndarray cannot. The
dtype is fixed at construction time. The Dtypes
page covers the small set of types numpy supports
and the rules that come out of fixing one.
Growing an array is not free. A list keeps spare
slots at the end and lets you .append cheaply. An
ndarray is exactly the size it
needs to be; appending would mean allocating a new,
larger buffer and copying the old contents into it.
There is no append() method, on purpose. The right
shape on the camera is to pre-allocate the destination
at its final size and fill it; Performance
covers the pattern.
9.2.4. Where to look next¶
Making arrays is the constructor catalogue –
every way to actually get an
ndarray from a literal, a sensor
buffer, or a generated sequence. Dtypes covers
the element-type choice that decides how much RAM each
element costs. After those two,
Shape and strides opens the descriptor
back up to explain how multi-dimensional indexing falls
out of it.