The ndarray =========== The :class:`~numpy.ndarray` is the type that holds numerical data in :mod:`numpy`. It is two things in one: a single packed block of data and a small descriptor in front of that block that says how to read it. Inside the box -------------- The data block holds every element of the array end to end, with nothing extra between them. Each element occupies the same number of bytes -- one for an array of ``uint8`` values, two for ``uint16``, four for ``float``. A 256-element ``uint8`` array is exactly 256 bytes of data; the same 256 numbers in a Python :class:`list` take a kilobyte -- one 32-bit slot per element regardless of how few bits the value actually needs. The descriptor records what the block *means*. Five values are enough to describe any rectangular array, no matter how many dimensions: * :attr:`~numpy.ndarray.dtype` -- the element type. Decides how many bytes each element takes and what range of values it can hold (covered on :doc:`dtypes`). * :attr:`~numpy.ndarray.itemsize` -- the byte width of one element, derived from the dtype. * :attr:`~numpy.ndarray.ndim` -- the number of dimensions (1 for a vector, 2 for a matrix, 3 for a volume, up to 4). * :attr:`~numpy.ndarray.shape` -- the size along each dimension as a tuple. * :attr:`~numpy.ndarray.strides` -- how to step through the data block to walk each axis. Covered on :doc:`../shape/shape-and-strides`. That is it. Every fast path in :mod:`numpy` -- arithmetic, reductions, broadcasting, slicing -- works directly from those five values plus the data pointer, with no per-element Python overhead. What the design buys -------------------- Three properties fall out of "packed block + small descriptor" and define how the rest of the section behaves. **Element-wise math runs as a single call.** ``a + b`` between two arrays of matching shape adds the two buffers and writes a third, all inside one library call. ``np.sin(a)`` does the same for the sine of every element. The arithmetic, comparison, and bit-wise operators all work this way. **Looking at the same data a different way is free.** Asking for a sub-region of an array, or for the same data laid out under a different shape, does not move any bytes. The operation returns a new descriptor pointing at the *same* data block. The result is called a *view* -- a second window onto the same underlying buffer. Writing through a view writes to the source. **Mixed shapes still work.** A shorter array against a longer one, a row against a matrix, a column against a row -- :mod:`numpy` lines them up by *broadcasting*, a small set of rules that decide which short axis stretches to match the long one. The stretch is virtual; no data is duplicated. What the design costs --------------------- Two restrictions follow from the same design. **Every element has the same type.** A list can hold an ``int`` next to a ``str`` next to a list of three more ``int`` values; an :class:`~numpy.ndarray` cannot. The dtype is fixed at construction time. The :doc:`dtypes` page covers the small set of types :mod:`numpy` supports and the rules that come out of fixing one. **Growing an array is not free.** A list keeps spare slots at the end and supports ``.append`` cheaply. An :class:`~numpy.ndarray` is exactly the size it needs to be; appending would mean allocating a new, larger buffer and copying the old contents into it. There is no :meth:`append` method, on purpose. The right pattern on the camera is to pre-allocate the destination at its final size and *fill* it; :doc:`../performance` covers the technique. With a packed typed buffer for the data, a small descriptor for the metadata, and three behavioural guarantees (fast element-wise math, no-copy alternate views of the same data, and shapes that broadcast), the :class:`~numpy.ndarray` is the foundation the rest of the chapter rests on. How an array actually comes into existence -- from a literal, from a pre-filled allocation, from a peripheral buffer -- is the next practical question.