9.11. Reductions

A reduction collapses an array along one or more axes by summing, averaging, taking a min, and so on. Each reduction is a single library call against the whole array, much faster than the equivalent Python loop. numpy covers the everyday ones:

  • sum()

  • mean()

  • std() – standard deviation, ddof= adjusts the divisor (N - ddof)

  • min() / max()

  • median()

  • argmin() / argmax() – the index of the minimum or maximum element

  • all() / any() – truth-value reductions on boolean arrays

9.11.1. Without the axis keyword

Called without axis=, a reduction returns a scalar covering the entire array:

a = np.array([1, 2, 3, 4], dtype=np.float)
np.sum(a)           # 10.0
np.mean(a)          # 2.5
np.std(a)           # 1.118...
np.median(a)        # 2.5

b = np.array([40, 10, 30, 20], dtype=np.float)
np.max(b)           # 40.0
np.argmax(b)        # 0  (index of the maximum)

9.11.2. With the axis keyword

axis= contracts one named axis and leaves the others intact. The result is an array of one rank lower than the input:

m = np.arange(12, dtype=np.float).reshape((3, 4))

np.sum(m)               # 66.0          - scalar
np.sum(m, axis=0)       # length-4      - column sums
np.sum(m, axis=1)       # length-3      - row sums

The same shape rule applies to every reduction: axis=0 collapses the first axis, axis=1 collapses the second, and so on. The keepdims=True keyword keeps the contracted axis in place with length 1, which makes the result safe to broadcast back against the original.

Mean / standard deviation along a row, for example, are written np.mean(m, axis=1) and np.std(m, axis=1). The result has the other axis’s length.

9.11.3. Layout matters

Combined with the row-major layout covered on Shape and strides, reducing along the last axis is the cheapest case. The reduction walks the data block in the direction it is stored, with no jumps from row to row:

m = np.arange(2000, dtype=np.float).reshape((2, 1000))
np.sum(m, axis=1)       # cheap - long axis is the inner one
np.sum(m, axis=0)       # has to jump rows on every step

When the application has a choice about how to lay out a buffer, put the long axis last so reductions along it run in the fast direction.

9.11.4. Iterables as input

Most reductions accept a Python iterable (a list, a range, a tuple) in place of an ndarray. The convenience costs a few microseconds for the implicit conversion – which adds up fast in a loop. When the same data is reduced multiple times, build the ndarray once and pass it around.