6.12. Selection and rearrangement

Reductions collapsed an array down to a scalar or a lower-rank result. Selection covers the operations that pick which elements survive and where they end up: conditional choice, clipping, sorting, looking up indices, reordering along an axis.

6.12.1. Conditional choice

where(condition, x, y) returns an array that takes elements from x where condition is true and from y otherwise. The three operands broadcast together:

a = np.array([1, 2, 3, 4, 5], dtype=np.float)
np.where(a < 3, a, 0.0)
# array([1.0, 2.0, 0.0, 0.0, 0.0])

This is the right tool for an “if/else per element” without writing a Python loop.
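Because where() is an ordinary function call, Python evaluates x and y in full before where() chooses between them, so the discarded branch is still computed. A minimal sketch, assuming desktop NumPy and made-up data, that guards a division so zero denominators never reach the divide:

```python
import numpy as np

# Assumption: desktop NumPy; num and den are invented example data.
num = np.array([6.0, 5.0, 8.0, 1.0])
den = np.array([3.0, 0.0, 2.0, 0.0])

nonzero_den = den != 0
# Both branches are evaluated before where() picks, so first replace the
# zero denominators with 1.0 to keep the division itself safe.
safe_den = np.where(nonzero_den, den, 1.0)
ratio = np.where(nonzero_den, num / safe_den, 0.0)
```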

clip() is shorthand for maximum(lo, minimum(a, hi)) – saturate the values to a range:

np.clip(a, 2.0, 4.0)
# array([2.0, 2.0, 3.0, 4.0, 4.0])
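The stated equivalence can be checked directly. A short sketch, assuming desktop NumPy, comparing clip() with the maximum()/minimum() spelling on the same data:

```python
import numpy as np

# Assumption: desktop NumPy; same values as the example above.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lo, hi = 2.0, 4.0

clipped = np.clip(a, lo, hi)
# The same result spelled out: cap every value at hi, then raise it to lo.
by_hand = np.maximum(lo, np.minimum(a, hi))
```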

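maximum() and minimum() take two operands and return the element-wise larger / smaller value. The operands broadcast, so either one may be a scalar:

np.maximum(a, 3.0)
# array([3.0, 3.0, 3.0, 4.0, 5.0])
np.minimum(a, np.array([5, 4, 3, 2, 1]))
# array([1.0, 2.0, 3.0, 2.0, 1.0])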
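Because maximum() works element by element, it can keep a running per-position peak across a sequence of equally shaped arrays. A sketch under stated assumptions: desktop NumPy, and the frames are invented sample data:

```python
import numpy as np

# Assumption: desktop NumPy; frames are hypothetical successive readings.
frames = [
    np.array([1.0, 5.0, 2.0]),
    np.array([3.0, 1.0, 2.5]),
    np.array([0.5, 4.0, 6.0]),
]

peak = np.array([0.0, 0.0, 0.0])
for frame in frames:
    # Keep, per position, the largest value seen so far.
    peak = np.maximum(peak, frame)
```

minimum() gives a running per-position floor in the same way.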

6.12.2. Finding indices

nonzero() returns the coordinates of every non-zero element, split into one index array per dimension. For a 2-D input the result is a tuple of two arrays: the first holds the row indices, the second holds the column indices. Pairing them column-wise gives the (row, col) of each non-zero position:

m = np.array([[0, 2, 0],
              [3, 0, 0]], dtype=np.float)
np.nonzero(m)
# (array([0, 1], dtype=uint16), array([1, 0], dtype=uint16))

The non-zero entries in m are m[0, 1] = 2 and m[1, 0] = 3. The first returned array [0, 1] gives their row indices; the second [1, 0] gives their column indices. Reading the two arrays side by side recovers the positions (0, 1) and (1, 0).
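A sketch, assuming desktop NumPy, that turns the per-dimension index arrays into (row, col) pairs and reads each non-zero value back with ordinary two-index subscripting:

```python
import numpy as np

# Assumption: desktop NumPy; same matrix as the example above.
m = np.array([[0.0, 2.0, 0.0],
              [3.0, 0.0, 0.0]])

rows, cols = np.nonzero(m)
# Pair the per-dimension index arrays into (row, col) coordinates.
positions = [(int(r), int(c)) for r, c in zip(rows, cols)]
# Read each non-zero value back out, one coordinate at a time.
values = [float(m[r, c]) for r, c in positions]
```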

Other functions return indices rather than values:

  • argmin() / argmax() – index of the smallest / largest element.

  • argsort() – the integer indices that would sort the input along the given axis (the last axis by default):

    a = np.array([40, 10, 30, 20], dtype=np.uint8)
    idx = np.argsort(a)             # array([1, 3, 2, 0], dtype=uint16)
    a[idx]                          # array([10, 20, 30, 40])
    

    argsort always returns uint16, so the largest index it can hold is 65,535; the sorted axis can therefore have at most 65,536 elements.
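A common use of argsort() is sorting one array by the values in another: compute the permutation once and apply it to both with take(). The channel numbers below are hypothetical, and the sketch assumes desktop NumPy, where argsort() returns a wider integer type than uint16:

```python
import numpy as np

# Assumption: desktop NumPy; channels is hypothetical data paired with readings.
readings = np.array([40, 10, 30, 20], dtype=np.uint8)
channels = np.array([7, 3, 5, 1], dtype=np.uint8)

order = np.argsort(readings)
# Apply the same permutation to both arrays so they stay aligned.
sorted_readings = np.take(readings, order)
sorted_channels = np.take(channels, order)

# argmax() gives the position of the largest reading; use it to find its channel.
loudest = int(np.argmax(readings))
loudest_channel = int(channels[loudest])
```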

bincount() counts occurrences of each non-negative integer in a 1-D uint8 / uint16 input:

histogram = np.bincount(np.array([0, 1, 1, 2, 2, 2], dtype=np.uint8))
# array([1, 2, 3], dtype=uint16)

Useful for building histograms of small-integer pixel values without writing a Python loop.
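Since the result of bincount() is indexed by value, argmax() on it returns the most frequent value. A sketch assuming desktop NumPy and an invented strip of 3-bit pixel values; in NumPy the result has max(input) + 1 entries, so one large value produces a long, mostly zero array:

```python
import numpy as np

# Assumption: desktop NumPy; pixels is a made-up strip of values in 0..7.
pixels = np.array([0, 3, 3, 7, 1, 3, 0, 7], dtype=np.uint8)

counts = np.bincount(pixels)
# counts[v] is the number of pixels equal to v.
most_common = int(np.argmax(counts))
```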

6.12.3. Sorting and reordering

sort() returns a sorted copy of the array along the given axis (the last by default). To sort in place, call the method a.sort() instead:

np.sort(np.array([3, 1, 2], dtype=np.float))
# array([1.0, 2.0, 3.0])
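A sketch, assuming desktop NumPy, contrasting the copying function with the in-place method, and showing the axis argument on a 2-D array:

```python
import numpy as np

# Assumption: desktop NumPy.
a = np.array([3.0, 1.0, 2.0])

b = np.sort(a)          # new array; a is left unchanged
a_before = a.tolist()

a.sort()                # method: reorders a's own data and returns None

m = np.array([[3.0, 1.0],
              [2.0, 4.0]])
by_row = np.sort(m)          # default: last axis, so each row is sorted
by_col = np.sort(m, axis=0)  # each column is sorted
```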

flip() reverses the order along the given axis (every axis when no axis is passed):

np.flip(np.array([1, 2, 3, 4]))
# array([4.0, 3.0, 2.0, 1.0])
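On a 2-D array, the axis argument chooses which order is reversed. A sketch assuming desktop NumPy:

```python
import numpy as np

# Assumption: desktop NumPy.
m = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.uint8)

up_down = np.flip(m, axis=0)     # reverse the order of the rows
left_right = np.flip(m, axis=1)  # reverse the elements within each row
both = np.flip(m)                # no axis: reverse along every axis
```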

roll() cyclically shifts elements by the given count. Useful for implementing a ring-buffer-style shift register:

np.roll(np.array([1, 2, 3, 4]), 1)
# array([4.0, 1.0, 2.0, 3.0])
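One way to build the shift register mentioned above: roll the window one step toward the front, then overwrite the slot that wrapped around with the newest sample. The helper push() is hypothetical, and the sketch assumes desktop NumPy:

```python
import numpy as np

# Assumption: desktop NumPy; push() is a hypothetical helper, not a library call.
def push(window, sample):
    # Shift every value one step toward the front; the oldest wraps to the end.
    shifted = np.roll(window, -1)
    # Overwrite the wrapped slot with the newest sample.
    shifted[-1] = sample
    return shifted

window = np.array([0.0, 0.0, 0.0, 0.0])
for sample in [1.0, 2.0, 3.0, 4.0, 5.0]:
    window = push(window, sample)
```

Each call to roll() allocates a new array; in a fast loop, a fixed buffer with a moving write index avoids that copy.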

take() is the explicit form of fancy indexing – pick elements at arbitrary indices:

a = np.array([10, 20, 30, 40, 50], dtype=np.uint8)
np.take(a, [0, 2, 4])
# array([10, 30, 50], dtype=uint8)
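take() also accepts an axis argument; without one it indexes the flattened array. A sketch assuming desktop NumPy:

```python
import numpy as np

# Assumption: desktop NumPy.
m = np.array([[10, 11, 12],
              [20, 21, 22],
              [30, 31, 32]], dtype=np.uint8)

rows = np.take(m, [2, 0], axis=0)  # rows 2 and 0, in that order
cols = np.take(m, [1], axis=1)     # column 1, kept as a 3x1 array
flat = np.take(m, [0, 4, 8])       # no axis: indices into the flattened array
```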

6.12.4. Filtering and structural edits

compress() is the explicit form of boolean indexing: it returns the elements of a for which the condition is true (or, when an axis is given, the matching slices along that axis):

a = np.array([10, 20, 30, 40], dtype=np.uint8)
np.compress(a > 15, a)
# array([20, 30, 40], dtype=uint8)
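A sketch, assuming desktop NumPy, of compress() with an axis on a 2-D array, plus a check that the 1-D case matches boolean indexing:

```python
import numpy as np

# Assumption: desktop NumPy.
m = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.uint8)

keep = np.array([True, False, True])
kept_cols = np.compress(keep, m, axis=1)           # columns 0 and 2
kept_rows = np.compress([False, True], m, axis=0)  # row 1 only

a = np.array([10, 20, 30, 40], dtype=np.uint8)
# On a 1-D array, compress() gives the same result as boolean indexing.
same = np.compress(a > 15, a).tolist() == a[a > 15].tolist()
```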

delete() returns a copy with the entries at the given indices removed:

a = np.array([10, 20, 30, 40, 50], dtype=np.uint8)
np.delete(a, [1, 3])
# array([10, 30, 50], dtype=uint8)
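delete() also takes an axis; without one, NumPy applies the indices to the flattened array and returns a 1-D result. A sketch assuming desktop NumPy:

```python
import numpy as np

# Assumption: desktop NumPy.
m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=np.uint8)

no_middle_row = np.delete(m, 1, axis=0)
no_first_col = np.delete(m, [0], axis=1)
flat = np.delete(m, [0, 8])  # no axis: removes entries of the flattened array
```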

diff() returns the n-th discrete forward difference of the array along an axis. With the default n=1 it gives the change between each pair of adjacent samples:

samples = np.array([1, 3, 6, 10, 15], dtype=np.float)
np.diff(samples)
# array([2.0, 3.0, 4.0, 5.0])
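Each pass of diff() shortens the axis by one element, and the n argument applies it repeatedly. A sketch assuming desktop NumPy that also combines diff() with nonzero() to locate large jumps (the 3.5 threshold is arbitrary):

```python
import numpy as np

# Assumption: desktop NumPy; same samples as above, threshold chosen for illustration.
samples = np.array([1.0, 3.0, 6.0, 10.0, 15.0])

first = np.diff(samples)        # 4 values: change between neighbours
second = np.diff(samples, n=2)  # 3 values: change of the change
# Indices i where the step from samples[i] to samples[i + 1] exceeds 3.5.
big_steps = np.nonzero(first > 3.5)[0]
```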

6.12.5. What each operation costs

Almost every function on this page returns a freshly allocated array. Two exceptions:

  • The method a.sort() sorts in place; the free function np.sort() returns a sorted copy.

  • take() accepts an out= keyword to write into a buffer that already exists.

In a loop that runs many times a second, prefer the in-place a.sort() method, and pass a pre-allocated buffer through out= wherever a function accepts one. Boolean masks are also allocated every time the comparison runs, so when the condition does not change between iterations, build the mask once outside the loop and reuse it.
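A sketch of those habits in a loop, assuming desktop NumPy and invented frames: the output buffer and a fixed region-of-interest mask are created once, take() writes into the buffer through out=, and the buffer is sorted in place.

```python
import numpy as np

# Assumption: desktop NumPy; frames, positions and roi are made-up example data.
frames = [
    np.array([5, 9, 2, 7, 4, 8], dtype=np.uint8),
    np.array([1, 6, 3, 0, 9, 2], dtype=np.uint8),
]

positions = [0, 2, 4]                                     # fixed sample positions
buf = np.zeros(len(positions), dtype=np.uint8)            # allocated once
roi = np.array([False, True, False, True, False, True])  # mask built once

results = []
for frame in frames:
    np.take(frame, positions, out=buf)  # fills buf instead of returning a new array
    buf.sort()                          # in-place method, no copy
    results.append(buf.tolist())
    roi_values = np.compress(roi, frame)  # reuses roi; the result is still a new array
```

compress() still allocates its result on every pass; only the mask construction has been moved out of the loop.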