6.12. Selection and rearrangement
Reductions collapsed an array down to a scalar or a lower-rank result. Selection covers the operations that pick which elements survive and where they end up: conditional choice, clipping, sorting, looking up indices, reordering along an axis.
6.12.1. Conditional choice
where(condition, x, y) returns an array that takes each element from x where condition is truthy and from y otherwise. The three operands broadcast together:
a = np.array([1, 2, 3, 4, 5], dtype=np.float)
np.where(a < 3, a, 0.0)
# array([1.0, 2.0, 0.0, 0.0, 0.0])
This is the right tool for an “if/else per element” without writing a Python loop.
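To make the per-element rule concrete, the following is a plain-Python model of the selection described above. It is only a sketch: `where_model` is a hypothetical name, it handles 1-D lists only, and a bare number stands in for a broadcast scalar. It is not how the library implements where().

```python
def where_model(condition, x, y):
    """Pick x[i] where condition[i] is truthy, otherwise y[i]."""
    def at(operand, i):
        # A list is indexed per element; anything else is reused as a scalar.
        return operand[i] if isinstance(operand, list) else operand

    return [at(x, i) if c else at(y, i) for i, c in enumerate(condition)]


a = [1.0, 2.0, 3.0, 4.0, 5.0]
mask = [v < 3 for v in a]           # the "a < 3" comparison, element by element
result = where_model(mask, a, 0.0)
print(result)                       # [1.0, 2.0, 0.0, 0.0, 0.0]
```

The loop hidden in the list comprehension is exactly what the vectorised call saves you from writing.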
clip() is shorthand for maximum(lo, minimum(a, hi)) – saturate the values to a range:
np.clip(a, 2.0, 4.0)
# array([2.0, 2.0, 3.0, 4.0, 4.0])
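The identity quoted above can be checked with a short plain-Python model. `clip_model` is a hypothetical helper written for this illustration; it applies the formula maximum(lo, minimum(a, hi)) literally to a 1-D list.

```python
def clip_model(values, lo, hi):
    """Saturate each value to [lo, hi] using max(lo, min(v, hi))."""
    return [max(lo, min(v, hi)) for v in values]


a = [1.0, 2.0, 3.0, 4.0, 5.0]
clipped = clip_model(a, 2.0, 4.0)
print(clipped)  # [2.0, 2.0, 3.0, 4.0, 4.0]
```

Values already inside the range pass through unchanged; only the out-of-range ones are pinned to the nearest bound.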
maximum() and minimum() take two operands and return the element-wise larger / smaller:
np.maximum(a, 3.0)
# array([3.0, 3.0, 3.0, 4.0, 5.0])
np.minimum(a, np.array([5, 4, 3, 2, 1]))
# array([1.0, 2.0, 3.0, 2.0, 1.0])
6.12.2. Finding indices
nonzero() returns the coordinates of every non-zero element, split into one index array per dimension. For a 2-D input the result is a tuple of two arrays: the first holds the row indices, the second holds the column indices. Pairing them column-wise gives the (row, col) of each non-zero position:
m = np.array([[0, 2, 0],
              [3, 0, 0]], dtype=np.float)
np.nonzero(m)
# (array([0, 1], dtype=uint16), array([1, 0], dtype=uint16))
The non-zero entries in m are m[0, 1] = 2 and m[1, 0] = 3. The first returned array [0, 1] gives their row indices; the second [1, 0] gives their column indices. Reading the two arrays side by side recovers the positions (0, 1) and (1, 0).
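The pairing step can be sketched in plain Python. `nonzero_model` is a hypothetical stand-in that scans a nested list row by row and returns the two index lists; zipping them yields the (row, col) coordinates described above.

```python
def nonzero_model(matrix):
    """Return (row_indices, col_indices) of the non-zero entries of a 2-D list."""
    rows, cols = [], []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                rows.append(r)
                cols.append(c)
    return rows, cols


m = [[0, 2, 0],
     [3, 0, 0]]
rows, cols = nonzero_model(m)
positions = list(zip(rows, cols))          # pair the two arrays column-wise
values = [m[r][c] for r, c in positions]   # look the entries back up
print(rows, cols)   # [0, 1] [1, 0]
print(positions)    # [(0, 1), (1, 0)]
print(values)       # [2, 3]
```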
Two reductions also produce indices:
- argmin() / argmax() – index of the smallest / largest element.
- argsort() – an integer array of indices that would sort the input along the given axis (defaults to the last):
a = np.array([40, 10, 30, 20], dtype=np.uint8)
idx = np.argsort(a)
# array([1, 3, 2, 0], dtype=uint16)
a[idx]
# array([10, 20, 30, 40])
argsort always returns uint16; the array being sorted must therefore have no more than 65,535 elements on the sorted axis.
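An index array from argsort() is most useful when the same ordering has to be applied to a second, parallel sequence. The sketch below models that in plain Python; `argsort_model` and the `labels` list are hypothetical, and Python's sorted() is stable, which may not match the library's handling of equal values.

```python
def argsort_model(values):
    """Return the indices that would put `values` in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])


a = [40, 10, 30, 20]
labels = ["d", "a", "c", "b"]            # hypothetical data stored alongside a

idx = argsort_model(a)
ordered = [a[i] for i in idx]            # same effect as a[idx]
ordered_labels = [labels[i] for i in idx]

print(idx)             # [1, 3, 2, 0]
print(ordered)         # [10, 20, 30, 40]
print(ordered_labels)  # ['a', 'b', 'c', 'd']
```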
bincount() counts occurrences of each non-negative integer in a 1-D uint8 / uint16 input:
histogram = np.bincount(np.array([0, 1, 1, 2, 2, 2], dtype=np.uint8))
# array([1, 2, 3], dtype=uint16)
Useful for building histograms of small-integer pixel values without writing a Python loop.
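For reference, the counting rule can be written out as the Python loop that bincount() replaces. `bincount_model` is a hypothetical helper; it assumes the result has one slot per value from 0 up to the largest value present, which matches the three-element result shown above.

```python
def bincount_model(values):
    """Count occurrences of each non-negative integer in a 1-D list."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    if not values:
        return []
    counts = [0] * (max(values) + 1)   # one bin per value 0..max
    for v in values:
        counts[v] += 1
    return counts


histogram = bincount_model([0, 1, 1, 2, 2, 2])
sparse = bincount_model([0, 3])        # values that never occur get a zero bin
print(histogram)  # [1, 2, 3]
print(sparse)     # [1, 0, 0, 1]
```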
6.12.3. Sorting and reordering
The free function sort() returns a sorted copy of the array along the given axis (the last by default). To sort in place instead, call the sort() method on the array itself:
np.sort(np.array([3, 1, 2], dtype=np.float))
# array([1.0, 2.0, 3.0])
flip() reverses the order along the given axis (every axis when no axis is passed):
np.flip(np.array([1, 2, 3, 4]))
# array([4, 3, 2, 1])
roll() cyclically shifts elements by the given count. Useful for implementing a ring-buffer-style shift register:
np.roll(np.array([1, 2, 3, 4]), 1)
# array([4, 1, 2, 3])
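The shift-register use can be sketched as follows. This is a plain-Python model with hypothetical names (`roll_model`, `history`), not the library's implementation: the buffer is rolled by one position and the slot that wrapped around is overwritten with the newest sample.

```python
def roll_model(values, shift):
    """Cyclically shift a 1-D list; positive shifts move elements to the right."""
    n = len(values)
    if n == 0:
        return []
    k = shift % n                      # normalise negative and oversized shifts
    return values[-k:] + values[:-k] if k else list(values)


history = [1, 2, 3, 4]
shifted = roll_model(history, 1)       # oldest sample wraps to the front
print(shifted)                         # [4, 1, 2, 3]

shifted[0] = 9                         # overwrite the wrapped slot with a new sample
print(shifted)                         # [9, 1, 2, 3]
print(roll_model(history, -1))         # [2, 3, 4, 1]
```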
take() is the explicit form of fancy indexing – pick elements at arbitrary indices:
a = np.array([10, 20, 30, 40, 50], dtype=np.uint8)
np.take(a, [0, 2, 4])
# array([10, 30, 50], dtype=uint8)
6.12.4. Filtering and structural edits
compress() is the explicit form of boolean indexing – it returns the slices of a selected by the boolean condition:
a = np.array([10, 20, 30, 40], dtype=np.uint8)
np.compress(a > 15, a)
# array([20, 30, 40], dtype=uint8)
delete() returns a copy with the entries at the given indices removed:
a = np.array([10, 20, 30, 40, 50], dtype=np.uint8)
np.delete(a, [1, 3])
# array([10, 30, 50], dtype=uint8)
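compress() and delete() are mirror images: one keeps the entries a mask selects, the other drops the entries at listed positions. The plain-Python models below illustrate both on 1-D lists; `compress_model` and `delete_model` are hypothetical names used only for this sketch.

```python
def compress_model(condition, values):
    """Keep the values whose matching condition entry is truthy."""
    return [v for keep, v in zip(condition, values) if keep]


def delete_model(values, indices):
    """Return a copy of `values` without the entries at `indices`."""
    drop = set(indices)
    return [v for i, v in enumerate(values) if i not in drop]


a = [10, 20, 30, 40]
kept = compress_model([v > 15 for v in a], a)
print(kept)       # [20, 30, 40]

b = [10, 20, 30, 40, 50]
remaining = delete_model(b, [1, 3])
print(remaining)  # [10, 30, 50]
print(b)          # [10, 20, 30, 40, 50], the input is left untouched
```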
diff() returns the n-th discrete forward difference of the array along an axis. With the default n = 1 it gives the first-order change between adjacent samples, so the result is one element shorter than the input:
samples = np.array([1, 3, 6, 10, 15], dtype=np.float)
np.diff(samples)
# array([2.0, 3.0, 4.0, 5.0])
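Higher orders are obtained by differencing repeatedly. The sketch below models that in plain Python; `diff_model` is a hypothetical helper for 1-D lists and assumes the n-th difference is simply the first difference applied n times.

```python
def diff_model(values, n=1):
    """Apply the forward difference out[i] = v[i + 1] - v[i] a total of n times."""
    out = list(values)
    for _ in range(n):
        out = [nxt - cur for cur, nxt in zip(out, out[1:])]
    return out


samples = [1.0, 3.0, 6.0, 10.0, 15.0]
first = diff_model(samples)        # change between adjacent samples
second = diff_model(samples, n=2)  # change of the change
print(first)   # [2.0, 3.0, 4.0, 5.0]
print(second)  # [1.0, 1.0, 1.0]
```

Each pass shortens the result by one element, so an input of length k leaves k - n values after n passes.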
6.12.5. What each operation costs
Almost every function on this page returns a freshly allocated array. Two exceptions:
- The sort() method of an array sorts in place; the free function sort() returns a sorted copy.
- take() accepts an out= keyword to write into a buffer that already exists.
In a loop that runs many times a second, prefer the in-place sort() and reuse pre-allocated buffers everywhere else. Boolean masks themselves are allocated every time the comparison runs – build a mask once and reuse it across operations rather than rebuilding it inside every iteration.