How-to Guide for Python NumPy Where Function


The numpy.where() function returns elements chosen from two arrays depending on a condition. You can also use it to locate the elements of an array that satisfy the condition you specify, and to perform operations on only those elements.

This tutorial explains the where() function in detail and works through examples that show how to use it.


Introduction to numpy.where()

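The where() function accepts a condition, which is an array of boolean values, and returns either the indices where the condition is True or a new numpy array whose elements are chosen from two other arrays according to the condition. Let’s look at the syntax for the function.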

Syntax for numpy.where()

The syntax for numpy.where() is:

numpy.where(condition, [x, y, ]) 

The condition is an array of booleans, typically produced by comparing an input array against a value. Where the condition is True, the function yields the element from x; otherwise, it yields the element from y.

The optional parameters x and y are arrays that specify the values from which to choose.

The numpy array that the function returns has elements from x where the condition is True, otherwise from y.

Although x and y are optional, you must pass either both of them or neither. If you specify x without y, numpy raises a ValueError because it has no value to use where the condition is False.
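As a minimal sketch of the two calling conventions described above (the array values and variable names here are illustrative assumptions, not part of the original examples):

```python
import numpy as np

values = np.array([4, -1, 7, 0])  # illustrative data
condition = values > 0            # boolean array: [True, False, True, False]

# Condition only: a tuple of index arrays, one per dimension
indices = np.where(condition)

# Condition plus x and y: a new array with elements picked from x or y
chosen = np.where(condition, values, -values)

# Passing x without y is rejected by NumPy
try:
    np.where(condition, values)
    only_x_error = None
except ValueError as exc:
    only_x_error = str(exc)

print(indices)
print(chosen)
print(only_x_error)
```

The first call tells you where the condition holds, while the second builds a new array of the same shape as the condition.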

You can see a visualization of the where() function syntax below.

[Figure: Syntax of the numpy.where() function]

Examples of numpy.where() Function

Using Relational Operators

Let’s look at an example to select values in an array if they satisfy a relational condition.

import numpy as np

a = np.array([3, 10, 5, 2, 11, 4, 7, 8, 15, 9])

out = np.where((a > 5))

print(out)
print(a[out])

(array([1, 4, 6, 7, 8, 9]),)
[10 11  7  8 15  9]

We can see that out is a tuple containing an array of the indices of a that satisfy the condition of being greater than 5. If we index the original array with out, we get the values that satisfy the condition. In this example, we did not specify the optional parameters x and y. When we provide only the condition, this is shorthand for

np.asarray(condition).nonzero()
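Here is a short sketch that checks this equivalence on the same array and shows what the returned tuple looks like for a 2D array. The reshaped array grid is an assumption added for illustration.

```python
import numpy as np

a = np.array([3, 10, 5, 2, 11, 4, 7, 8, 15, 9])
condition = a > 5

from_where = np.where(condition)
from_nonzero = np.asarray(condition).nonzero()

# Both are 1-tuples holding the same index array
same_indices = np.array_equal(from_where[0], from_nonzero[0])
print(same_indices)

# For a 2D array the tuple holds one index array per axis (rows, columns)
grid = a.reshape(2, 5)
rows, cols = np.where(grid > 5)
print(rows, cols)
print(grid[rows, cols])
```

Because the result is a tuple with one array per axis, it can be used directly to index the original array regardless of its number of dimensions.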

Let’s look at an example where we pass x and y to the where() function.

Replace Elements with NumPy where()

Let’s define a 2-dimensional random array and use the where() function to get positive values. We can use the randn() function to fill our array with random values.

import numpy as np

#Initialize a 2D array of random values

a = np.random.randn(2, 4)

print(a)

# Use where() to copy the elements of a into b wherever a > 0 is True; otherwise set them to 0

b = np.where(a > 0, a, 0)

print(b)

In the above code, we use the optional parameters x and y, where x is the original array a and y is the value 0. Let’s run the code to see what happens:

[[-0.50850373  0.06457775  0.07061776  0.20135635]
 [ 0.47314891 -2.03258244  1.20819874  0.46498755]]

[[0.         0.06457775 0.07061776 0.20135635]
 [0.47314891 0.         1.20819874 0.46498755]]

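We can see that the program keeps the positive elements of the original array and replaces every other element with 0. Because randn() draws new random values each time, your numbers will differ.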
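Since the random values change on every run, a sketch with fixed values can make the replacement easier to verify. The array below is made up for illustration. It also shows that x and y can both be scalars, and that boolean-mask assignment on a copy gives the same result.

```python
import numpy as np

# Fixed values instead of randn(), so the result is reproducible
a = np.array([[-0.5, 0.1, 0.2],
              [0.4, -2.0, 1.2]])

# Same replacement as above: keep positive values, set the rest to 0
b = np.where(a > 0, a, 0)

# x and y can both be scalars: label each element by its sign
signs = np.where(a > 0, 1, -1)

# Boolean-mask assignment on a copy produces the same array as b
clipped = a.copy()
clipped[clipped <= 0] = 0

print(b)
print(signs)
print(np.array_equal(b, clipped))
```

Note that np.where() returns a new array and leaves a unchanged, whereas mask assignment modifies the array it is applied to, which is why the sketch works on a copy.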

Using NumPy where() with Multiple Conditions

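You can combine multiple conditions with the element-wise operators & (AND) and | (OR). Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and <. Let’s look at two examples, one using the AND (&) operator and one using the OR (|) operator.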

Using NumPy where() with AND Operator

The following code shows how to select every value in a numpy array greater than 7 and less than 20.

import numpy as np

# define numpy array of integers

a = np.array([1, 2, 3, 5, 7, 9, 11, 12, 14, 15, 17, 20, 22, 28])

# select indices of array a with values that meet both conditions

b = np.where((a > 7) & (a < 20))

print(a[b])

Let’s run the code to see what happens:

[ 9 11 12 14 15 17]

We get the values from the array that satisfy the specified conditions.
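As a hedged aside, the sketch below shows two equivalent ways to build the same combined condition, and why Python’s and keyword cannot be used in place of &. The variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 5, 7, 9, 11, 12, 14, 15, 17, 20, 22, 28])

# Element-wise AND with the & operator (note the parentheses)
mask = (a > 7) & (a < 20)

# np.logical_and builds the same boolean array
same_mask = np.logical_and(a > 7, a < 20)
print(np.array_equal(mask, same_mask))

# The boolean mask can index the array directly, without np.where()
selected = a[mask]
print(selected)

# Python's "and" tries to reduce each array to a single bool and fails
try:
    (a > 7) and (a < 20)
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)
```

Indexing with the mask directly is often enough when you only need the values. np.where() is useful when you need the index positions themselves.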

Using NumPy where() with OR Operator

The following code shows how to select every value in a numpy array less than 7 or greater than 20.

import numpy as np

# define numpy array of integers

a = np.array([1, 2, 3, 5, 7, 9, 11, 12, 14, 15, 17, 20, 22, 28])

# select indices of array a with values that meet either condition

b = np.where((a < 7) | (a > 20))

print(a[b])

Let’s run the code to see what happens:

[ 1  2  3  5 22 28]

We get the values from the array that satisfy at least one of the specified conditions.

We can use the size attribute to find how many values meet the conditions. Let’s use size in the previous example:

b = np.where((a < 7) | (a > 20))

print(a[b].size)

6
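If you only need the count, you can also count the True values in the condition itself. The following sketch compares three ways of doing so. np.count_nonzero() and summing the boolean mask are alternatives added here for illustration; they are not used elsewhere in this tutorial.

```python
import numpy as np

a = np.array([1, 2, 3, 5, 7, 9, 11, 12, 14, 15, 17, 20, 22, 28])
mask = (a < 7) | (a > 20)

# Approach from the example above: select the values, then read .size
count_via_size = a[np.where(mask)].size

# Count the True entries of the mask directly
count_via_nonzero = np.count_nonzero(mask)

# True counts as 1 and False as 0, so summing the mask also works
count_via_sum = int(mask.sum())

print(count_via_size, count_via_nonzero, count_via_sum)
```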

Broadcasting with NumPy where()

If we provide the condition, x, and y arrays, numpy will broadcast them together. Broadcasting involves vectorizing array operations such that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. Let’s look at an example:

import numpy as np

a = np.arange(8).reshape(2,4)

b = np.arange(4).reshape(1, 4)

print(a)

print(b)

# Broadcasts a < 4, a, and b * 5, which have shapes (2, 4), (2, 4) and (1, 4)

c = np.where(a < 4, a, b * 5)

print(c)

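We select the output based on the condition a < 4, and numpy broadcasts b * 5 from shape (1, 4) to the shape of a, which is (2, 4). Conceptually, the single row of b is repeated along the first axis, so b becomes:

[[0 1 2 3]
 [0 1 2 3]]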

Let’s run the code to see what happens.

[[0 1 2 3]
 [4 5 6 7]]
[[0 1 2 3]]
[[ 0  1  2  3]
 [ 0  5 10 15]]

The output has the same shape as the array a. It contains the values of a that are less than 4 and, everywhere else, the elements of b multiplied by 5.
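To make the implicit broadcast visible, here is a minimal sketch that expands b * 5 explicitly with np.broadcast_to() and confirms that the result matches. The column-shaped condition at the end is an extra illustration that is not part of the example above.

```python
import numpy as np

a = np.arange(8).reshape(2, 4)
b = np.arange(4).reshape(1, 4)

# Expand b * 5 to the shape of a explicitly (np.where does this implicitly)
b_expanded = np.broadcast_to(b * 5, a.shape)
print(b_expanded)

c_implicit = np.where(a < 4, a, b * 5)
c_explicit = np.where(a < 4, a, b_expanded)
print(np.array_equal(c_implicit, c_explicit))

# The condition is broadcast too: a (2, 1) column selects whole rows
row_selector = np.array([[True], [False]])
d = np.where(row_selector, a, -1)
print(d)
```

np.broadcast_to() returns a read-only view rather than a copy, which is consistent with the point that broadcasting avoids needless copies of data.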

Using NumPy where() without any condition expression

We have seen how to pass a condition expression to the where() function in the previous examples. We can also pass a bool array instead of the conditional expression. Let’s look at an example where we define two arrays and then use the where() function to select from either of the two arrays depending on if the value in the boolean array is True or False.

import numpy as np

a = [1, 2, 3, 4, 5]

b = [2, 4, 6, 8, 10]

out = np.where([True, True, False, False, True], a, b)

print(out)

The above program selects the element from a where the boolean value is True; otherwise, it selects the element from b. Let’s run the code to see what happens:

[1 2 6 8 5]

We can see that each element of the output comes from a or from b, depending on the corresponding value in the boolean array.

Summary

Congratulations on reading to the end of this tutorial! The numpy.where() function provides a way to select elements from two different arrays based on conditions on an input array. Here are some crucial points to remember about numpy.where():

  • You can pass all three arguments or the condition argument only. You cannot pass two arguments to numpy.where().
  • The first argument is a boolean array that comes either from evaluating a condition expression or from a boolean array passed directly to the function.
  • If you pass all three arguments to numpy.where(), the three arrays must have shapes that can be broadcast together. Otherwise, you will raise the error: ValueError: operands could not be broadcast together with shapes.
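The following sketch illustrates the shape requirement from the last point. What matters is whether the shapes can be broadcast together, so arrays of different lengths can still work. The arrays used here are made up for illustration.

```python
import numpy as np

condition = np.array([True, False, True])

# A length-1 array and a scalar both broadcast to shape (3,)
ok = np.where(condition, np.array([10]), 0)
print(ok)

# Lengths 3 and 2 cannot be broadcast together
try:
    np.where(condition, np.array([1, 2, 3]), np.array([1, 2]))
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```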

To learn more about getting the index positions of a numpy array, go to the article: How to Solve AttributeError: ‘numpy.ndarray’ object has no attribute ‘index’.

For further reading on handling NumPy arrays, go to the article: How to Reverse a NumPy Array.

Go to the online courses page on Python to learn more about Python for data science and machine learning.

Have fun and happy researching!


Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
