How to Solve TypeError: Cannot perform ‘rand_’ with a dtyped [object] array and scalar of type [bool]

by Suf | Programming, Python, Tips

If you filter a pandas DataFrame using more than one expression without wrapping each expression in parentheses, Python raises the TypeError: Cannot perform ‘rand_’ with a dtyped [object] array and scalar of type [bool].

To solve this error, put parentheses around each condition, for example:

df.loc[(df.column1 == 'A') & (df.column2 > 5)]

This tutorial will go through the error in detail and how to solve it with code examples.

Two closely related errors are also covered: TypeError: Cannot perform ‘ror_’ with a dtyped [object] array and scalar of type [bool], and TypeError: Cannot perform ‘rand_’ with a dtyped [int64] array and scalar of type [bool].


TypeError: Cannot perform ‘rand_’ with a dtyped [object] array and scalar of type [bool]

Let’s break up the error message to understand what it means. A TypeError occurs whenever we attempt an operation that is not supported for a specific data type. In this case, the operation is the element-wise AND (&) or OR (|), which pandas reports as rand_ or ror_. The leading r indicates the reflected form of the operator, which is used when the Series is the right-hand operand. The process of filtering data through logical conditions is called Boolean indexing. Each expression used for filtering must be wrapped in parentheses. Otherwise, Python may apply & or | to operands you did not intend, which raises the TypeError.

Example: Cannot perform ‘rand_’ with a dtyped [object] array and scalar of type [bool]

Let’s look at an example of a DataFrame containing three columns.

import pandas as pd

df = pd.DataFrame({'category_1': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
'category_2':['A', 'A', 'C', 'B', 'A', 'D', 'B', 'A', 'D'],
'values':[12, 30, 44, 50, 7, 100, 89, 5, 10]})

print(df)
  category_1 category_2  values
0          X          A      12
1          X          A      30
2          X          C      44
3          Y          B      50
4          Y          A       7
5          Y          D     100
6          Z          B      89
7          Z          A       5
8          Z          D      10

We want to get the rows that satisfy the condition of having a value of X in the category_1 column and a value of A in the category_2. We can use the logical AND operator & to filter the DataFrame.

rows_match = df.loc[df.category_1 == 'X' & df.category_2 == 'A']
print(rows_match)

Let’s run the code to see what happens:

TypeError: Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]

The error occurs because the operator & has higher precedence than the comparison operator ==. Python therefore evaluates 'X' & df.category_2 first, and the code above is equivalent to the chained comparison df.category_1 == ('X' & df.category_2) == 'A'. The TypeError refers to the AND operation between the scalar 'X' and df.category_2, which is a column of strings stored with the object data type in pandas. The message reports the scalar as bool because pandas coerces the scalar 'X' to a Boolean as a fallback before giving up.
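To see how Python groups the unparenthesized expression, we can inspect its parse tree with the standard-library ast module. This is only a sketch: the names a and b are stand-ins for df.category_1 and df.category_2, and nothing is evaluated, so no DataFrame is needed.

```python
import ast

# Stand-in names: `a` and `b` play the role of df.category_1 and df.category_2.
expr = "a == 'X' & b == 'A'"

# Parse the expression without evaluating it.
node = ast.parse(expr, mode="eval").body

# The top-level node is a chained comparison with two == operators.
print(type(node).__name__)   # Compare
print(len(node.ops))         # 2

# The middle operand of the chain is the & operation, so & was grouped first.
middle = node.comparators[0]
print(type(middle).__name__)      # BinOp
print(type(middle.op).__name__)   # BitAnd
print(middle.left.value)          # X
print(middle.right.id)            # b
```

The tree shows that & is applied to 'X' and b before either comparison runs, which is the operation pandas rejects.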

Solution

We can solve this error by wrapping each of the two comparison expressions inside a pair of parentheses. Let’s look at the revised code:

rows_match = df.loc[(df.category_1 == 'X') & (df.category_2 == 'A')]
print(rows_match)

Let’s run the code to get the result:

  category_1 category_2  values
0          X          A      12
1          X          A      30

We successfully filtered the DataFrame using the logical AND of two comparison expressions.
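Another way to avoid the precedence problem is to evaluate each comparison on its own line and combine the resulting Boolean masks afterwards. The following is a minimal sketch using the same data as above; the variable names is_x and is_a are illustrative and not part of pandas.

```python
import pandas as pd

# Same data as the example above.
df = pd.DataFrame({'category_1': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
                   'category_2': ['A', 'A', 'C', 'B', 'A', 'D', 'B', 'A', 'D'],
                   'values': [12, 30, 44, 50, 7, 100, 89, 5, 10]})

# Each comparison is finished before & is applied, so no parentheses are needed.
is_x = df['category_1'] == 'X'   # Boolean mask (illustrative name)
is_a = df['category_2'] == 'A'   # Boolean mask (illustrative name)

rows_match = df.loc[is_x & is_a]
print(rows_match)
```

Naming the masks can also make longer filters easier to read, since each condition is visible on its own line.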

Example: Cannot perform ‘ror_’ with a dtyped [object] array and scalar of type [bool]

Let’s look at the same DataFrame, but this time we want to use the logical OR operation.

import pandas as pd

df = pd.DataFrame({'category_1': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
'category_2':['A', 'A', 'C', 'B', 'A', 'D', 'B', 'A', 'D'],
'values':[12, 30, 44, 50, 7, 100, 89, 5, 10]})

print(df)
rows_match = df.loc[df.category_1 == 'X' | df.category_2 == 'A']
print(rows_match)

We want to get the rows that satisfy the condition of having a value of X in the category_1 column or a value of A in the category_2. Let’s run the code to see what happens.

TypeError: Cannot perform 'ror_' with a dtyped [object] array and scalar of type [bool]

The error occurs because the operator | has higher precedence than the comparison operator ==. Python therefore evaluates 'X' | df.category_2 first, and the code above is equivalent to the chained comparison df.category_1 == ('X' | df.category_2) == 'A'. The TypeError refers to the OR operation between the scalar 'X' and df.category_2, which is a column of strings stored with the object data type in pandas. The message reports the scalar as bool because pandas coerces the scalar 'X' to a Boolean as a fallback before giving up.

Solution

We can solve this error by wrapping each of the two comparison expressions inside a pair of parentheses. Let’s look at the revised code:

rows_match = df.loc[(df.category_1 == 'X') | (df.category_2 == 'A')]
print(rows_match)

Let’s run the code to get the result:

  category_1 category_2  values
0          X          A      12
1          X          A      30
2          X          C      44
4          Y          A       7
7          Z          A       5

We successfully filtered the DataFrame using the logical OR of two comparison expressions.
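As an alternative to the parenthesized filter, the same OR condition can be written with DataFrame.query, which takes the condition as a string and accepts the keywords and and or. This is a sketch under the assumption that the column names are valid Python identifiers, as they are here.

```python
import pandas as pd

# Same data as the example above.
df = pd.DataFrame({'category_1': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
                   'category_2': ['A', 'A', 'C', 'B', 'A', 'D', 'B', 'A', 'D'],
                   'values': [12, 30, 44, 50, 7, 100, 89, 5, 10]})

# Inside the query string, `or` has lower precedence than ==,
# so the two comparisons do not need their own parentheses.
rows_match = df.query("category_1 == 'X' or category_2 == 'A'")
print(rows_match)
```

This should return the same five rows as the parenthesized version, with index labels 0, 1, 2, 4 and 7.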

Example: Cannot perform ‘rand_’ with a dtyped [int64] array and scalar of type [bool]

Let’s look at the same DataFrame but in this case, we want to use three expressions to filter the rows.

rows_match = df.loc[(df.category_1 == 'X') | (df.category_1 == 'X' & df['values'] > 5)]
print(rows_match)

In the above code, we are filtering the rows that satisfy the condition of df.category_1 == 'X' or df.category_1 == 'X' and df['values'] > 5. Note that we have used parentheses on either side of the logical OR operator. Let’s run the code to see the result.

TypeError: Cannot perform 'rand_' with a dtyped [int64] array and scalar of type [bool]

The error occurs because the operator & has higher precedence than the comparison operators == and >. Python therefore evaluates 'X' & df['values'] first, and the right operand of | is equivalent to the chained comparison df.category_1 == ('X' & df['values']) > 5. The TypeError refers to the AND operation between the scalar 'X' and df['values'], which is an int64 array. As before, the message reports the scalar as bool because pandas coerces 'X' to a Boolean as a fallback before giving up.

Solution

To solve this error we need to ensure that we wrap each expression in parentheses. Let’s look at the revised code:

rows_match = df.loc[(df.category_1 == 'X') | ((df.category_1 == 'X') & (df['values'] > 5))]
print(rows_match)

Let’s run the code to see the result:

  category_1 category_2  values
0          X          A      12
1          X          A      30
2          X          C      44

We successfully filtered the DataFrame using a combination of logical OR and logical AND across three comparison expressions.
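The comparison methods Series.eq and Series.gt offer another way to sidestep the precedence issue, because a method call is always evaluated before & or |. The sketch below rewrites the three-expression filter with these methods, using the same data as above.

```python
import pandas as pd

# Same data as the example above.
df = pd.DataFrame({'category_1': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
                   'category_2': ['A', 'A', 'C', 'B', 'A', 'D', 'B', 'A', 'D'],
                   'values': [12, 30, 44, 50, 7, 100, 89, 5, 10]})

# .eq('X') and .gt(5) are the method forms of == 'X' and > 5.
# The parentheses around the & group are kept only for readability.
rows_match = df.loc[
    df['category_1'].eq('X')
    | (df['category_1'].eq('X') & df['values'].gt(5))
]
print(rows_match)
```

Note that every row matching the second condition already matches the first, so this particular filter returns the same rows as df.loc[df['category_1'].eq('X')].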

Summary

Congratulations on reading to the end of this tutorial! The TypeError: Cannot perform ‘rand_’ with a dtyped [object] array and scalar of type [bool] occurs when you combine comparison expressions with & without wrapping each one in parentheses. To solve this error, you must wrap each expression in parentheses because the & and | operators take precedence over the comparison operators.

The solution is the same whether it is a logical AND (rand_) or logical OR (ror_) operation.

For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.

Have fun and happy researching!


Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
