How to Solve Python ValueError: Can only compare identically-labeled DataFrame objects

If you try to compare DataFrames with different indexes using the equality comparison operator ==, pandas raises the ValueError: Can only compare identically-labeled DataFrame objects. You can avoid the exception by using equals instead of ==.

For example, df1.equals(df2) returns a single Boolean instead of raising. Note that equals also compares the index and column labels, so it returns False when the labels differ.

To compare only the values, first use reset_index to reset the indexes back to the default 0, 1, 2, ... For example, df1.reset_index(drop=True).equals(df2.reset_index(drop=True)).

This tutorial will go through the error in detail and how to solve it with code examples.


ValueError: Can only compare identically-labeled DataFrame objects

In Python, a value is a piece of information stored within a particular object. We will encounter a ValueError in Python when using a built-in operation or function that receives an argument that is the right type but an inappropriate value. Here, the data we want to compare is the correct type, DataFrame, but the DataFrames have index labels that do not match, which makes them unsuitable for comparison with ==.
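As a minimal illustration of the same idea outside pandas, the sketch below passes a string to int(). The type is acceptable, but the value (an arbitrary example string, "abc") is not, so Python raises a ValueError.

```python
# int() accepts strings, so the type is fine,
# but "abc" is not a valid integer literal, so the value is inappropriate.
try:
    int("abc")
except ValueError as err:
    message = str(err)
    print(message)
```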

Example

Let’s look at an example of two DataFrames that we want to compare. Each DataFrame contains the bodyweight and maximum bench presses in kilograms for six lifters. The indexes for the two DataFrames are different.

import pandas as pd

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

print(df1)

print(df2)

Let’s run this part of the program to see the DataFrames:

          Bodyweight (kg)  Bench press (kg)
lifter_1               76               135
lifter_2               84               150
lifter_3               93               170
lifter_4              106               140
lifter_5              120               180
lifter_6               56               155
          Bodyweight (kg)  Bench press (kg)
lifter_A               76               145
lifter_B               84               120
lifter_C               93               180
lifter_D              106               220
lifter_E              120               175
lifter_F               56               110

Let’s compare the DataFrames using the equality operator:

print(df1 == df2)

Let’s run the code to see the result:

ValueError: Can only compare identically-labeled DataFrame objects

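The ValueError occurs because == compares the DataFrames element by element and requires both to have identical row and column labels. Here, the first DataFrame has the index labels ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'] and the second DataFrame has the index labels ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'].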
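Before choosing a fix, it can help to confirm which labels actually differ. The following is a minimal sketch using three-row, single-column versions of the DataFrames above (trimmed here only for brevity). It checks the row and column labels separately with Index.equals and lists the mismatched row labels with Index.difference.

```python
import pandas as pd

# Trimmed-down versions of the tutorial's DataFrames (assumption: 3 rows, 1 column)
df1 = pd.DataFrame({"Bench press (kg)": [135, 150, 170]},
                   index=["lifter_1", "lifter_2", "lifter_3"])
df2 = pd.DataFrame({"Bench press (kg)": [145, 120, 180]},
                   index=["lifter_A", "lifter_B", "lifter_C"])

# == needs both the row index and the column labels to match
same_rows = df1.index.equals(df2.index)
same_columns = df1.columns.equals(df2.columns)
print(same_rows, same_columns)  # False True

# Row labels present in df1 but missing from df2
only_in_df1 = df1.index.difference(df2.index)
print(list(only_in_df1))  # ['lifter_1', 'lifter_2', 'lifter_3']

# The comparison itself raises because the row labels differ
try:
    df1 == df2
    raised = False
except ValueError as err:
    raised = True
    print(err)
```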

Solution #1: Use DataFrame.equals

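To avoid the error, we can use the DataFrame.equals method. The equals method allows us to compare two Series or DataFrames to see if they have the same shape and elements, and it returns a single Boolean rather than raising an exception. Because equals also checks the index and column labels, it returns False for DataFrames whose labels differ, even when the values match. Let’s look at the revised code: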

print(df1.equals(df2))

Let’s run the code to see the result:

False
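In this example the result is False for two reasons: the bench press values differ, and so do the index labels. To separate the two effects, the sketch below builds two small hypothetical DataFrames (df_a and df_b, not part of the original example) with identical values and different index labels.

```python
import pandas as pd

# Identical values, different row labels (hypothetical example data)
values = {"Bodyweight (kg)": [76, 84, 93], "Bench press (kg)": [145, 120, 180]}
df_a = pd.DataFrame(values, index=["lifter_1", "lifter_2", "lifter_3"])
df_b = pd.DataFrame(values, index=["lifter_A", "lifter_B", "lifter_C"])

# equals() does not raise, but it compares the labels as well as the values
with_labels = df_a.equals(df_b)
print(with_labels)  # False

# After resetting both indexes, only the values are left to differ
without_labels = df_a.reset_index(drop=True).equals(df_b.reset_index(drop=True))
print(without_labels)  # True
```

So equals on its own is a safe way to ask "are these DataFrames identical, labels included?", while combining it with reset_index asks "do they hold the same values in the same order?".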

Solution #2: Use DataFrame.equals with DataFrame.reset_index()

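We can drop the indexes of the DataFrames using the reset_index() method, then compare the DataFrames. To discard the old labels rather than keep them as a new column, we need to set the parameter drop=True. In this example, df1 holds the same bench press values as df2, so the two DataFrames differ only in their index labels. Let’s look at the revised code: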

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print(df1)
print(df2)

Let’s look at the DataFrames with their indexes dropped:

   Bodyweight (kg)  Bench press (kg)
0               76               145
1               84               120
2               93               180
3              106               220
4              120               175
5               56               110
   Bodyweight (kg)  Bench press (kg)
0               76               145
1               84               120
2               93               180
3              106               220
4              120               175
5               56               110
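As a brief aside on drop=True, the sketch below uses a small hypothetical two-row DataFrame to show what reset_index does with and without it. Without drop=True, the old labels are kept as an extra column, which would then take part in any later comparison.

```python
import pandas as pd

# Hypothetical two-row DataFrame with an unnamed string index
df = pd.DataFrame({"Bench press (kg)": [145, 120]},
                  index=["lifter_1", "lifter_2"])

kept = df.reset_index()              # old labels move into a new column named "index"
dropped = df.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())     # ['index', 'Bench press (kg)']
print(dropped.columns.tolist())  # ['Bench press (kg)']
```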

There are two ways we can compare the DataFrames:

  • The whole DataFrame
  • Row-by-row comparison

Entire DataFrame Comparison

We can use the equals() method to see if all elements are the same in both DataFrame objects. Let’s look at the code:

print(df1.equals(df2))

Let’s run the code to see the result:

True

Row-by-Row DataFrame Comparison

We can check which individual elements are equal using the equality operator once the DataFrames’ indexes are reset, because both DataFrames then share the default index 0, 1, 2, ... Let’s look at the code:

print(df1 == df2)

Let’s run the code to see the result:

   Bodyweight (kg)  Bench press (kg)
0             True              True
1             True              True
2             True              True
3             True              True
4             True              True
5             True              True

Note that the comparison is element-wise: each cell is compared with the cell at the same row and column in the other DataFrame, and the result is a DataFrame of Booleans.
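When the values are not all equal, the Boolean DataFrame can be reduced to one result per row. The sketch below assumes two small hypothetical DataFrames that already share the default index, and uses all(axis=1) to find the rows in which at least one column differs.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 differ in the bench press column
df1 = pd.DataFrame({"Bodyweight (kg)": [76, 84, 93],
                    "Bench press (kg)": [135, 120, 170]})
df2 = pd.DataFrame({"Bodyweight (kg)": [76, 84, 93],
                    "Bench press (kg)": [145, 120, 180]})

matches = df1 == df2               # element-wise Boolean DataFrame
row_matches = matches.all(axis=1)  # True where every column in the row agrees
differing_rows = df1[~row_matches] # rows of df1 with at least one mismatch

print(row_matches.tolist())  # [False, True, False]
print(differing_rows)
```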

Solution #3: Use numpy.array_equal

We can also use numpy.array_equal to check if two arrays have the same shape and elements. We can extract the underlying NumPy array from a DataFrame using .values (or the to_numpy() method), which leaves the index and column labels behind. Let’s look at the revised code:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

print(np.array_equal(df1.values, df2.values))

Let’s run the code to see the result:

False
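Because a NumPy array carries no labels, this comparison is purely positional. The sketch below uses small hypothetical DataFrames (df_a, df_b, df_c) and the to_numpy() method, which the pandas documentation recommends over .values, to show that neither the index labels nor the column names affect the result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same values, different row labels
df_a = pd.DataFrame({"Bodyweight (kg)": [76, 84], "Bench press (kg)": [145, 120]},
                    index=["lifter_1", "lifter_2"])
df_b = pd.DataFrame({"Bodyweight (kg)": [76, 84], "Bench press (kg)": [145, 120]},
                    index=["lifter_A", "lifter_B"])

# The row labels are not part of the arrays, so only the values are compared
same_values = np.array_equal(df_a.to_numpy(), df_b.to_numpy())

# Renaming a column does not change the result either
df_c = df_b.rename(columns={"Bench press (kg)": "Squat (kg)"})
still_same = np.array_equal(df_a.to_numpy(), df_c.to_numpy())

print(same_values, still_same)  # True True
```

This makes array_equal convenient when the labels are irrelevant, but it also means a True result says nothing about whether the columns describe the same quantities.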

We can use array_equal to compare individual columns. Let’s look at the revised code:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

# Get individual columns of DataFrames using iloc
df1_bodyweight = df1.iloc[:, 0]
df1_bench = df1.iloc[:, 1]

df2_bodyweight = df2.iloc[:, 0]
df2_bench = df2.iloc[:, 1]

# Compare bodyweight and bench columns separately
print(np.array_equal(df1_bodyweight.values, df2_bodyweight.values))
print(np.array_equal(df1_bench.values, df2_bench.values))

Let’s run the code to see the result:

True
False

The result tells us that the bodyweight column contains the same elements in both DataFrames, whereas the bench press column does not.
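With more than a couple of columns, selecting each one by position becomes tedious. As a sketch of one alternative, assuming both DataFrames have the same column names, the comparison can be run in a loop over the column labels and collected in a dictionary (the name column_matches is chosen here for illustration).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Bodyweight (kg)": [76, 84, 93, 106, 120, 56],
                    "Bench press (kg)": [135, 150, 170, 140, 180, 155]},
                   index=["lifter_1", "lifter_2", "lifter_3", "lifter_4", "lifter_5", "lifter_6"])
df2 = pd.DataFrame({"Bodyweight (kg)": [76, 84, 93, 106, 120, 56],
                    "Bench press (kg)": [145, 120, 180, 220, 175, 110]},
                   index=["lifter_A", "lifter_B", "lifter_C", "lifter_D", "lifter_E", "lifter_F"])

# Compare each column's values by name; the index labels play no part
column_matches = {
    column: bool(np.array_equal(df1[column].to_numpy(), df2[column].to_numpy()))
    for column in df1.columns
}
print(column_matches)  # {'Bodyweight (kg)': True, 'Bench press (kg)': False}
```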

Summary

Congratulations on reading to the end of this tutorial! The ValueError: Can only compare identically-labeled DataFrame objects occurs when trying to compare two DataFrames with different indexes using ==. You can reset the indexes using reset_index(drop=True) and then compare the DataFrames with == or equals(). Calling equals() on its own does not raise the error, but it returns False when the labels differ. You can also use the NumPy function array_equal to compare the underlying values of the two DataFrames or of their individual columns.

For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.

Have fun and happy researching!

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research. Find out more about the creator of the Research Scientist Pod here and sign up to the mailing list here!