If you try to compare DataFrames with different indexes using the equality comparison operator ==
, you will raise the ValueError: Can only compare identically-labeled DataFrame objects. You can solve this error by using equals instead of ==.
For example, df1.equals(df2)
, which ignores the indexes.
Alternatively, you can use reset_index
to reset the indexes back to the default 0, 1, 2, ...
For example, df1.reset_index(drop=True).equals(df2.reset_index(drop=True))
.
This tutorial will go through the error find detail and how to solve it with code examples.
ValueError: Can only compare identically-labeled DataFrame objects
In Python, a value is a piece of information stored within a particular object. We will encounter a ValueError in Python when using a built-in operation or function that receives an argument that is the right type but an inappropriate value. The data we want to compare is the correct type, DataFrame, but the DataFrames have the inappropriate indexes for comparison.
Example
Let’s look at an example of two DataFrames that we want to compare. Each DataFrame contains the bodyweight and maximum bench presses in kilograms for six lifters. The indexes for the two DataFrames are different.
import pandas as pd df1 = pd.DataFrame({'Bodyweight (kg)':[76,84, 93,106, 120, 56], 'Bench press (kg)':[135, 150, 170, 140, 180, 155]}, index = ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6']) df2 = pd.DataFrame({'Bodyweight (kg)':[76,84, 93,106, 120, 56], 'Bench press (kg)':[145, 120, 180, 220, 175, 110]}, index = ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F']) print(df1) print(df2)
Let’s run this part of the program to see the DataFrames:
Bodyweight (kg) Bench press (kg) lifter_1 76 135 lifter_2 84 150 lifter_3 93 170 lifter_4 106 140 lifter_5 120 180 lifter_6 56 155 Bodyweight (kg) Bench press (kg) lifter_A 76 145 lifter_B 84 120 lifter_C 93 180 lifter_D 106 220 lifter_E 120 175 lifter_F 56 110e
Let’s compare the DataFrames using the equality operator:
print(df1 == df2)
Let’s run the code to see the result:
ValueError: Can only compare identically-labeled DataFrame objects
The ValueError occurs because the first DataFrame has indexes: ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6']
and the second DataFrame has indexes: ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F']
.
Solution #1: Use DataFrame.equals
To solve this error, we can use the DataFrame.equals function. The equals function allows us compare two Series or DataFrames to see if they have the same shape or elements. Let’s look at the revised code:
print(df1.equals(df2))
Let’s run the code to see the result:
False
Solution #2: Use DataFrame.equals with DataFrame.reset_index()
We can drop the indexes of the DataFrames using the reset_index()
method, then we can compare the DataFrames. To drop the indexes, we need to set the parameter drop = True
. Let’s look at the revised code:
df1 = pd.DataFrame({'Bodyweight (kg)':[76,84, 93, 106, 120, 56], 'Bench press (kg)':[145, 120, 180, 220, 175, 110]}, index = ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6']) df2 = pd.DataFrame({'Bodyweight (kg)':[76, 84, 93, 106, 120, 56], 'Bench press (kg)':[145, 120, 180, 220, 175, 110]}, index = ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F']) df1 = df1.reset_index(drop=True) df2 = df2.reset_index(drop=True) print(df1) print(df2)
Let’s look at the DataFrames with their indexes dropped:
Bodyweight (kg) Bench press (kg) 0 76 145 1 84 120 2 93 180 3 106 220 4 120 175 5 56 110 Bodyweight (kg) Bench press (kg) 0 76 145 1 84 120 2 93 180 3 106 220 4 120 175 5 56 110
There are two ways we can compare the DataFrames:
- The whole DataFrame
- Row-by-row comparison
Entire DataFrame Comparison
We can use the equals()
method to see if all elements are the same in both DataFrame objects. Let’s look at the code:
print(df1.equals(df2))
Let’s run the code to see the result:
True
Row-by-Row DataFrame Comparison
We can check that individual rows are equal using the equality operator once the DataFrames indexes are reset. Let’s look at the code:
print(df1 == df2)
Let’s run the code to see the result:
Bodyweight (kg) Bench press (kg) 0 True True 1 True True 2 True True 3 True True 4 True True 5 True True
Note that the comparison is done row-wise for each column independently.
Solution #3: Use numpy.array_equal
We can also use numpy.array_equal to check if two arrays have the same shape and elements. We can extract arrays from the DataFrame using .values. Let’s look at the revised code:
import pandas as pd import numpy as np df1 = pd.DataFrame({'Bodyweight (kg)':[76,84, 93,106, 120, 56], 'Bench press (kg)':[135, 150, 170, 140, 180, 155]}, index = ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6']) df2 = pd.DataFrame({'Bodyweight (kg)':[76,84, 93,106, 120, 56], 'Bench press (kg)':[145, 120, 180, 220, 175, 110]}, index = ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F']) print(np.array_equal(df1.values, df2.values))
Let’s run the code to see the result:
False
We can use array_equal to compare individual columns. Let’s look at the revised code:
import pandas as pd import numpy as np df1 = pd.DataFrame({'Bodyweight (kg)':[76,84, 93,106, 120, 56], 'Bench press (kg)':[135, 150, 170, 140, 180, 155]}, index = ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6']) df2 = pd.DataFrame({'Bodyweight (kg)':[76,84, 93,106, 120, 56], 'Bench press (kg)':[145, 120, 180, 220, 175, 110]}, index = ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F']) # Get individual columns of DataFrames using iloc df1_bodyweight = df1.iloc[:,0] df1_bench = df1.iloc[:,1] df2_bodyweight = df2.iloc[:,0] df2_bench = df2.iloc[:,1] # Compare bodyweight and bench columns separately print(np.array_equal(df1_bodyweight.values, df2_bodyweight.values)) print(np.array_equal(df1_bench.values, df2_bench.values))
Let’s run the code to see the result:
True False
The above result informs us that the first column contains the same elements between the two DataFrames, the second column contains different elements between the two DataFrames.
Summary
Congratulations on reading to the end of this tutorial! The ValueError: Can only compare identically-labeled DataFrame objects occurs when trying to compare two DataFrames with different indexes. You can either reset the indexes using reset_index()
or use the equals()
function which ignores the indexes. You can also use the NumPy method array_equal to compare the two DataFrames’ columns.
For further reading on errors involving Pandas, go to the articles:
- How to Solve Pandas TypeError: empty ‘dataframe’ no numeric data to plot
- How to Solve Python ValueError: You are trying to merge on object and int64 columns
For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.
Have fun and happy researching
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.