This error occurs when you try to merge two DataFrames but the column in one DataFrame is type int64
and the other column is type object
. You can solve this error by converting the column of type object
to int64
using the astype()
method before merging. For example,
import pandas as pd df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021], 'water_level': [190, 240, 280, 140, 300, 200, 400]}) df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'], 'avg_temp': [31, 24, 21, 40, 20, 45, 15]}) df2['year']=df2['year'].astype(int) merged_df = df1.merge(df2, on='year', how='left')
This tutorial will go through how to solve the error with code examples.
ValueError: You are trying to merge on object and int64 columns
In Python, a value is a piece of information stored within a particular object. We will encounter a ValueError in Python when using a built-in operation or function that receives an argument that is the right type but an inappropriate value. In this specific error, columns are the right type of object to pass to the merge method, but the values in both of the columns have to be the same type.
Example
Let’s look at an example to reproduce the error.
import pandas as pd # Define two DataFrames df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021], 'water_level': [190, 240, 280, 140, 300, 200, 400]}) df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'], 'avg_temp': [31, 24, 21, 40, 20, 45, 15]}) # Merge on 'year' column merged_df = df1.merge(df2, on='year', how='left')
Let’s run the code to get the result.
ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
The error occurs because the dtype of the ‘year
‘ column in the first DataFrame is int64
and object
for the second DataFrame. We can verify the dtype of a column using the dtype()
method. For example,
print(df1['year'].dtype) print(df2['year'].dtype)
int64 object
The Pandas dtype object is equivalent to str
in Python.
Solution #1 Convert Int64 Column to Object
We can solve this error by casting the ‘year
‘ column in the second DataFrame from object
to int64
using the astype()
method. Let’s look at the revised code:
import pandas as pd # Define two DataFrames df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021], 'water_level': [190, 240, 280, 140, 300, 200, 400]}) df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'], 'avg_temp': [31, 24, 21, 40, 20, 45, 15]}) # Cast column to int64 df2['year']=df2['year'].astype(int) print(df1['year'].dtype) print(df2['year'].dtype)
int64 int64
Now that both columns are int64
we can merge the two DataFrames.
# Merge on 'year' column merged_df = df1.merge(df2, on='year', how='left') print(merged_df)
Let’s run the code to get the merged DataFrame:
year water_level avg_temp 0 2015 190 31 1 2016 240 24 2 2017 280 21 3 2018 140 40 4 2019 300 20 5 2020 200 45 6 2021 400 15
Solution #2 Convert Object Column to Int64
We can also solve the error by casting the ‘year
‘ column in the first DataFrame from int64
to object using the astype()
method. Let’s look at the revised code:
import pandas as pd df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021], 'water_level': [190, 240, 280, 140, 300, 200, 400]}) df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'], 'avg_temp': [31, 24, 21, 40, 20, 45, 15]}) df1['year']=df1['year'].astype(str) print(df1['year'].dtype) print(df2['year'].dtype)
object object
Now that both columns are object
we can merge the two DataFrames.
merged_df = df1.merge(df2, on='year', how='left') print(merged_df)
Let’s run the code to get the merged DataFrame
year water_level avg_temp 0 2015 190 31 1 2016 240 24 2 2017 280 21 3 2018 140 40 4 2019 300 20 5 2020 200 45 6 2021 400 15
Summary
Congratulations on reading to the end of this tutorial!
For further reading on errors involving Pandas, go to the articles:
- How to Solve Pandas TypeError: empty ‘dataframe’ no numeric data to plot.
- How to Solve Python ValueError: Can only compare identically-labeled DataFrame objects
- How to Solve Python ValueError: cannot set a row with mismatched columns
- How to Solve Python ValueError: Index contains duplicate entries, cannot reshape
To learn more about Python for data science and machine learning, go to the online courses page on Python for the most comprehensive courses available.
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.