How to Solve Python ValueError: You are trying to merge on object and int64 columns

by | Programming, Python, Tips

This error occurs when you try to merge two DataFrames but the column in one DataFrame is type int64 and the other column is type object. You can solve this error by converting the column of type object to int64 using the astype() method before merging. For example,

import pandas as pd

df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021],
                    'water_level': [190, 240, 280, 140, 300, 200, 400]})

df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'],
                    'avg_temp': [31, 24, 21, 40, 20, 45, 15]})

df2['year']=df2['year'].astype(int)

merged_df = df1.merge(df2, on='year', how='left')

This tutorial will go through how to solve the error with code examples.


ValueError: You are trying to merge on object and int64 columns

In Python, a value is a piece of information stored within a particular object. We will encounter a ValueError in Python when using a built-in operation or function that receives an argument that is the right type but an inappropriate value. In this specific error, columns are the right type of object to pass to the merge method, but the values in both of the columns have to be the same type.

Example

Let’s look at an example to reproduce the error.

import pandas as pd

# Define two DataFrames

df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021],
                    'water_level': [190, 240, 280, 140, 300, 200, 400]})

df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'],
                    'avg_temp': [31, 24, 21, 40, 20, 45, 15]})

# Merge on 'year' column

merged_df = df1.merge(df2, on='year', how='left')

Let’s run the code to get the result.

ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat

The error occurs because the dtype of the ‘year‘ column in the first DataFrame is int64 and object for the second DataFrame. We can verify the dtype of a column using the dtype() method. For example,

print(df1['year'].dtype)

print(df2['year'].dtype)
int64
object

The Pandas dtype object is equivalent to str in Python.

Solution #1 Convert Int64 Column to Object

We can solve this error by casting the ‘year‘ column in the second DataFrame from object to int64 using the astype() method. Let’s look at the revised code:

import pandas as pd

# Define two DataFrames

df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021],
                    'water_level': [190, 240, 280, 140, 300, 200, 400]})

df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'],
                    'avg_temp': [31, 24, 21, 40, 20, 45, 15]})

# Cast column to int64

df2['year']=df2['year'].astype(int)

print(df1['year'].dtype)

print(df2['year'].dtype)
int64
int64

Now that both columns are int64 we can merge the two DataFrames.

# Merge on 'year' column

merged_df = df1.merge(df2, on='year', how='left')

print(merged_df)

Let’s run the code to get the merged DataFrame:

   year  water_level  avg_temp
0  2015          190        31
1  2016          240        24
2  2017          280        21
3  2018          140        40
4  2019          300        20
5  2020          200        45
6  2021          400        15

Solution #2 Convert Object Column to Int64

We can also solve the error by casting the ‘year‘ column in the first DataFrame from int64 to object using the astype() method. Let’s look at the revised code:

import pandas as pd

df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021],
                    'water_level': [190, 240, 280, 140, 300, 200, 400]})

df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'],
                    'avg_temp': [31, 24, 21, 40, 20, 45, 15]})

df1['year']=df1['year'].astype(str)

print(df1['year'].dtype)
print(df2['year'].dtype)
object
object

Now that both columns are object we can merge the two DataFrames.

merged_df = df1.merge(df2, on='year', how='left')

print(merged_df)

Let’s run the code to get the merged DataFrame

   year  water_level  avg_temp
0  2015          190        31
1  2016          240        24
2  2017          280        21
3  2018          140        40
4  2019          300        20
5  2020          200        45
6  2021          400        15

Summary

Congratulations on reading to the end of this tutorial!

For further reading on errors involving Pandas, go to the articles:

To learn more about Python for data science and machine learning, go to the online courses page on Python for the most comprehensive courses available.

Have fun and happy researching!

Research Scientist at Moogsoft | + posts

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research. Find out more about the creator of the Research Scientist Pod here and sign up to the mailing list here!