If you try to join together two DataFrames that share one or more column names but do not provide a suffix for either the right or left DataFrame to differentiate the between the columns, you will raise the ValueError: Columns overlap but no suffix specified.
To solve this error, you can use the merge function. For example df1.merge(df2, how = 'left')
. Or you can use the join()
method and provide a suffix for the left and right DataFrames, for example,
df1.merge(df2, how='left', lsuffix='left', rsuffix='right')
This tutorial will go through the error in detail and how to solve it with code examples.
Table of contents
ValueError: Columns overlap but no suffix specified
In Python, a value is a piece of information stored within a particular object. We will encounter a ValueError in Python when using a built-in operation or function that receives an argument that is the right type but an inappropriate value. In this specific error, the data we use during a join operation is the correct type, DataFrame, but the DataFrames have one or more identical columns without a distinguishing suffix specified.
Example
Let’s look at an example of two DataFrames containing information about particles that we want to join together. The first DataFrame contains the particle names and the masses of each particle. The second DataFrame contains the particle names, the charge, and the spin of each particle. Let’s look at the code:
import pandas as pd df1 = pd.DataFrame({'particle_name':['electron', 'muon', 'tau', 'W-boson', 'proton', 'Higgs boson'], 'mass (MeV)':[0.51, 106.7, 1776.9, 80433.5, 938.3, 125100]}) df2 = pd.DataFrame({'particle_name':['electron', 'muon', 'tau', 'W-boson', 'proton', 'Higgs boson'], 'charge':[-1, -1, -1, -1, 1, 0], 'spin':[0.5, 0.5, 0.5, 1, 0.5, 0]}) print(df1) print(df2)
Let’s run the first part of the program to see the DataFrames:
particle_name mass (MeV) 0 electron 0.51 1 muon 106.70 2 tau 1776.90 3 W-boson 80433.50 4 proton 938.30 5 Higgs boson 125100.0 particle_name charge spin 0 electron -1 0.5 1 muon -1 0.5 2 tau -1 0.5 3 W-boson -1 1.0 4 proton 1 0.5 5 Higgs boson 0 0.0
Next, we will try to join the DataFrames using the join() method:
df3= df1.join(df2, how='left')
Let’s run the code to see what happens:
ValueError: columns overlap but no suffix specified: Index(['particle_name'], dtype='object')
The error occurs because df1
and df2
share the particle_name
column, and there is no suffix to differentiate between the columns in the new DataFrame df3
.
Solution #1: Use merge()
We can solve this error by using the merge()
function. Let’s look at the revised code:
df3 = df1.merge(df2) print(df3)
Let’s run the code to see the result:
particle_name mass (MeV) charge spin 0 electron 0.51 -1 0.5 1 muon 106.70 -1 0.5 2 tau 1776.90 -1 0.5 3 W-boson 80433.50 -1 1.0 4 proton 938.30 1 0.5 5 Higgs boson 125100.00 0 0.0
We successfully merged the DataFrames. The merge()
method drops any value in the common column particle_name
for the right DataFrame that already exists in the left DataFrame.
Solution #2: Use join() with lsuffix and rsuffix
We can use the join()
method and provide a suffix name for the left and/or right DataFrames. The parameters to set are suffix is lsuffix
for the left DataFrame and rsuffix
for the right DataFrame. Let’s look at the revised code:
df3 = df1.join(df2, how='left', rsuffix='_2') print(df3)
In the above code, we set the suffix for the right DataFrame to _2
. Let’s run the code to get the result:
particle_name mass (MeV) particle_name_2 charge spin 0 electron 0.51 electron -1 0.5 1 muon 106.70 muon -1 0.5 2 tau 1776.90 tau -1 0.5 3 W-boson 80433.50 W-boson -1 1.0 4 proton 938.30 proton 1 0.5 5 Higgs boson 125100.00 Higgs boson 0 0.0
We successfully joined the DataFrames using _2
as a suffix for particle_name
in the right DataFrame.
Summary
Congratulations on reading to the end of the tutorial! The ValueError: Columns overlap but no suffix provided occurs when you try to join two DataFrames that share one or more columns with no distinguishing suffix provided. To solve this error you can use the merge()
method or the join()
method with the lsuffix
and/or rsuffix
parameter values specified.
For further reading on errors involving Pandas, go to the article:
How to Solve Python ValueError: Trailing data
For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.