How to Solve Python ValueError: Columns overlap but no suffix specified

by | Programming, Python, Tips

If you try to join together two DataFrames that share one or more column names but do not provide a suffix for either the right or left DataFrame to differentiate the between the columns, you will raise the ValueError: Columns overlap but no suffix specified.

To solve this error, you can use the merge function. For example df1.merge(df2, how = 'left'). Or you can use the join() method and provide a suffix for the left and right DataFrames, for example,

df1.merge(df2, how='left', lsuffix='left', rsuffix='right')

This tutorial will go through the error in detail and how to solve it with code examples.


ValueError: Columns overlap but no suffix specified

In Python, a value is a piece of information stored within a particular object. We will encounter a ValueError in Python when using a built-in operation or function that receives an argument that is the right type but an inappropriate value. In this specific error, the data we use during a join operation is the correct type, DataFrame, but the DataFrames have one or more identical columns without a distinguishing suffix specified.

Example

Let’s look at an example of two DataFrames containing information about particles that we want to join together. The first DataFrame contains the particle names and the masses of each particle. The second DataFrame contains the particle names, the charge, and the spin of each particle. Let’s look at the code:

import pandas as pd

df1 = pd.DataFrame({'particle_name':['electron', 'muon', 'tau', 'W-boson', 'proton', 'Higgs boson'],
'mass (MeV)':[0.51, 106.7, 1776.9, 80433.5, 938.3, 125100]})

df2 = pd.DataFrame({'particle_name':['electron', 'muon', 'tau', 'W-boson', 'proton', 'Higgs boson'],
'charge':[-1, -1, -1, -1, 1, 0],
'spin':[0.5, 0.5, 0.5, 1, 0.5, 0]})

print(df1)
print(df2)

Let’s run the first part of the program to see the DataFrames:

  particle_name  mass (MeV)
0      electron        0.51
1          muon      106.70
2           tau     1776.90
3       W-boson    80433.50
4        proton      938.30
5   Higgs boson   125100.0

  particle_name  charge  spin
0      electron      -1   0.5
1          muon      -1   0.5
2           tau      -1   0.5
3       W-boson      -1   1.0
4        proton       1   0.5
5   Higgs boson       0   0.0

Next, we will try to join the DataFrames using the join() method:

df3= df1.join(df2, how='left')

Let’s run the code to see what happens:

ValueError: columns overlap but no suffix specified: Index(['particle_name'], dtype='object')

The error occurs because df1 and df2 share the particle_name column, and there is no suffix to differentiate between the columns in the new DataFrame df3.

Solution #1: Use merge()

We can solve this error by using the merge() function. Let’s look at the revised code:

df3 = df1.merge(df2)
print(df3)

Let’s run the code to see the result:

  particle_name  mass (MeV)  charge  spin
0      electron        0.51      -1   0.5
1          muon      106.70      -1   0.5
2           tau     1776.90      -1   0.5
3       W-boson    80433.50      -1   1.0
4        proton      938.30       1   0.5
5   Higgs boson   125100.00       0   0.0

We successfully merged the DataFrames. The merge() method drops any value in the common column particle_name for the right DataFrame that already exists in the left DataFrame.

Solution #2: Use join() with lsuffix and rsuffix

We can use the join() method and provide a suffix name for the left and/or right DataFrames. The parameters to set are suffix is lsuffix for the left DataFrame and rsuffix for the right DataFrame. Let’s look at the revised code:

df3 = df1.join(df2, how='left', rsuffix='_2')
print(df3)

In the above code, we set the suffix for the right DataFrame to _2. Let’s run the code to get the result:

  particle_name  mass (MeV) particle_name_2  charge  spin
0      electron        0.51        electron      -1   0.5
1          muon      106.70            muon      -1   0.5
2           tau     1776.90             tau      -1   0.5
3       W-boson    80433.50         W-boson      -1   1.0
4        proton      938.30          proton       1   0.5
5   Higgs boson   125100.00     Higgs boson       0   0.0

We successfully joined the DataFrames using _2 as a suffix for particle_name in the right DataFrame.

Summary

Congratulations on reading to the end of the tutorial! The ValueError: Columns overlap but no suffix provided occurs when you try to join two DataFrames that share one or more columns with no distinguishing suffix provided. To solve this error you can use the merge() method or the join() method with the lsuffix and/or rsuffix parameter values specified.

For further reading on errors involving Pandas, go to the article:

How to Solve Python ValueError: Trailing data

For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.

Have fun and happy researching!

Profile Picture
Senior Advisor, Data Science | [email protected] | + posts

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.

Buy Me a Coffee