This error occurs when you try to add a new row to a DataFrame with .loc, but the number of values in the row does not match the number of columns in the existing DataFrame.
You can solve this error by ensuring that the number of values in the new row matches the number of columns in the DataFrame, or by labelling the values with their column names so that pandas fills any missing columns with NaN.
This tutorial will go through the error in detail and explain how to solve it with code examples.
Example
Let’s look at an example to reproduce the error. First, we will create a DataFrame containing the grades of nine students for three subjects.
import pandas as pd

# Create DataFrame
df = pd.DataFrame({'student': ['john', 'calogero', 'amina', 'clemence', 'george', 'phil', 'albert', 'lizzy', 'paul'],
                   'biology': [74, 55, 80, 60, 40, 77, 51, 90, 34],
                   'chemistry': [59, 71, 72, 90, 66, 89, 59, 34, 84],
                   'physics': [100, 58, 70, 64, 58, 75, 91, 72, 49]})

# View the DataFrame
print(df)
Let’s run the code to see the DataFrame:
    student  biology  chemistry  physics
0      john       74         59      100
1  calogero       55         71       58
2     amina       80         72       70
3  clemence       60         90       64
4    george       40         66       58
5      phil       77         89       75
6    albert       51         59       91
7     lizzy       90         34       72
8      paul       34         84       49
Next, we will attempt to append a new row to the end of the DataFrame.
# Define new row
new_student = ['Carmine', 85]

# Append row to DataFrame
df.loc[len(df)] = new_student

# Print updated DataFrame to console
print(df)
Let’s run the code to see what happens:
ValueError: cannot set a row with mismatched columns
The error occurs because the new row only contains two values, whereas the DataFrame has four columns. We can verify the number of values in the list and the number of columns in the DataFrame using the len() function. For example:
print(len(new_student))
print(len(df.columns))
2
4
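To catch the mismatch before pandas does, you can compare the two lengths yourself. The sketch below uses a hypothetical helper, append_row_checked (it is not part of pandas), that raises a more descriptive ValueError listing the expected columns. It works on a smaller version of the grades DataFrame and assumes the default 0..n-1 integer index, so that len(df) is an unused row label.

```python
import pandas as pd

# Smaller version of the grades DataFrame from the example above
df = pd.DataFrame({'student': ['john', 'calogero', 'amina'],
                   'biology': [74, 55, 80],
                   'chemistry': [59, 71, 72],
                   'physics': [100, 58, 70]})


def append_row_checked(frame, row):
    """Hypothetical helper: add `row` to `frame` with .loc after checking its length.

    Assumes `frame` has the default 0..n-1 integer index, so len(frame) is an unused label.
    """
    n_values, n_columns = len(row), len(frame.columns)
    if n_values != n_columns:
        raise ValueError(
            f"row has {n_values} values but the DataFrame has {n_columns} "
            f"columns: {list(frame.columns)}"
        )
    frame.loc[len(frame)] = row  # modifies `frame` in place


# Too few values: our descriptive ValueError is raised before pandas is called
try:
    append_row_checked(df, ['carmine', 85])
except ValueError as err:
    print(err)

# One value per column: the row is added
append_row_checked(df, ['carmine', 85, 58, 93])
print(df)
```

Checking up front does not change what pandas accepts; it only makes the error message name the columns the row is expected to fill.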
Solution #1
The easiest way to solve the error is to make sure that the number of values in the new row matches the number of columns in the DataFrame. Here, the new row is missing the grades for chemistry and physics, so we add them. Let's look at the revised code:
# One value per column: student, biology, chemistry, physics
new_student = ['carmine', 85, 58, 93]
df.loc[len(df)] = new_student
print(df)
Let’s run the code to see the result:
    student  biology  chemistry  physics
0      john       74         59      100
1  calogero       55         71       58
2     amina       80         72       70
3  clemence       60         90       64
4    george       40         66       58
5      phil       77         89       75
6    albert       51         59       91
7     lizzy       90         34       72
8      paul       34         84       49
9   carmine       85         58       93
We successfully appended the new row to the DataFrame.
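If the missing grades are genuinely unknown, another option that keeps the .loc approach is to pad the list with NaN until it has one value per column. This is a minimal sketch on a smaller version of the grades DataFrame; it assumes the known values come first and in column order, because .loc matches the values in a list to the columns by position.

```python
import numpy as np
import pandas as pd

# Smaller version of the grades DataFrame from the example above
df = pd.DataFrame({'student': ['john', 'calogero', 'amina'],
                   'biology': [74, 55, 80],
                   'chemistry': [59, 71, 72],
                   'physics': [100, 58, 70]})

# Only the student name and biology grade are known
new_student = ['carmine', 85]

# Pad with NaN so there is exactly one value per column.
# If the row were longer than the columns, nothing is added and .loc still raises.
n_missing = len(df.columns) - len(new_student)
padded = new_student + [np.nan] * n_missing

df.loc[len(df)] = padded
print(df)  # chemistry and physics are upcast to float64 to hold NaN
```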
Solution #2
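We can also solve the error by labelling the new values with the columns they belong to and adding them to the DataFrame with pd.concat(). Any column without a value is automatically filled with NaN. Older pandas code often uses DataFrame.append() for this, but that method was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is the recommended replacement.
Let's look at the revised code, starting again from the original nine-row DataFrame:
# Define new row to append
new_student = ['carmine', 85]
# Label the values with the first len(new_student) column names
new_row = pd.DataFrame([new_student], columns=df.columns[:len(new_student)])
# Append row to end of DataFrame; missing columns are filled with NaN
df = pd.concat([df, new_row], ignore_index=True)
print(df)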
Let’s run the code to get the updated DataFrame:
    student  biology  chemistry  physics
0      john       74       59.0    100.0
1  calogero       55       71.0     58.0
2     amina       80       72.0     70.0
3  clemence       60       90.0     64.0
4    george       40       66.0     58.0
5      phil       77       89.0     75.0
6    albert       51       59.0     91.0
7     lizzy       90       34.0     72.0
8      paul       34       84.0     49.0
9   carmine       85        NaN      NaN
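The chemistry and physics columns now display as 59.0, 100.0 and so on because NaN is a floating-point value, so pandas upcasts those int64 columns to float64. If you would rather keep whole numbers, one option is pandas' nullable integer dtype, "Int64", which shows missing values as <NA>. A minimal sketch on a smaller version of the DataFrame, using pd.concat() to add the partial row:

```python
import pandas as pd

# Smaller version of the grades DataFrame from the example above
df = pd.DataFrame({'student': ['john', 'calogero', 'amina'],
                   'biology': [74, 55, 80],
                   'chemistry': [59, 71, 72],
                   'physics': [100, 58, 70]})

# Add a row that only has values for the first two columns
new_student = ['carmine', 85]
new_row = pd.DataFrame([new_student], columns=df.columns[:len(new_student)])
df = pd.concat([df, new_row], ignore_index=True)

# The columns that received NaN are now float64
print(df.dtypes)

# Convert them to the nullable integer dtype; NaN becomes <NA>
df_int = df.astype({'chemistry': 'Int64', 'physics': 'Int64'})
print(df_int)
```

The conversion only succeeds when every non-missing value is a whole number, which holds for the grades here.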
Summary
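Congratulations on reading to the end of this tutorial! The ValueError: cannot set a row with mismatched columns occurs when you add a row with .loc and the row has a different number of values from the number of columns in the DataFrame. To solve it, either provide one value for every column, or label the values with their column names so that pandas fills any missing columns with NaN.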
For further reading on errors involving Pandas, go to the articles:
- How to Solve Pandas TypeError: empty ‘dataframe’ no numeric data to plot.
- How to Solve Python ValueError: Can only compare identically-labeled DataFrame objects
- How to Solve ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data)
To learn more about Python for data science and machine learning, go to the online courses page on Python for the most comprehensive courses available.
Have fun and happy researching!