How to Solve Pandas AttributeError: ‘DataFrame’ object has no attribute ‘str’

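This error occurs when you try to access vectorized string methods through the str accessor on a pandas DataFrame instead of a pandas Series. Series.str is an accessor, not a callable method: it provides vectorized string functions for Series and Index objects, and DataFrame has no equivalent.

A common cause is wrapping the list of column names in an extra pair of square brackets when assigning it to df.columns, which makes column selection return a DataFrame instead of a Series. To solve the error in that case, assign the list directly, without the extra brackets.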

This tutorial will go through the error in detail and how to solve it with code examples.


AttributeError: ‘DataFrame’ object has no attribute ‘str’

AttributeError occurs in a Python program when we try to access an attribute (method or property) that does not exist for a particular object. The part ‘DataFrame’ object has no attribute ‘str’ tells us that the DataFrame object we are handling does not have the str attribute. str is a Series and Index attribute. We can get a Series from a DataFrame by selecting a single column by name, or by position with iloc. Let’s look at an example:

Get a Series from a DataFrame

import pandas as pd
  
data = [['Jim', 21], ['Patrice', 45], ['Louise', 19]]
  
df = pd.DataFrame(data, columns = ['Name', 'Age'])

names = df['Name']

print(type(df))
print(type(names))
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
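
Selecting one column by label or by position in other ways also yields a Series, whereas passing a list of column names yields a DataFrame even when the list holds a single name. A minimal sketch on the same data (the variable names are illustrative and not part of the original example):

```python
import pandas as pd

data = [['Jim', 21], ['Patrice', 45], ['Louise', 19]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Each of these selects exactly one column, so each returns a Series.
by_label = df['Name']
by_loc = df.loc[:, 'Name']
by_position = df.iloc[:, 0]

# A list of labels (note the double brackets) returns a DataFrame,
# even though it contains only one column.
one_column_frame = df[['Name']]

for obj in (by_label, by_loc, by_position, one_column_frame):
    print(type(obj).__name__)
```

Accidentally writing df[['Name']].str instead of df['Name'].str is therefore another way to run into this error.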

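We can access the str attribute with the names variable but not the df variable. Note that str.replace returns a new Series and does not modify names in place, so print(names) below still shows the original values.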

names.str.replace('Patrice', 'Ulysses')
print(names)
df.str.replace('Patrice', 'Ulysses')
print(df)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [22], in <cell line: 3>()
      1 names.str.replace('Patrice', 'Ulysses')
      2 print(names)
----> 3 df.str.replace('Patrice', 'Ulysses')
      4 print(df)

File ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py:5583, in NDFrame.__getattr__(self, name)
   5576 if (
   5577     name not in self._internal_names_set
   5578     and name not in self._metadata
   5579     and name not in self._accessors
   5580     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5581 ):
   5582     return self[name]
-> 5583 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'str'
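
If the goal is to run a string method over several columns at once, one option is to apply the Series method column by column with DataFrame.apply. The sketch below assumes a hypothetical second text column, City, which is not part of the original data. It also assigns the results back, because the str methods return new objects.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Jim', 'Patrice', 'Louise'],
        'City': ['Paris', 'Patras', 'Lyon'],  # hypothetical column for illustration
        'Age': [21, 45, 19],
    }
)

# Single column: go through the Series and assign the result back.
df['Name'] = df['Name'].str.replace('Patrice', 'Ulysses', regex=False)

# Several text columns: apply the Series method to each column in turn.
text_columns = ['Name', 'City']
df[text_columns] = df[text_columns].apply(lambda col: col.str.upper())

print(df)
```

Listing the text columns explicitly keeps numeric columns such as Age out of the call, since the str accessor only works on string data.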

Example

Consider the following CSV file, new_pizzas.csv:

margherita,£7.99
pepperoni,£8.99
four cheeses,£10.99
funghi,£8.99
tartufo,£14.99
porcino,£11.75
vegetarian,£10.99

We will read the CSV into a DataFrame using pandas.read_csv and then attempt to extract a specific pizza based on its name.

import pandas as pd

df = pd.read_csv('new_pizzas.csv')

df
     margherita   £7.99
0     pepperoni   £8.99
1  four cheeses  £10.99
2        funghi   £8.99
3       tartufo  £14.99
4       porcino  £11.75
5    vegetarian  £10.99

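Because the file has no header row, read_csv used its first line (margherita, £7.99) as the column labels. The DataFrame needs proper column names. Note that renaming the columns does not bring the margherita row back; passing header=None to read_csv would keep it as data. We can set the column names as follows: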

headerNames = ["pizza", "price"]

df.columns = [headerNames]

We defined a list of column names, wrapped it in another pair of square brackets, and assigned the result to df.columns, which holds the column labels of the DataFrame.

Next, we will try to find the pizzas in the DataFrame whose names contain the substring “veg”.

veg_pizza = df.loc[df['pizza'].str.contains('veg')]

Let’s run the code to see what happens:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 veg_pizza = df.loc[df['pizza'].str.contains('veg')]

File ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py:5583, in NDFrame.__getattr__(self, name)
   5576 if (
   5577     name not in self._internal_names_set
   5578     and name not in self._metadata
   5579     and name not in self._accessors
   5580     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5581 ):
   5582     return self[name]
-> 5583 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'str'

The error occurs because we put the headerNames variable in square brackets. pandas interprets the resulting nested list as a list of level arrays and builds a MultiIndex object (here with a single level) instead of an Index object. Therefore df.columns is a MultiIndex, not an Index.

type(df.columns)
pandas.core.indexes.multi.MultiIndex

As a result, df['pizza'] returns a DataFrame instead of a Series, and a DataFrame does not have str as an attribute.

type(df['pizza'])
pandas.core.frame.DataFrame
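
The same behaviour can be reproduced without the CSV file. The following sketch uses a small in-memory DataFrame (made up for illustration) to check whether the columns are a MultiIndex and, as one possible repair when the original list of names is no longer at hand, flattens the single level back into a plain Index with get_level_values:

```python
import pandas as pd

# Two illustrative rows; the column labels start out as 0 and 1.
df = pd.DataFrame([['pepperoni', '£8.99'], ['vegetarian', '£10.99']])
header_names = ['pizza', 'price']

# Nested list, as in the failing example above.
df.columns = [header_names]

was_multi = isinstance(df.columns, pd.MultiIndex)
levels_before = df.columns.nlevels
selected_before = df['pizza']

print(was_multi)                        # whether the columns are a MultiIndex
print(levels_before)                    # number of levels in that MultiIndex
print(list(df.columns))                 # each label is a 1-tuple
print(type(selected_before).__name__)   # type returned by df['pizza']

# Possible repair: keep only level 0, which gives a plain Index again.
df.columns = df.columns.get_level_values(0)
selected_after = df['pizza']
print(type(selected_after).__name__)
```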

Solution

We can solve the error by removing the square brackets around headerNames, which results in assigning an Index object to df.columns.

headerNames = ["pizza", "price"]

df.columns = headerNames

type(df.columns)
pandas.core.indexes.base.Index

Therefore, df['pizza'] will be a Series, not a DataFrame.

type(df['pizza'])
pandas.core.series.Series

Let’s run the code with the changes:

veg_pizza = df.loc[df['pizza'].str.contains('veg')]

veg_pizza
        pizza   price
5  vegetarian  £10.99

We successfully extracted the row that satisfies the condition of the pizza name containing the substring “veg”.
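
An alternative is to name the columns while reading the file, which avoids assigning to df.columns at all and keeps the first line of the file as data. The sketch below is an assumption-laden variant of the example: it reads the same rows from an in-memory string via io.StringIO instead of new_pizzas.csv so that it runs on its own, and passes header=None and names to read_csv.

```python
import io

import pandas as pd

# Same content as new_pizzas.csv, held in a string for a self-contained run.
csv_text = """margherita,£7.99
pepperoni,£8.99
four cheeses,£10.99
funghi,£8.99
tartufo,£14.99
porcino,£11.75
vegetarian,£10.99
"""

# header=None: the file has no header row, so the first line is data.
# names: a flat list of column labels, giving a plain Index.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["pizza", "price"])

# df["pizza"] is a Series, so the str accessor is available.
veg_pizza = df.loc[df["pizza"].str.contains("veg")]
print(veg_pizza)
```

With a real file, the first argument would be the path 'new_pizzas.csv' instead of the StringIO object.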

Summary

Congratulations on reading to the end of this tutorial!

To learn more about Python for data science and machine learning, go to the online courses page on Python for the most comprehensive courses available.

Have fun and happy researching!


Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
