How to Solve Pandas AttributeError: ‘DataFrame’ object has no attribute ‘str’

by | Pandas, Programming, Python, Tips

This error occurs when you try to access vectorized string methods through the str accessor on a pandas DataFrame instead of a pandas Series. The Series.str accessor provides vectorized string functions for Series and Index objects.

A common cause is wrapping the list of column names in an extra pair of square brackets when assigning it to df.columns, so that selecting a column returns a DataFrame instead of a Series. To solve this error, assign the list of column names without the extra square brackets.

This tutorial will go through the error in detail and how to solve it with code examples.


AttributeError: ‘DataFrame’ object has no attribute ‘str’

AttributeError occurs in a Python program when we try to access an attribute (method or property) that does not exist for a particular object. The part ‘DataFrame’ object has no attribute ‘str’ tells us that the DataFrame object we are handling does not have the str attribute. str is an accessor available on Series and Index objects. We can get a Series from a DataFrame by selecting a single column by its name. Let’s look at an example:

Get a Series from a DataFrame

import pandas as pd
  
data = [['Jim', 21], ['Patrice', 45], ['Louise', 19]]
  
df = pd.DataFrame(data, columns = ['Name', 'Age'])

names = df['Name']

print(type(df))
print(type(names))
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

We can access the str attribute with the names variable but not the df variable.

names.str.replace('Patrice', 'Ulysses')
print(names)
df.str.replace('Patrice', 'Ulysses')
print(df)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [22], in <cell line: 3>()
      1 names.str.replace('Patrice', 'Ulysses')
      2 print(names)
----> 3 df.str.replace('Patrice', 'Ulysses')
      4 print(df)

File ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py:5583, in NDFrame.__getattr__(self, name)
   5576 if (
   5577     name not in self._internal_names_set
   5578     and name not in self._metadata
   5579     and name not in self._accessors
   5580     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5581 ):
   5582     return self[name]
-> 5583 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'str'
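
The point above can be sketched in runnable form. This is a minimal sketch rather than a definitive recipe: the text_columns list and the upper variable are names introduced here for illustration. It shows that str.replace returns a new Series, so the result has to be assigned back, and one way to run a string method across selected columns of a DataFrame by applying it column by column.

```python
import pandas as pd

data = [['Jim', 21], ['Patrice', 45], ['Louise', 19]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Series.str.replace returns a new Series and leaves the original untouched,
# so assign the result back to the column to keep the change.
df['Name'] = df['Name'].str.replace('Patrice', 'Ulysses', regex=False)

# To run a string method over several text columns, apply it column by column.
# Each column passed to the function is a Series, so .str is available.
text_columns = ['Name']  # assumption: the string-valued columns are listed by hand
upper = df[text_columns].apply(lambda column: column.str.upper())

print(df)
print(upper)
```

Listing the text columns explicitly keeps numeric columns such as Age out of the call, since .str is only meaningful for string data.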

Example

Consider the following CSV file, new_pizzas.csv:

margherita,£7.99
pepperoni,£8.99
four cheeses,£10.99
funghi,£8.99
tartufo,£14.99
porcino,£11.75
vegetarian,£10.99

We will read the CSV into a DataFrame using pandas.read_csv and then attempt to extract a specific pizza based on its name.

import pandas as pd

df = pd.read_csv('new_pizzas.csv')

df
     margherita   £7.99
0     pepperoni   £8.99
1  four cheeses  £10.99
2        funghi   £8.99
3       tartufo  £14.99
4       porcino  £11.75
5    vegetarian  £10.99

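Because the CSV file has no header row, pandas.read_csv used the first line (margherita, £7.99) as the column labels. The DataFrame needs proper column names. Note that overwriting the labels discards the margherita row. We can set the column names as follows: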

headerNames = ["pizza", "price"]

df.columns = [headerNames]

We defined a list of column names and assigned it, wrapped in an extra pair of square brackets, to df.columns, which holds the column labels of the DataFrame.

Next, we will try to find the pizzas in the DataFrame whose names contain the substring “veg”.

veg_pizza = df.loc[df['pizza'].str.contains('veg')]

Let’s run the code to see what happens:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 veg_pizza = df.loc[df['pizza'].str.contains('veg')]

File ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py:5583, in NDFrame.__getattr__(self, name)
   5576 if (
   5577     name not in self._internal_names_set
   5578     and name not in self._metadata
   5579     and name not in self._accessors
   5580     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5581 ):
   5582     return self[name]
-> 5583 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'str'

The error occurs because we put the headerNames variable in square brackets. pandas interprets the resulting list of lists as one array of labels per level and builds a MultiIndex with a single level, so df.columns is a MultiIndex, not an Index.

type(df.columns)
pandas.core.indexes.multi.MultiIndex

As a result, df['pizza'] returns a DataFrame instead of a Series, and a DataFrame does not have str as an attribute.

type(df['pizza'])
pandas.core.frame.DataFrame

Solution

We can solve the error by removing the square brackets around headerNames, which results in assigning an Index object to df.columns.

headerNames = ["pizza", "price"]

df.columns = headerNames

type(df.columns)
pandas.core.indexes.base.Index

Therefore, df['pizza'] will be a Series, not a DataFrame.

type(df['pizza'])
pandas.core.series.Series
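
If the column names were already assigned with the extra brackets and the assignment cannot easily be redone, the single-level MultiIndex can also be flattened afterwards. The following is a minimal sketch, assuming the columns have exactly one level; the two-row inline DataFrame is illustrative data standing in for the CSV file.

```python
import pandas as pd

# Illustrative data standing in for new_pizzas.csv
df = pd.DataFrame([['funghi', '£8.99'], ['vegetarian', '£10.99']])

# Reproduce the problem: the extra brackets create a one-level MultiIndex
df.columns = [['pizza', 'price']]
print(type(df.columns))

# Take the labels of level 0 to get back a flat Index
if isinstance(df.columns, pd.MultiIndex) and df.columns.nlevels == 1:
    df.columns = df.columns.get_level_values(0)

print(type(df.columns))
print(type(df['pizza']))

# With a flat Index, the column is a Series and .str works
veg_pizza = df.loc[df['pizza'].str.contains('veg')]
print(veg_pizza)
```

The nlevels check keeps the flattening from silently dropping information when the columns genuinely have more than one level.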

Let’s run the code with the changes:

veg_pizza = df.loc[df['pizza'].str.contains('veg')]

veg_pizza
        pizza   price
5  vegetarian  £10.99

We successfully extracted the row that satisfies the condition of the pizza name containing the substring “veg”.
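
As an alternative, the column names can be supplied when the file is read, which avoids assigning df.columns at all. The sketch below assumes the CSV has no header row, as in this example, and uses io.StringIO with an inline copy of the data so that it runs without the file on disk. With header=None, the first line is kept as data instead of being consumed as the header, so the margherita row is preserved.

```python
import io

import pandas as pd

# Inline copy of new_pizzas.csv so the sketch is self-contained
csv_text = """margherita,£7.99
pepperoni,£8.99
four cheeses,£10.99
funghi,£8.99
tartufo,£14.99
porcino,£11.75
vegetarian,£10.99
"""

# header=None tells pandas the file has no header row;
# names supplies the column labels as a flat list
df = pd.read_csv(io.StringIO(csv_text), header=None, names=['pizza', 'price'])

# The column is a Series, so the str accessor is available
print(type(df['pizza']))

veg_pizza = df.loc[df['pizza'].str.contains('veg')]
print(veg_pizza)
```

With a real file, the io.StringIO(csv_text) argument would be replaced by the file path, for example 'new_pizzas.csv'.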

Summary

Congratulations on reading to the end of this tutorial!

To learn more about Python for data science and machine learning, go to the online courses page on Python for the most comprehensive courses available.

Have fun and happy researching!

Research Scientist at Moogsoft

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research. Find out more about the creator of the Research Scientist Pod here and sign up to the mailing list here!