This error occurs when you try to access vectorized string methods through the str accessor on a pandas DataFrame instead of a pandas Series. The str accessor (Series.str) provides vectorized string functions for Series and Index objects.
To solve this error, call str on a Series, such as a single column. In particular, when you assign column names to the DataFrame, do not put an extra pair of square brackets around the list of column names.
This tutorial will go through the error in detail and how to solve it with code examples.
AttributeError: ‘DataFrame’ object has no attribute ‘str’
AttributeError occurs in a Python program when we try to access an attribute (method or property) that does not exist for a particular object. The part “‘DataFrame’ object has no attribute ‘str’” tells us that the DataFrame object we are handling does not have the str attribute. str is a Series and Index attribute. We can get a Series from a DataFrame by selecting a single column by name, for example df['Name']. Let’s look at an example:
Get a Series from a DataFrame
import pandas as pd

data = [['Jim', 21], ['Patrice', 45], ['Louise', 19]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
names = df['Name']

type(df)
type(names)
pandas.core.frame.DataFrame
pandas.core.series.Series
We can access the str attribute through the names variable but not through the df variable. Note that str.replace returns a new Series and does not modify names in place.
names.str.replace('Patrice', 'Ulysses')
print(names)
df.str.replace('Patrice', 'Ulysses')
print(df)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [22], in <cell line: 3>()
      1 names.str.replace('Patrice', 'Ulysses')
      2 print(names)
----> 3 df.str.replace('Patrice', 'Ulysses')
      4 print(df)

File ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py:5583, in NDFrame.__getattr__(self, name)
   5576 if (
   5577     name not in self._internal_names_set
   5578     and name not in self._metadata
   5579     and name not in self._accessors
   5580     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5581 ):
   5582     return self[name]
-> 5583 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'str'
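When the goal is to change text inside a DataFrame, two common routes avoid calling str on the DataFrame itself. The following is a minimal sketch using the same toy data; replacing 'Louise' with 'Lou' is an illustrative assumption and not part of the original example.

```python
import pandas as pd

data = [['Jim', 21], ['Patrice', 45], ['Louise', 19]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Option 1: select one column (a Series) and use its str accessor.
# str.replace returns a new Series, so assign the result back to the column.
df['Name'] = df['Name'].str.replace('Patrice', 'Ulysses')

# Option 2: DataFrame.replace works on the whole DataFrame.
# By default it only replaces cells whose entire value equals 'Louise'.
df = df.replace('Louise', 'Lou')

print(df)
```

Option 1 performs substring replacement within one column, while Option 2 matches whole cell values across all columns, so the right choice depends on what needs to change.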
Example
Consider the following CSV file, new_pizzas.csv:
margherita,£7.99
pepperoni,£8.99
four cheeses,£10.99
funghi,£8.99
tartufo,£14.99
porcino,£11.75
vegetarian,£10.99
We will read the CSV into a DataFrame using pandas.read_csv and then attempt to extract a specific pizza based on its name.
import pandas as pd

df = pd.read_csv('new_pizzas.csv')
df
     margherita   £7.99
0     pepperoni   £8.99
1  four cheeses  £10.99
2        funghi   £8.99
3       tartufo  £14.99
4       porcino  £11.75
5    vegetarian  £10.99
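The file has no header row, so read_csv used the first row (margherita, £7.99) as the column labels. The DataFrame needs proper column names. We can set the column names as follows: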
headerNames = ["pizza", "price"]
df.columns = [headerNames]
We defined a list of column names and assigned the list to df.columns, which holds the column labels of the DataFrame.
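As an aside, the column names can also be supplied when the file is read, which keeps the first data row from being used as the header. The sketch below assumes the file has no header row, uses io.StringIO as a stand-in for new_pizzas.csv so that it is self-contained, and introduces the hypothetical name df_named to keep it separate from the df used in the rest of this tutorial.

```python
import io

import pandas as pd

# Stand-in for new_pizzas.csv so the sketch runs without the file on disk.
csv_text = """margherita,£7.99
pepperoni,£8.99
four cheeses,£10.99
funghi,£8.99
tartufo,£14.99
porcino,£11.75
vegetarian,£10.99
"""

# header=None tells pandas that the file has no header row.
# names supplies the column labels as a flat list of strings.
df_named = pd.read_csv(io.StringIO(csv_text), header=None, names=["pizza", "price"])

print(df_named.shape)
print(type(df_named["pizza"]))
```

With this approach all seven pizzas are kept as data rows. The tutorial itself continues with the df created earlier.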
Next, we will try to find the pizzas in the DataFrame whose names contain the substring “veg”.
veg_pizza = df.loc[df['pizza'].str.contains('veg')]
Let’s run the code to see what happens:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 veg_pizza = df.loc[df['pizza'].str.contains('veg')]

File ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py:5583, in NDFrame.__getattr__(self, name)
   5576 if (
   5577     name not in self._internal_names_set
   5578     and name not in self._metadata
   5579     and name not in self._accessors
   5580     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5581 ):
   5582     return self[name]
-> 5583 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'str'
The error occurs because we put the headerNames variable in square brackets. This assigns a list containing a list to df.columns, which pandas interprets as level arrays and turns into a MultiIndex object instead of an Index object. Therefore df.columns is a MultiIndex, not an Index.
type(df.columns)
pandas.core.indexes.multi.MultiIndex
As a result, df['pizza'] returns a DataFrame instead of a Series, and a DataFrame does not have str as an attribute.
type(df['pizza'])
pandas.core.frame.DataFrame
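If a DataFrame already has this kind of single-level MultiIndex and the original assignment cannot easily be changed, the columns can be flattened afterwards. The following is a minimal sketch on a small two-row stand-in DataFrame (an assumption for illustration); it relies on MultiIndex.nlevels and MultiIndex.get_level_values.

```python
import pandas as pd

# Small stand-in for the pizza data.
df = pd.DataFrame([["pepperoni", "£8.99"], ["vegetarian", "£10.99"]])

# Reproduce the mistake: a list wrapped in another list.
df.columns = [["pizza", "price"]]

# Diagnose: check the type of the column index and how many levels it has.
print(type(df.columns))
print(df.columns.nlevels)

# Repair: a MultiIndex with a single level can be flattened to a plain Index.
if isinstance(df.columns, pd.MultiIndex) and df.columns.nlevels == 1:
    df.columns = df.columns.get_level_values(0)

print(type(df["pizza"]))
```

After flattening, selecting a column by name gives a Series, so the str accessor is available again.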
Solution
We can solve the error by removing the square brackets around headerNames, which results in assigning an Index object to df.columns.
headerNames = ["pizza", "price"]
df.columns = headerNames
type(df.columns)
pandas.core.indexes.base.Index
Therefore, df['pizza'] will be a Series, not a DataFrame.
type(df['pizza'])
pandas.core.series.Series
Let’s run the code with the changes:
veg_pizza = df.loc[df['pizza'].str.contains('veg')]
veg_pizza
        pizza   price
5  vegetarian  £10.99
We successfully extracted the row that satisfies the condition of the pizza name containing the substring “veg”.
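On messier data, str.contains may need a few of its optional parameters. The sketch below uses a made-up pizza column with mixed case and a missing value (both assumptions, not part of new_pizzas.csv) to show case=False, na=False and regex=False.

```python
import pandas as pd

# Hypothetical data with mixed case and a missing name.
df = pd.DataFrame({"pizza": ["Vegetarian", "pepperoni", None, "veg supreme"]})

# case=False  : match regardless of upper or lower case
# na=False    : treat missing names as "no match" so the mask is purely boolean
# regex=False : treat "veg" as a literal substring, not a regular expression
mask = df["pizza"].str.contains("veg", case=False, na=False, regex=False)

veg_pizza = df.loc[mask]
print(veg_pizza)
```

Without na=False, a missing name produces a missing value in the mask, and indexing with such a mask can raise an error, so setting it explicitly is a reasonable default when the column may contain gaps.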
Summary
Congratulations on reading to the end of this tutorial!
Have fun and happy researching!