If you try to import a JSON file that holds several JSON objects separated by endline separators (\n) into a pandas DataFrame with the default settings, you will encounter ValueError: Trailing data.
To solve this error, set the lines parameter of read_json to True so that each line is read as a separate JSON object. For example, df = pd.read_json('data.json', lines=True).
This tutorial will go through how to solve the error with code examples.
ValueError: Trailing data
In Python, a value is a piece of information stored within a particular object. Python raises a ValueError when a built-in operation or function receives an argument of the right type but with an inappropriate value. In this specific error, the data we want to read is the correct type, a JSON string, but it holds more than one JSON object, with the objects separated by endline separators. By default, read_json expects a single JSON document, so everything after the first object is reported as trailing data.
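As a minimal illustration of "right type, inappropriate value", the sketch below uses the built-in int() rather than pandas. The example strings are made up for this illustration and do not come from the tutorial's data.

```python
# int() accepts a str, so the argument type is fine in both calls;
# only the second string's value cannot be interpreted as an integer.
parsed = int("42")

try:
    int("forty-two")
    message = None
except ValueError as err:
    message = str(err)

print(parsed)
print(message)
```

Both calls pass a string, so neither raises a TypeError; only the second raises a ValueError.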
Example
Let’s look at an example where we have a JSON file containing information about pizzas, with one JSON object per line. The JSON file looks as follows:
{"pizza":"margherita", "price":7.99, "Details":"Contains cheese.\nSuitable for vegetarians"}
{"pizza":"pepperoni", "price":9.99, "Details":"Contains meat.\nNot suitable for vegetarians"}
{"pizza":"marinara", "price":6.99, "Details":"Dairy free.\nSuitable for vegetarians."}
{"pizza":"four cheese", "price":10.99, "Details":"Contains cheese.\nSuitable for vegetarians"}
{"pizza":"hawaiian", "price":9.99, "Details":"Contains meat.\nNot suitable for vegetarians"}
We can try to import the JSON file into a pandas DataFrame using the read_json function. Let’s look at the code:
import pandas as pd

df = pd.read_json('sample.json')
print(df)
Let’s run the code to see what happens:
ValueError: Trailing data
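pandas raises the ValueError because the file contains five JSON objects, one per line, while read_json expects a single JSON document by default. After parsing the first object, the parser finds more data and reports it as trailing data. The \n sequences inside the Details strings are valid JSON escapes and do not cause the error on their own.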
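To see which part of the input a JSON parser objects to, here is a small sketch using the standard-library json module. This is not the parser pandas uses, so the message reads "Extra data" rather than "Trailing data", and the two shortened records are illustrative stand-ins for sample.json.

```python
import json

# One object whose string holds an escaped newline: valid JSON.
single = '{"pizza": "margherita", "Details": "Contains cheese.\\nSuitable for vegetarians"}'
record = json.loads(single)

# Two objects separated by a real newline: not a single JSON document.
double = single + "\n" + '{"pizza": "pepperoni", "Details": "Contains meat."}'
try:
    json.loads(double)
    error = None
except json.JSONDecodeError as err:  # JSONDecodeError is a subclass of ValueError
    error = err

print(repr(record["Details"]))
print(type(error).__name__, error)
```

The first call succeeds, and the escaped \n becomes a newline character inside the parsed string. The second call fails at the start of the second object.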
Solution
We can solve this error by setting lines=True when calling read_json, which reads the file as one JSON object per line. Let’s look at the revised code:
import pandas as pd

df = pd.read_json('sample.json', lines=True)
print(df)
Let’s run the code to see the result:
         pizza  price                                       Details
0   margherita   7.99    Contains cheese.\nSuitable for vegetarians
1    pepperoni   9.99  Contains meat.\nNot suitable for vegetarians
2     marinara   6.99        Dairy free.\nSuitable for vegetarians.
3  four cheese  10.99    Contains cheese.\nSuitable for vegetarians
4     hawaiian   9.99  Contains meat.\nNot suitable for vegetarians
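Conceptually, reading with lines=True treats every line of the file as its own JSON document. The sketch below mimics that idea with the standard-library json module. It is an illustration of the concept under that assumption, not pandas' actual implementation, and the two-line text is a shortened stand-in for sample.json.

```python
import json

# Two lines in the same shape as sample.json (shortened for illustration).
text = (
    '{"pizza": "margherita", "price": 7.99}\n'
    '{"pizza": "pepperoni", "price": 9.99}\n'
)

# Parse every non-empty line as its own JSON document.
records = [json.loads(line) for line in text.splitlines() if line.strip()]

print(records)
```

Passing a list of dictionaries such as records to pd.DataFrame builds a table with one row per line, which is the shape we get from read_json with lines=True.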
We can remove the endline separators using str.replace. Let’s look at the code:
df['Details'] = df['Details'].str.replace('\n', ' ')
print(df)
Let’s run the code to remove the endline separators:
         pizza  price                                      Details
0   margherita   7.99    Contains cheese. Suitable for vegetarians
1    pepperoni   9.99  Contains meat. Not suitable for vegetarians
2     marinara   6.99        Dairy free. Suitable for vegetarians.
3  four cheese  10.99    Contains cheese. Suitable for vegetarians
4     hawaiian   9.99  Contains meat. Not suitable for vegetarians
Summary
Congratulations on reading to the end of this tutorial!
For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.
For further reading on errors involving JSON, go to the articles:
- How to Solve Python JSONDecodeError: extra data
- How to Solve Python JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.