If you try to import a JSON file that stores one JSON object per line, separated by endline separators \n, into a pandas DataFrame with the default settings, you will encounter ValueError: Trailing data.

To solve this error, you can set the lines parameter in read_json to True, so that each line is read as a separate JSON object. For example, df = pd.read_json('data.json', lines=True).

This tutorial will go through how to solve the error with code examples.


ValueError: Trailing data

In Python, a value is a piece of information stored within a particular object. Python raises a ValueError when a built-in operation or function receives an argument that has the right type but an inappropriate value. In this specific error, the data we want to read is the correct type, a JSON string, but it holds several JSON objects separated by endline separators, and by default read_json expects a single JSON document. Everything after the first object is reported as trailing data.
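As a minimal illustration of "right type, inappropriate value", the sketch below passes two strings to the built-in int function. The helper name parse_number is hypothetical and exists only for this example.

```python
def parse_number(text):
    """Convert a string to an int, returning the error text on failure."""
    try:
        return int(text)
    except ValueError as err:
        # The argument is a str (the right type), but its value
        # cannot be interpreted as an integer.
        return f"ValueError: {err}"


ok = parse_number("42")      # a string holding a valid integer
bad = parse_number("pizza")  # a string int() cannot interpret

print(ok)
print(bad)
```

Both calls pass a string, so the type is acceptable in each case; only the second value is rejected.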

Example

Let’s look at an example where we have a JSON containing information about pizzas. The JSON file looks as follows:

{"pizza":"margherita", "price":7.99, "Details":"Contains cheese.\nSuitable for vegetarians"}
{"pizza":"pepperoni", "price":9.99, "Details":"Contains meat.\nNot suitable for vegetarians"}
{"pizza":"marinara", "price":6.99, "Details":"Dairy free.\nSuitable for vegetarians."}
{"pizza":"four cheese", "price":10.99, "Details":"Contains cheese.\nSuitable for vegetarians"}
{"pizza":"hawaiian", "price":9.99, "Details":"Contains meat.\nNot suitable for vegetarians"}

We can try to import the JSON file into a pandas DataFrame using the read_json function. Let’s look at the code:

import pandas as pd

df = pd.read_json('sample.json')

print(df)

Let’s run the code to see what happens:

ValueError: Trailing data

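pandas raises the ValueError because the file holds one JSON object per line, while read_json by default expects the whole file to be a single JSON document. After parsing the first object, the parser finds more data and reports it as trailing. The \n escapes inside the Details strings are valid JSON and are not the cause on their own.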
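The failure can also be reproduced without a file on disk. The sketch below is an assumption-laden stand-in for sample.json: it keeps two of the records in a string and wraps it in io.StringIO so that read_json treats it as a file-like object.

```python
from io import StringIO

import pandas as pd

# Two records, one JSON object per line. "\\n" keeps the escaped \n
# inside the Details strings; the trailing "\n" separates the objects.
raw = (
    '{"pizza":"margherita", "price":7.99, "Details":"Contains cheese.\\nSuitable for vegetarians"}\n'
    '{"pizza":"pepperoni", "price":9.99, "Details":"Contains meat.\\nNot suitable for vegetarians"}\n'
)

try:
    pd.read_json(StringIO(raw))  # default lines=False
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Catching the exception here is only for demonstration; in real code it is usually better to fix how the file is read than to suppress the error.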

Solution

We can solve this error by setting lines=True when calling the read_json method to read the file as a JSON object per line. Let’s look at the revised code:

import pandas as pd

df = pd.read_json('sample.json', lines=True)

print(df)

Let’s run the code to see the result:

         pizza  price                                       Details
0   margherita   7.99    Contains cheese.\nSuitable for vegetarians
1    pepperoni   9.99  Contains meat.\nNot suitable for vegetarians
2     marinara   6.99        Dairy free.\nSuitable for vegetarians.
3  four cheese  10.99    Contains cheese.\nSuitable for vegetarians
4     hawaiian   9.99  Contains meat.\nNot suitable for vegetarians
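To make clearer what lines=True does, the sketch below parses the same kind of data twice: once with read_json and once by calling json.loads on each line by hand. The manual loop is only a rough equivalent for illustration, not how pandas is implemented, and the in-memory string stands in for sample.json.

```python
from io import StringIO
import json

import pandas as pd

raw = (
    '{"pizza":"margherita", "price":7.99, "Details":"Contains cheese.\\nSuitable for vegetarians"}\n'
    '{"pizza":"pepperoni", "price":9.99, "Details":"Contains meat.\\nNot suitable for vegetarians"}\n'
)

# Let pandas treat every line as its own JSON object.
df_lines = pd.read_json(StringIO(raw), lines=True)

# Rough manual equivalent: decode each non-empty line separately.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]
df_manual = pd.DataFrame(records)

print(df_lines)
print(df_manual)

# The escaped \n in the file becomes a real newline character in the value.
print("\n" in df_lines.loc[0, "Details"])
```

Both approaches give one row per line of input. The read_json call is shorter and is the one used in the rest of this tutorial.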

We can remove the endline separators using str.replace. Let’s look at the code:

df['Details'] = df['Details'].str.replace('\n', ' ')

print(df)

Let’s run the code to remove the endline separators:

         pizza  price                                      Details
0   margherita   7.99    Contains cheese. Suitable for vegetarians
1    pepperoni   9.99  Contains meat. Not suitable for vegetarians
2     marinara   6.99        Dairy free. Suitable for vegetarians.
3  four cheese  10.99    Contains cheese. Suitable for vegetarians
4     hawaiian   9.99  Contains meat. Not suitable for vegetarians
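The default of the regex parameter of str.replace has differed between pandas versions, so it can be safer to state it explicitly. The sketch below builds a small DataFrame by hand (an assumption, in place of the one loaded from sample.json) and replaces the newline as a literal string.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "pizza": ["margherita", "pepperoni"],
        "Details": [
            "Contains cheese.\nSuitable for vegetarians",
            "Contains meat.\nNot suitable for vegetarians",
        ],
    }
)

# regex=False treats "\n" as a literal newline character, not a pattern.
cleaned = df["Details"].str.replace("\n", " ", regex=False)

print(cleaned.tolist())
```

For a single character such as a newline the result is the same with or without regular expressions; the explicit flag mainly documents the intent.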

Summary

Congratulations on reading to the end of this tutorial!

For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.

Have fun and happy researching!