How to Solve Python TypeError: DictProxy object is not JSON serializable


When working with Python’s multiprocessing module, you may encounter the error: TypeError: 'DictProxy' object is not JSON serializable. This issue arises when attempting to serialize a DictProxy object (a special object created by multiprocessing.Manager to allow shared access to data between processes) using the json module.

In addition to the TypeError, users may also face an AttributeError related to pickling functions when using multiprocessing, especially in interactive environments such as Jupyter or IPython notebooks. This post explores why both of these errors occur and how to fix them. We’ll provide an example that reproduces the error and show step-by-step solutions to resolve it.

Why the Errors Occur

AttributeError (Can’t get attribute ‘worker’ on <module>): This error happens when Python’s multiprocessing module tries to “pickle” a function (like worker) for use in a child process. If the function is not defined in the main module (__main__), the child process cannot find it, leading to this AttributeError. This is common in environments like Jupyter notebooks or IPython, where the multiprocessing context is not correctly handled.

TypeError (DictProxy not JSON serializable): Python’s json module only serializes basic data types like dict, list, str, etc. A DictProxy object, which enables shared access to data between processes, is not a standard dict. Thus, attempting to serialize it with json.dumps() raises the TypeError.

Example Reproducing Both Errors

Below is an example that reproduces both the TypeError and the AttributeError.

import multiprocessing
import json

def worker(shared_dict):
    shared_dict['key'] = 'value'

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared_dict = manager.dict()  # DictProxy object

    # Start a new process to modify the shared dict
    p = multiprocessing.Process(target=worker, args=(shared_dict,))
    p.start()
    p.join()

    # Trying to serialize the DictProxy object
    try:
        json_data = json.dumps(shared_dict)
    except TypeError as e:
        print(f"Error: {e}")

Output:

Error: Object of type DictProxy is not JSON serializable

In addition, if you run this in a Jupyter notebook, you may also encounter:

AttributeError: Can't get attribute 'worker' on <module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>

Solution 1: Convert to Regular dict

To solve the TypeError, convert the DictProxy object to a regular dict before attempting to serialize it. This can be done easily by passing the DictProxy object into Python’s dict() constructor.

Here’s how you can fix it:

import multiprocessing
import json

def worker(shared_dict):
    shared_dict['key'] = 'value'

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared_dict = manager.dict()  # DictProxy object

    # Start a new process to modify the shared dict
    p = multiprocessing.Process(target=worker, args=(shared_dict,))
    p.start()
    p.join()

    # Convert the DictProxy to a regular dict before serializing
    json_data = json.dumps(dict(shared_dict))
    print(f"Serialized data: {json_data}")

Output:

Serialized data: {"key": "value"}

In this updated code, you can successfully serialize the DictProxy object by converting it into a standard Python dict using dict(shared_dict).

Solution 2: Fixing AttributeError in Interactive Environments

If you are running the code in an interactive environment like Jupyter or IPython, the multiprocessing module may not behave as expected, causing an AttributeError. This is because multiprocessing requires functions to be defined in the __main__ scope to properly pickle them for use in subprocesses.

To fix this error, ensure that all multiprocessing-related code is placed inside an if __name__ == '__main__': block. Here is the corrected code:

import multiprocessing
import json

def worker(shared_dict):
    shared_dict['key'] = 'value'

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared_dict = manager.dict()

    # Start a new process to modify the shared dict
    p = multiprocessing.Process(target=worker, args=(shared_dict,))
    p.start()
    p.join()

    # Convert the DictProxy to a regular dict before serializing
    json_data = json.dumps(dict(shared_dict))
    print(f"Serialized data: {json_data}")

By placing the multiprocessing code inside if __name__ == '__main__':, the script ensures that child processes can correctly locate the worker function, avoiding the AttributeError.

Solution 3: Using concurrent.futures.ProcessPoolExecutor

If you prefer to avoid the complexities of multiprocessing in environments like Jupyter notebooks, you can use concurrent.futures.ProcessPoolExecutor, which handles multiprocessing more gracefully. Here’s how you can refactor the code:

from concurrent.futures import ProcessPoolExecutor
import json

def worker(shared_dict):
    shared_dict['key'] = 'value'
    return shared_dict

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        shared_dict = {}
        future = executor.submit(worker, shared_dict)
        shared_dict = future.result()

        # Serialize the dictionary
        json_data = json.dumps(shared_dict)
        print(f"Serialized data: {json_data}")

Output:

Serialized data: {"key": "value"}

ProcessPoolExecutor simplifies multiprocessing, especially in interactive environments, by handling pickling and process management internally.

Conclusion

The TypeError: 'DictProxy' object is not JSON serializable occurs when trying to serialize a DictProxy object from Python’s multiprocessing module. The solution is to convert it to a regular dictionary using dict() before serialization. Additionally, in interactive environments like Jupyter, using multiprocessing may cause an AttributeError. To fix this, place your multiprocessing code inside an if __name__ == '__main__': block or use ProcessPoolExecutor for more robust multiprocessing handling.

By understanding both the TypeError and AttributeError, you can efficiently resolve these issues in your multiprocessing code.


Congratulations on reading to the end of this tutorial!


To learn more about Python for data science and machine learning, go to the online courses page on Python for the most comprehensive courses available.

Have fun and happy researching!

Senior Advisor, Data Science | [email protected]

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
