How to Do Bucket Sort in Python

by | DSA, Programming, Python, Tips

Introduction

Bucket sort is an efficient sorting algorithm that distributes elements into several “buckets” and then sorts each bucket individually. It is handy when input values are uniformly distributed over a range, as this can lead to a time complexity close to linear. In this blog post, we’ll explain how bucket sort works, provide a Python implementation, and discuss its advantages and limitations.


What is Bucket Sort?

Bucket sort is a non-comparison-based sorting algorithm. It divides the input array into smaller, evenly distributed groups (called buckets) and sorts each bucket individually. After sorting, the elements from each bucket are concatenated back to form the sorted array.

The key idea is to map input elements to buckets so that elements within each bucket are relatively close in value. Sorting smaller buckets is easier, making bucket sort efficient for certain types of data, such as numbers that are uniformly distributed over a range.

How Bucket Sort Works

  1. Create buckets based on the range of input values.
  2. Distribute elements of the input array into buckets based on their value.
  3. Sort each bucket using a simple sorting algorithm (like insertion sort).
  4. Concatenate the sorted buckets to form the final sorted array.

Time Complexity

  • Best Case: O(n + k), where n is the number of elements and k is the number of buckets.
  • Average Case: O(n + k + n log n) (depends on how the individual buckets are sorted).
  • Worst Case: O(n²) (when all elements fall into one bucket, degrading the sorting efficiency).

Space Complexity

  • O(n + k), where n is the number of elements and k is the number of buckets.

Bucket Sort Algorithm in Python

Here’s a Python implementation of bucket sort using insertion sort to sort the elements in each bucket:

# Helper function: Insertion sort to sort individual buckets
def insertion_sort(bucket):
    for i in range(1, len(bucket)):
        key = bucket[i]
        j = i - 1
        while j >= 0 and key < bucket[j]:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key

# Main bucket sort function
def bucket_sort(arr):
    if len(arr) == 0:
        return arr

    # Create empty buckets
    num_buckets = len(arr)
    buckets = [[] for _ in range(num_buckets)]

    # Find the maximum value in the array to determine bucket placement
    max_value = max(arr)

    # Distribute elements into the corresponding buckets
    for num in arr:
        # Normalize the element into the bucket range
        index = int(num_buckets * num / (max_value + 1))
        buckets[index].append(num)

    # Sort each bucket individually using insertion sort
    for bucket in buckets:
        insertion_sort(bucket)

    # Concatenate the sorted buckets into the final sorted array
    sorted_arr = []
    for bucket in buckets:
        sorted_arr.extend(bucket)

    return sorted_arr

# Example usage:
data = [0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51]
sorted_data = bucket_sort(data)
print("Sorted array:", sorted_data)

Output:

Sorted array: [0.23, 0.25, 0.32, 0.42, 0.47, 0.51, 0.52]

Detailed Step-by-Step Explanation of Bucket Sort

Let’s break down the sorting process for the array [0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51].

Step 1: Create Buckets

The first step is to create an array of empty buckets. The number of buckets is usually the same as the number of elements in the input array. In this case, since we have 7 elements, we create 7 empty buckets.

Initial empty buckets:

[[], [], [], [], [], [], []]

Step 2: Distribute Elements into Buckets

Each element from the input array is placed into a specific bucket based on its value. To determine which bucket an element belongs to, we use the following formula:

index = int(num_buckets * num / (max_value + 1))
  • num_buckets is the total number of buckets (equal to the number of elements in the array).
  • num is the value of the element from the input array.
  • max_value + 1 ensures that the bucket placement is normalized based on the maximum value in the array.

Example of bucket placement:

  • The value 0.42 goes into bucket 2 because int(7 * 0.42 / (0.52 + 1)) = 2.
  • The value 0.32 goes into bucket 2 because int(7 * 0.32 / (0.52 + 1)) = 2.
  • The value 0.23 goes into bucket 1 because int(7 * 0.23 / (0.52 + 1)) = 1.
  • The value 0.52 goes into bucket 3 because int(7 * 0.52 / (0.52 + 1)) = 3.

Buckets after distribution:

[[], [0.23, 0.25], [0.42, 0.32], [0.52, 0.47, 0.51], []]

Step 3: Sort Each Bucket Individually

After the elements have been distributed into the buckets, each bucket is sorted individually using insertion sort. Insertion sort is an efficient choice for sorting small arrays like these buckets.

Sorting individual buckets:

  • Bucket [0.23, 0.25] is already sorted.
  • Bucket [0.42, 0.32] is sorted to become [0.32, 0.42].
  • Bucket [0.52, 0.47, 0.51] is sorted to become [0.47, 0.51, 0.52].

Buckets after sorting:

[[], [0.23, 0.25], [0.32, 0.42], [0.47, 0.51, 0.52], []]

Step 4: Concatenate Sorted Buckets

Finally, the sorted buckets are concatenated into one sorted array. The elements are appended from each bucket in order, resulting in the final sorted array.

Final sorted array:

[0.23, 0.25, 0.32, 0.42, 0.47, 0.51, 0.52]

Advantages of Bucket Sort

  1. Efficient for Uniformly Distributed Data: Bucket sort performs very well when the data is uniformly distributed over a known range. In such cases, it can achieve near-linear time complexity O(n + k), making it faster than comparison-based algorithms like quicksort.
  2. Parallelizable: Bucket sort can be parallelized easily since the sorting of individual buckets is independent. Each bucket can be sorted in parallel, improving performance on multi-core systems.
  3. Handles Floating Point Numbers: Unlike some sorting algorithms that are primarily designed for integers, bucket sort can efficiently handle floating-point numbers.

Examples of algorithms typically designed for integers:

  • Counting Sort: Operates by counting the occurrences of each unique integer, making it unsuitable for floats or strings without adaptation.
  • Radix Sort: Sorts integers by processing digits individually, making it best for fixed-length integers (although it can be adapted for strings or floats).
  • Pigeonhole Sort: Specifically designed for integers within a fixed range, it places each element in a corresponding pigeonhole based on its value.

Limitations of Bucket Sort

  1. Inefficient with Skewed Data: If the input data is not uniformly distributed, bucket sort can degrade in performance. For instance, if all elements fall into the same bucket, the algorithm will behave like an inefficient O(n²) sorting algorithm.
  2. Choosing the Number of Buckets: The number of buckets needs to be chosen carefully. Too few buckets can cause inefficiency, while too many buckets can lead to wasted space and increased overhead.
  3. Requires Extra Space: Bucket sort requires extra space for storing the buckets, leading to a space complexity of O(n + k). This may not be ideal for applications where memory usage is a concern.

Conclusion

Bucket sort is an efficient algorithm for sorting uniformly distributed data over a known range. By distributing elements into buckets, sorting each bucket individually, and concatenating the results, bucket sort can achieve near-linear time complexity. However, to ensure optimal performance, it is essential to carefully consider the number of buckets and the input data distribution.

For further reading on sorting algorithms in Python, go to the articles:

Go to the online courses page on Python to learn more about Python for data  science and machine learning.

Have fun and happy researching!

Profile Picture
Research Scientist at Moogsoft | + posts

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research. Find out more about the creator of the Research Scientist Pod here and sign up to the mailing list here!