Introduction
Bucket sort is an efficient sorting algorithm that distributes elements into several “buckets” and then sorts each bucket individually. It is handy when input values are uniformly distributed over a range, as this can lead to a time complexity close to linear. In this blog post, we’ll explain how bucket sort works, provide a Python implementation, and discuss its advantages and limitations.
What is Bucket Sort?
Bucket sort is a non-comparison-based sorting algorithm. It divides the input array into smaller, evenly distributed groups (called buckets) and sorts each bucket individually. After sorting, the elements from each bucket are concatenated back to form the sorted array.
The key idea is to map input elements to buckets so that elements within each bucket are relatively close in value. Sorting smaller buckets is easier, making bucket sort efficient for certain types of data, such as numbers that are uniformly distributed over a range.
How Bucket Sort Works
- Create buckets based on the range of input values.
- Distribute elements of the input array into buckets based on their value.
- Sort each bucket using a simple sorting algorithm (like insertion sort).
- Concatenate the sorted buckets to form the final sorted array.
Time Complexity
- Best Case: O(n + k), where
n
is the number of elements andk
is the number of buckets. - Average Case: O(n + k + n log n) (depends on how the individual buckets are sorted).
- Worst Case: O(n²) (when all elements fall into one bucket, degrading the sorting efficiency).
Space Complexity
- O(n + k), where
n
is the number of elements andk
is the number of buckets.
Bucket Sort Algorithm in Python
Here’s a Python implementation of bucket sort using insertion sort to sort the elements in each bucket:
# Helper function: Insertion sort to sort individual buckets def insertion_sort(bucket): for i in range(1, len(bucket)): key = bucket[i] j = i - 1 while j >= 0 and key < bucket[j]: bucket[j + 1] = bucket[j] j -= 1 bucket[j + 1] = key # Main bucket sort function def bucket_sort(arr): if len(arr) == 0: return arr # Create empty buckets num_buckets = len(arr) buckets = [[] for _ in range(num_buckets)] # Find the maximum value in the array to determine bucket placement max_value = max(arr) # Distribute elements into the corresponding buckets for num in arr: # Normalize the element into the bucket range index = int(num_buckets * num / (max_value + 1)) buckets[index].append(num) # Sort each bucket individually using insertion sort for bucket in buckets: insertion_sort(bucket) # Concatenate the sorted buckets into the final sorted array sorted_arr = [] for bucket in buckets: sorted_arr.extend(bucket) return sorted_arr # Example usage: data = [0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51] sorted_data = bucket_sort(data) print("Sorted array:", sorted_data)
Output:
Sorted array: [0.23, 0.25, 0.32, 0.42, 0.47, 0.51, 0.52]
Detailed Step-by-Step Explanation of Bucket Sort
Let’s break down the sorting process for the array [0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51]
.
Step 1: Create Buckets
The first step is to create an array of empty buckets. The number of buckets is usually the same as the number of elements in the input array. In this case, since we have 7 elements, we create 7 empty buckets.
Initial empty buckets:
[[], [], [], [], [], [], []]
Step 2: Distribute Elements into Buckets
Each element from the input array is placed into a specific bucket based on its value. To determine which bucket an element belongs to, we use the following formula:
index = int(num_buckets * num / (max_value + 1))
num_buckets
is the total number of buckets (equal to the number of elements in the array).num
is the value of the element from the input array.max_value + 1
ensures that the bucket placement is normalized based on the maximum value in the array.
Example of bucket placement:
- The value
0.42
goes into bucket2
becauseint(7 * 0.42 / (0.52 + 1)) = 2
. - The value
0.32
goes into bucket2
becauseint(7 * 0.32 / (0.52 + 1)) = 2
. - The value
0.23
goes into bucket1
becauseint(7 * 0.23 / (0.52 + 1)) = 1
. - The value
0.52
goes into bucket3
becauseint(7 * 0.52 / (0.52 + 1)) = 3
.
Buckets after distribution:
[[], [0.23, 0.25], [0.42, 0.32], [0.52, 0.47, 0.51], []]
Step 3: Sort Each Bucket Individually
After the elements have been distributed into the buckets, each bucket is sorted individually using insertion sort. Insertion sort is an efficient choice for sorting small arrays like these buckets.
Sorting individual buckets:
- Bucket
[0.23, 0.25]
is already sorted. - Bucket
[0.42, 0.32]
is sorted to become[0.32, 0.42]
. - Bucket
[0.52, 0.47, 0.51]
is sorted to become[0.47, 0.51, 0.52]
.
Buckets after sorting:
[[], [0.23, 0.25], [0.32, 0.42], [0.47, 0.51, 0.52], []]
Step 4: Concatenate Sorted Buckets
Finally, the sorted buckets are concatenated into one sorted array. The elements are appended from each bucket in order, resulting in the final sorted array.
Final sorted array:
[0.23, 0.25, 0.32, 0.42, 0.47, 0.51, 0.52]
Advantages of Bucket Sort
- Efficient for Uniformly Distributed Data: Bucket sort performs very well when the data is uniformly distributed over a known range. In such cases, it can achieve near-linear time complexity O(n + k), making it faster than comparison-based algorithms like quicksort.
- Parallelizable: Bucket sort can be parallelized easily since the sorting of individual buckets is independent. Each bucket can be sorted in parallel, improving performance on multi-core systems.
- Handles Floating Point Numbers: Unlike some sorting algorithms that are primarily designed for integers, bucket sort can efficiently handle floating-point numbers.
Examples of algorithms typically designed for integers:
- Counting Sort: Operates by counting the occurrences of each unique integer, making it unsuitable for floats or strings without adaptation.
- Radix Sort: Sorts integers by processing digits individually, making it best for fixed-length integers (although it can be adapted for strings or floats).
- Pigeonhole Sort: Specifically designed for integers within a fixed range, it places each element in a corresponding pigeonhole based on its value.
Limitations of Bucket Sort
- Inefficient with Skewed Data: If the input data is not uniformly distributed, bucket sort can degrade in performance. For instance, if all elements fall into the same bucket, the algorithm will behave like an inefficient O(n²) sorting algorithm.
- Choosing the Number of Buckets: The number of buckets needs to be chosen carefully. Too few buckets can cause inefficiency, while too many buckets can lead to wasted space and increased overhead.
- Requires Extra Space: Bucket sort requires extra space for storing the buckets, leading to a space complexity of O(n + k). This may not be ideal for applications where memory usage is a concern.
Conclusion
Bucket sort is an efficient algorithm for sorting uniformly distributed data over a known range. By distributing elements into buckets, sorting each bucket individually, and concatenating the results, bucket sort can achieve near-linear time complexity. However, to ensure optimal performance, it is essential to carefully consider the number of buckets and the input data distribution.
For further reading on sorting algorithms in Python, go to the articles:
- How to do Insertion Sort in Python
- How to Do Bubble Sort in Python
- How to do Quick Sort in Python
- How to Do Selection Sort in Python
- How to do Counting Sort in Python
- How to Do Radix Sort in Python
- How to Do Heap Sort in Python
- How to Do Pigeonhole Sort in Python
- How To Do Comb Sort in Python
- How to Do Shell Sort in Python
- How To Do TimSort in Python
Go to the online courses page on Python to learn more about Python for data science and machine learning.
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.