Pigeonhole sort is a simple sorting algorithm that is efficient for sorting lists where the number of elements and the range of possible key values are approximately the same. This algorithm distributes elements into “pigeonholes” based on their key values. It’s ideal when you know the range of the input values in advance and when those values are integers. In this blog post, we’ll review how pigeonhole sort works, provide a Python implementation, and discuss when this algorithm is practical and what alternatives may be more suitable.
What is Pigeonhole Sort?
Pigeonhole sort is a non-comparison-based sorting algorithm that places each element into a “hole” or “bucket” based on its value. The range of the keys (input values) determines the number of pigeonholes. Once all the elements are distributed into their pigeonholes, they are collected to get a sorted array.
This algorithm works best when:
- The number of elements (n) is roughly equal to the range of key values (k).
- The elements are integers, or can be mapped to integers in a small range.
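To illustrate the second condition, here is a minimal sketch of sorting records that are not integers themselves but can be mapped to a small integer range. The function name `pigeonhole_sort_by_key`, the `key` parameter, and the student records are assumptions made for this example, not part of the implementation later in this post.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def pigeonhole_sort_by_key(items: Iterable[T], key: Callable[[T], int]) -> list[T]:
    """Sort items by an integer key; items with equal keys keep their input order."""
    items = list(items)
    if not items:
        return []
    keys = [key(item) for item in items]
    low = min(keys)
    # One pigeonhole per possible key value between the smallest and largest key.
    holes: list[list[T]] = [[] for _ in range(max(keys) - low + 1)]
    for item, k in zip(items, keys):
        holes[k - low].append(item)
    return [item for hole in holes for item in hole]


# Hypothetical records: (name, grade) pairs sorted by grade.
students = [("Ana", 3), ("Ben", 1), ("Cy", 3), ("Dee", 2)]
by_grade = pigeonhole_sort_by_key(students, key=lambda s: s[1])
print(by_grade)  # [('Ben', 1), ('Dee', 2), ('Ana', 3), ('Cy', 3)]
```

Because each pigeonhole is filled in input order, records with the same key stay in their original relative order.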
How Pigeonhole Sort Works:
- Find the minimum and maximum values in the input array to determine the range.
- Create pigeonholes (or buckets) for each value within this range.
- Place each element from the input array into its corresponding pigeonhole.
- Collect elements from the pigeonholes in order, giving a sorted array.
Time Complexity:
- Best Case: O(n + k), where n is the number of elements and k is the range of possible key values.
- Worst Case: O(n + k) (the algorithm performs consistently regardless of input order).
Space Complexity:
- O(n + k) due to the extra space needed for pigeonholes.
Pigeonhole Sort Algorithm in Python
Let’s implement pigeonhole sort in Python:
def pigeonhole_sort(arr):
    # An empty list has no minimum or maximum, so return early
    if not arr:
        return []

    # Step 1: Find the minimum and maximum values
    min_value = min(arr)
    max_value = max(arr)

    # Calculate the range of the values
    size = max_value - min_value + 1

    # Step 2: Create pigeonholes (empty list of lists)
    holes = [[] for _ in range(size)]

    # Step 3: Place elements into their corresponding pigeonholes
    for num in arr:
        holes[num - min_value].append(num)

    # Step 4: Collect the sorted elements from the pigeonholes
    sorted_array = []
    for hole in holes:
        sorted_array.extend(hole)

    return sorted_array

# Example usage:
data = [8, 3, 2, 7, 4, 6, 8]
sorted_data = pigeonhole_sort(data)
print("Sorted array:", sorted_data)
Output:
Sorted array: [2, 3, 4, 6, 7, 8, 8]
Step-by-Step Explanation of Pigeonhole Sort
Let’s break down how pigeonhole sort works for the array [8, 3, 2, 7, 4, 6, 8].
Step 1: Find the Minimum and Maximum Values
- First, find the minimum and maximum values in the input array. This allows us to determine how many pigeonholes we need.
For the array [8, 3, 2, 7, 4, 6, 8]:
- Min value: 2
- Max value: 8
- Range: 8 - 2 + 1 = 7
Step 2: Create Pigeonholes
- Create an array of empty pigeonholes. In this case, we need 7 pigeonholes (one for each value from 2 to 8).
Initial pigeonholes:
[[], [], [], [], [], [], []] # Representing holes for values 2 through 8
Step 3: Place Elements in Pigeonholes
- Each element from the input array is placed into the corresponding pigeonhole based on its value. The index of the pigeonhole is determined by the formula num - min_value.
For example, 8 goes into the pigeonhole at index 8 - 2 = 6, 3 goes into the pigeonhole at index 3 - 2 = 1, and so on.
Pigeonholes after placing elements:
[[2], [3], [4], [], [6], [7], [8, 8]]
Step 4: Collect the Sorted Elements
- Finally, collect the elements from the pigeonholes in order. Every element in a given pigeonhole has the same value, so no further sorting is needed.
Sorted array:
[2, 3, 4, 6, 7, 8, 8]
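The walkthrough above can be reproduced with a traced variant of the function. This is only an illustrative sketch: the name `pigeonhole_sort_traced` and the print statements are additions for this example, and it assumes a non-empty list of integers.

```python
def pigeonhole_sort_traced(arr):
    """Pigeonhole sort that prints the intermediate state after each step."""
    # Step 1: find the range of values.
    min_value, max_value = min(arr), max(arr)
    size = max_value - min_value + 1
    print(f"min={min_value}, max={max_value}, pigeonholes={size}")

    # Steps 2 and 3: create the pigeonholes and place each element.
    holes = [[] for _ in range(size)]
    for num in arr:
        index = num - min_value
        holes[index].append(num)
        print(f"{num} -> index {index}")
    print("pigeonholes:", holes)

    # Step 4: collect the elements in pigeonhole order.
    return [num for hole in holes for num in hole]


result = pigeonhole_sort_traced([8, 3, 2, 7, 4, 6, 8])
print("sorted:", result)
```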
When to Use Pigeonhole Sort
Pigeonhole sort can be extremely efficient in specific scenarios, but it is not universally applicable. Here are the key conditions under which pigeonhole sort is a good choice:
- Small Range of Integer Values: Pigeonhole sort works best when the range of values (k) is close to the number of elements (n). For example, pigeonhole sort would work well if you need to sort numbers between 1 and 100 and there are 100 numbers to sort.
- Uniform Distribution: If the data is spread evenly across the range, few pigeonholes stay empty, so little of the O(k) work and memory is wasted. Each element goes directly into its “correct” pigeonhole without any comparisons, which can make pigeonhole sort faster than comparison-based algorithms on such input.
- Simple Implementation: Pigeonhole sort is easy to understand and implement, which makes it a great option for quick, simple sorting tasks when the input meets its criteria.
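One rough way to check the first condition before sorting is to compare the range k with the number of elements n. The helper below is a hypothetical sketch: the name `looks_suitable_for_pigeonhole` and the default `max_ratio=2.0` are assumptions chosen for illustration, not an established rule.

```python
def looks_suitable_for_pigeonhole(arr, max_ratio=2.0):
    """Heuristic: True when the key range k is at most max_ratio times n.

    max_ratio is an illustrative threshold, not an established rule.
    """
    # Pigeonhole sort as shown in this post needs a non-empty list of integers.
    if not arr or not all(isinstance(x, int) for x in arr):
        return False
    n = len(arr)
    k = max(arr) - min(arr) + 1
    return k <= max_ratio * n


print(looks_suitable_for_pigeonhole([8, 3, 2, 7, 4, 6, 8]))  # k = 7, n = 7 -> True
print(looks_suitable_for_pigeonhole([1, 1000]))              # k = 1000, n = 2 -> False
```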
However, in cases where pigeonhole sort is not appropriate, there are other algorithms that may work better. Let’s explore some alternatives based on specific scenarios.
Alternatives to Pigeonhole Sort
1. Counting Sort
- Best for: Sorting integers when the range of possible values (k) is small relative to the number of elements (n).
- Time Complexity: O(n + k), similar to pigeonhole sort.
- Key Difference: Instead of creating an array of lists (pigeonholes), counting sort uses a single array to store the frequency of each value and then constructs the sorted array directly from those counts. Counting sort is often more space-efficient when the input has many repeated values.
When to use counting sort:
- When the dataset contains many repeated values (e.g., sorting exam scores).
- When the input data is dense (few gaps between values).
- Example: Sorting an array of integers between 1 and 100 with many repeated values like [50, 50, 50, 51, 51].
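As a rough sketch of the key difference described above, counting sort can be written with a single array of counts instead of a list of lists. The function name `counting_sort` and the sample scores are chosen for this example; the sketch assumes plain integers with no attached data.

```python
def counting_sort(arr):
    """Sort integers by counting how often each value occurs."""
    if not arr:
        return []
    min_value, max_value = min(arr), max(arr)
    # counts[i] holds the number of occurrences of the value min_value + i.
    counts = [0] * (max_value - min_value + 1)
    for num in arr:
        counts[num - min_value] += 1
    # Rebuild the sorted list directly from the counts.
    result = []
    for offset, count in enumerate(counts):
        result.extend([min_value + offset] * count)
    return result


scores = [51, 50, 51, 50, 50]
print(counting_sort(scores))  # [50, 50, 50, 51, 51]
```

Each repeated value costs only an increment of one counter here, rather than another entry in a pigeonhole list.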
2. Bucket Sort
- Best for: Sorting floating-point numbers or integers with a known range.
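- Time Complexity: O(n + k) on average, where k is the number of buckets, but O(n²) in the worst case when most elements land in the same bucket. It is often slower than pigeonhole sort due to the additional sorting step for each bucket.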
- Key Difference: Like pigeonhole sort, bucket sort distributes elements into buckets based on their values, but each bucket is then sorted individually, typically using a simple sorting algorithm like insertion sort.
When to use bucket sort:
- When the range of values is large or the data is sparse (many gaps between values).
- When sorting floating-point numbers where pigeonhole sort isn’t ideal.
- Example: Sorting floating-point numbers like [0.22, 0.89, 0.34, 0.76, 0.49] or sparse integer data like [1, 1000, 2000].
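A minimal bucket sort sketch follows. The name `bucket_sort` and the default `bucket_count=5` are assumptions for this example, and each bucket is sorted with Python's built-in `sorted` rather than insertion sort, purely to keep the sketch short.

```python
def bucket_sort(arr, bucket_count=5):
    """Distribute values into equal-width buckets, then sort each bucket."""
    if not arr:
        return []
    low, high = min(arr), max(arr)
    if low == high:
        return list(arr)  # All values are equal, nothing to sort.
    width = (high - low) / bucket_count
    buckets = [[] for _ in range(bucket_count)]
    for x in arr:
        # Clamp the index so the maximum value lands in the last bucket.
        index = min(int((x - low) / width), bucket_count - 1)
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        # Built-in sort used here in place of insertion sort (an assumption of this sketch).
        result.extend(sorted(bucket))
    return result


print(bucket_sort([0.22, 0.89, 0.34, 0.76, 0.49]))  # [0.22, 0.34, 0.49, 0.76, 0.89]
print(bucket_sort([2000, 1, 1000]))                 # [1, 1000, 2000]
```

Unlike pigeonhole sort, the number of buckets is fixed by the caller, so sparse data such as [1, 1000, 2000] does not require one slot per possible value.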
3. Radix Sort
- Best for: Sorting integers or strings when the range of values is large but can be processed digit by digit or character by character.
- Time Complexity: O(d(n + k)), where d is the number of digits or characters and k is the number of possible values for each digit or character (the base, such as 10 for decimal digits).
- Key Difference: Radix sort processes each digit or character of the input data, from the least significant to the most significant, using a stable sorting algorithm like counting sort or bucket sort for each step.
When to use radix sort:
- When sorting large numbers, long strings, or structured data (like dates or IP addresses) that can be processed one part at a time.
- Example: Sorting zip codes, dates, or large numbers like [123, 987, 654, 321].
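The digit-by-digit idea can be sketched as a least-significant-digit radix sort. This sketch assumes non-negative integers and uses a stable per-digit bucket pass; the function name `radix_sort` and the `base` parameter are chosen for this example.

```python
def radix_sort(arr, base=10):
    """Least-significant-digit radix sort for non-negative integers."""
    if any(num < 0 for num in arr):
        raise ValueError("this sketch only handles non-negative integers")
    result = list(arr)
    largest = max(result, default=0)
    place = 1  # 1 for units, then base, base**2, and so on.
    while largest // place > 0:
        # Stable pass: elements keep their relative order within each digit bucket.
        buckets = [[] for _ in range(base)]
        for num in result:
            buckets[(num // place) % base].append(num)
        result = [num for bucket in buckets for num in bucket]
        place *= base
    return result


print(radix_sort([123, 987, 654, 321]))  # [123, 321, 654, 987]
```

Each pass needs only `base` buckets regardless of how large the numbers are, which is why radix sort copes with ranges that would be impractical for pigeonhole sort.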
4. Quick Sort
- Best for: General-purpose sorting when the input is large, and the range of values is unknown.
- Time Complexity: O(n log n) on average, O(n²) in the worst case (but can be optimized with good pivot selection).
- Key Difference: Quicksort is a comparison-based algorithm that divides the input into smaller sub-arrays based on a pivot element and recursively sorts those sub-arrays.
When to use quicksort:
- When you don’t know the range of the input or when the input is a large, unsorted array.
- Example: Sorting general data like [12, 4, 7, 9, 3, 8, 1].
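For comparison, here is a short quicksort sketch. It is a simplified, not-in-place version that builds new lists at each step, and it picks the pivot at random as one possible way to make the O(n²) worst case unlikely; production implementations usually partition in place.

```python
import random


def quicksort(arr):
    """Simplified quicksort that returns a new sorted list."""
    if len(arr) <= 1:
        return list(arr)
    # A random pivot makes consistently bad splits unlikely.
    pivot = random.choice(arr)
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


print(quicksort([12, 4, 7, 9, 3, 8, 1]))  # [1, 3, 4, 7, 8, 9, 12]
```

Only comparisons between elements are used, so no knowledge of the value range is required.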
Limitations of Pigeonhole Sort
- Inefficient for Large Ranges: Pigeonhole sort becomes inefficient when the range of values (k) is much larger than the number of elements (n). For example, sorting [1, 1000] would require 1000 pigeonholes even though only 2 elements exist.
- Space Complexity: Pigeonhole sort requires extra space for pigeonholes, leading to O(n + k) space complexity. This can be problematic when k is very large, as it may require more memory than the input array.
- Limited to Integers or Mapped Integers: Pigeonhole sort works best for integers or values that can be mapped to integers. It’s unsuitable for floating-point numbers or complex data types unless they can be mapped to a small range of integers.
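The cost of a large range can be made concrete by counting how many pigeonholes would be allocated and how many of them would stay empty. The helper name `pigeonhole_usage` is hypothetical and the sketch assumes a non-empty list of integers.

```python
def pigeonhole_usage(arr):
    """Return (total, empty): pigeonholes allocated and how many stay empty."""
    min_value = min(arr)
    total = max(arr) - min_value + 1
    # Each distinct value occupies exactly one pigeonhole.
    used = len({num - min_value for num in arr})
    return total, total - used


print(pigeonhole_usage([1, 1000]))              # (1000, 998)
print(pigeonhole_usage([8, 3, 2, 7, 4, 6, 8]))  # (7, 1)
```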
Conclusion
Pigeonhole sort is a simple and efficient algorithm for integer data drawn from a small range of values. It works well for sorting uniformly distributed data within a known range. However, it can be inefficient when the range is too large or space is a concern. In such cases, alternatives like counting sort, bucket sort, radix sort, or quicksort may better suit your needs.
Congratulations on reading to the end of this tutorial!
Read the following articles to learn how to implement Pigeonhole Sort:
In JavaScript – How to do Pigeonhole Sort in JavaScript
In C++ – How To Do Pigeonhole Sort in C++
Have fun and happy researching!
For further reading on sorting algorithms in Python, go to the articles:
- How to do Insertion Sort in Python
- How to Do Bubble Sort in Python
- How to Do Selection Sort in Python
- How to Do Bucket Sort in Python
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.