Heap Sort is an efficient, comparison-based sorting algorithm that uses a binary heap data structure to organize and sort elements. It is mainly known for its ability to sort data in O(n log n) time and in-place sorting nature, meaning it doesn’t require additional storage like Merge Sort. This blog post will walk through Heap Sort, provide a Rust implementation, and compare its performance with other sorting algorithms like Bubble Sort and Quick Sort.
What is Heap Sort?
Heap Sort is a sorting algorithm that builds a binary heap from the input data, then repeatedly extracts the maximum (or minimum) element from the heap and places it at the end of the array. A binary heap is a complete binary tree that satisfies the heap property, where each parent node is greater than or equal to its child nodes (for a max-heap).
The steps in Heap Sort are:
- Build a max-heap from the unsorted data.
- Extract the root (maximum element) and move it to the end of the array.
- Heapify the remaining array to restore the heap property, then repeat the process.
Time Complexity of Heap Sort
- Best, Average, and Worst Case: Heap Sort has a time complexity of O(n log n) in all cases. Building the heap takes O(n) time, and heapifying each extracted element takes O(log n).
- Unlike Quick Sort, which can degrade to O(n²) in the worst case, Heap Sort’s performance is consistent and predictable.
Pseudocode of Heap Sort
Here’s a high-level pseudocode for Heap Sort:
function heapSort(arr): buildMaxHeap(arr) for i from length(arr) - 1 down to 1: swap(arr[0], arr[i]) heapify(arr, 0, i) function buildMaxHeap(arr): for i from floor(length(arr) / 2) down to 0: heapify(arr, i, length(arr)) function heapify(arr, i, heapSize): left = 2 * i + 1 right = 2 * i + 2 largest = i if left < heapSize and arr[left] > arr[largest]: largest = left if right < heapSize and arr[right] > arr[largest]: largest = right if largest != i: swap(arr[i], arr[largest]) heapify(arr, largest, heapSize)
Step-by-Step Explanation of the Pseudocode
- BuildMaxHeap: This function converts the input array into a max-heap, ensuring that the largest element is at the root.
- Heapify: Heapify is a recursive function that maintains the heap property by comparing the current node with its children and swapping them if necessary.
- HeapSort: After the heap is built, HeapSort repeatedly swaps the root (largest element) with the last element in the array, reduces the heap size, and calls heapify to restore the heap property.
Rust Implementation of Heap Sort
Here’s how you can implement Heap Sort in Rust. Rust’s ownership model, borrow checker, and mutability rules make it a bit different from other languages, so we’ll need to be careful with references and in-place operations.
Rust Code
fn heap_sort<T: Ord>(arr: &mut [T]) { let n = arr.len(); // Build max heap for i in (0..n / 2).rev() { heapify(arr, n, i); } // Extract elements from heap one by one for i in (1..n).rev() { arr.swap(0, i); heapify(arr, i, 0); } } fn heapify<T: Ord>(arr: &mut [T], heap_size: usize, root: usize) { let mut largest = root; let left = 2 * root + 1; let right = 2 * root + 2; if left < heap_size && arr[left] > arr[largest] { largest = left; } if right < heap_size && arr[right] > arr[largest] { largest = right; } if largest != root { arr.swap(root, largest); heapify(arr, heap_size, largest); } } fn main() { let mut numbers = vec![12, 11, 13, 5, 6, 7]; println!("Before sorting: {:?}", numbers); heap_sort(&mut numbers); println!("After sorting: {:?}", numbers); }
Rust-Specific Details:
- Mutable Slices: In Rust, sorting is done in-place by passing a mutable slice
&mut [T]
to the sorting function. - Zero-Based Indexing: The child indices are calculated as
2 * root + 1
and2 * root + 2
, following the zero-based indexing used by Rust arrays. - Rev() for Reversed Iteration: The
rev()
method is used to iterate from the end of the array, which is crucial for building the heap from the bottom up.
Step-by-Step Explanation of Example with Key Points
Let’s sort the array [12, 11, 13, 5, 6, 7]
using Heap Sort. The process consists of two main phases: building the max heap and sorting the heap.
1. Building the Max Heap
Heap Sort begins by constructing a max heap from the input array. A max heap is a binary tree where the parent node is always greater than or equal to its child nodes. The key is to start heapifying from the non-leaf nodes (found in the first half of the array) and move upward toward the root.
- Step 1.1: Start heapifying from index
n/2 - 1
, which is the last non-leaf node.- For the array
[12, 11, 13, 5, 6, 7]
, the sizen
is 6, so the last non-leaf node is at the index2
(i.e.,n/2 - 1 = 6/2 - 1 = 2
).
- For the array
- Step 1.2: Apply the
heapify
function on the node at index 2.- The value at index 2 is
13
already larger than both its children, so no changes are needed.
- The value at index 2 is
- Step 1.3: Move to the node at index 1 (which contains
11
).- The children of index 1 are
6
(index 4) and7
(index 5). Since the value at index 1 (11
) is greater than both of its children, no changes are required here either.
- The children of index 1 are
- Step 1.4: Finally, heapify the root node (index 0, value
12
).- The children of index 0 are
11
(index 1) and13
(index 2). The value13
is larger than12
, so we swap12
and13
. - The array now looks like
[13, 11, 12, 5, 6, 7]
.
- The children of index 0 are
At this point, we have successfully built a max heap, where the largest element (13
) is at the root of the array.
2. Heap Sort (Extracting Elements)
After building the max heap, the sorting process begins by repeatedly extracting the root (the largest element) and placing it at the end of the array. The size of the heap is then reduced, and the heap is restructured to maintain the max heap property.
- Step 2.1: Swap the root (
13
) with the last element (7
).- After the swap, the array becomes
[7, 11, 12, 5, 6, 13]
. - Now, apply
heapify
to the reduced heap (sizen - 1 = 5
).
- After the swap, the array becomes
- Step 2.2: Heapify the root element (
7
).- The children of
7
are11
(index 1) and12
(index 2). Since12
is the largest, we swap7
and12
. - The array now becomes
[12, 11, 7, 5, 6, 13]
.
- The children of
- Step 2.3: Move to the next iteration and swap the new root (
12
) with the second-to-last element (6
).- After the swap, the array looks like
[6, 11, 7, 5, 12, 13]
. - Heapify the root again with a reduced heap size of
4
.
- After the swap, the array looks like
- Step 2.4: Heapify the root (
6
).- The children of
6
are11
(index 1) and7
(index 2). We swap6
with the larger child,11
. - The array becomes
[11, 6, 7, 5, 12, 13]
.
- The children of
- Step 2.5: Repeat the process:
- Swap the root (
11
) with5
, resulting in[5, 6, 7, 11, 12, 13]
. - Heapify the root (
5
), which swaps5
with7
, producing[7, 6, 5, 11, 12, 13]
.
- Swap the root (
- Step 2.6: At this point, only three elements remain to be sorted.
- Swap the root (
7
) with5
, resulting in[5, 6, 7, 11, 12, 13]
. - Heapify the root (
5
), swapping5
and6
, giving us[6, 5, 7, 11, 12, 13]
.
- Swap the root (
- Step 2.7: Swap the last two elements, resulting in the final sorted array:
[5, 6, 7, 11, 12, 13]
.
Key Points:
- Heap Construction: We built the max heap by heapifying nodes, starting from the last non-leaf node and moving up to the root.
- Sorting: We repeatedly swapped the root with the last unsorted element and restructured the remaining heap using
heapify
. - In-Place Sorting: The sorting happens in-place, meaning no additional arrays were needed, making Heap Sort memory efficient.
By the end, we have a fully sorted array, [5, 6, 7, 11, 12, 13]
.
Performance Test: Heap Sort vs Merge Sort vs Quick Sort
To compare Heap Sort’s performance with other standard sorting algorithms, we’ll benchmark it against Merge Sort and Quick Sort (original and Median of Three optimized) using arrays of different sizes (random, sorted, and reverse-sorted). The median-of-three optimization is applied to Quick Sort to improve its performance on sorted and reverse-sorted arrays by selecting a better pivot.
We’ll use Rust’s std::time::Instant
for benchmarking and for generating arrays, we can utilize the rand
crate to create random arrays and manual loops to generate sorted and reverse-sorted arrays.
When performing your tests, include rand
in the under the [dependencies]
section of your Cargo.toml
file, for example:
[package] name = "insertion_sort" version = "0.1.0" edition = "2021" [dependencies] rand = "0.9.0-alpha.2"
Here’s the Rust code to run these tests:
use rand::seq::SliceRandom; use std::time::Instant; fn merge_sort<T: Ord + Copy>(arr: &mut [T]) { let len = arr.len(); if len <= 1 { return; } let mid = len / 2; let mut left = arr[..mid].to_vec(); let mut right = arr[mid..].to_vec(); merge_sort(&mut left); merge_sort(&mut right); merge(&mut left, &mut right, arr); } fn merge<T: Ord + Copy>(left: &mut [T], right: &mut [T], arr: &mut [T]) { let (mut i, mut j, mut k) = (0, 0, 0); while i < left.len() && j < right.len() { if left[i] <= right[j] { arr[k] = left[i]; i += 1; } else { arr[k] = right[j]; j += 1; } k += 1; } while i < left.len() { arr[k] = left[i]; i += 1; k += 1; } while j < right.len() { arr[k] = right[j]; j += 1; k += 1; } } fn quick_sort<T: Ord>(arr: &mut [T]) { if arr.len() <= 1 { return; } let pivot = partition(arr); quick_sort(&mut arr[..pivot]); quick_sort(&mut arr[pivot + 1..]); } fn partition<T: Ord>(arr: &mut [T]) -> usize { let pivot = arr.len() - 1; let mut i = 0; for j in 0..pivot { if arr[j] <= arr[pivot] { arr.swap(i, j); i += 1; } } arr.swap(i, pivot); i } fn heap_sort<T: Ord>(arr: &mut [T]) { let n = arr.len(); // Build max heap for i in (0..n / 2).rev() { heapify(arr, n, i); } // Extract elements from heap one by one for i in (1..n).rev() { arr.swap(0, i); heapify(arr, i, 0); } } fn heapify<T: Ord>(arr: &mut [T], heap_size: usize, root: usize) { let mut largest = root; let left = 2 * root + 1; let right = 2 * root + 2; if left < heap_size && arr[left] > arr[largest] { largest = left; } if right < heap_size && arr[right] > arr[largest] { largest = right; } if largest != root { arr.swap(root, largest); heapify(arr, heap_size, largest); } } fn benchmark(sort_fn: fn(&mut [i32]), arr: &mut [i32], name: &str) { let start = Instant::now(); sort_fn(arr); let duration = start.elapsed(); println!("{}: {:?}", name, duration); } fn main() { let mut rng = rand::thread_rng(); let sizes = [1000, 5000, 10000]; for &size in &sizes { println!("\nArray size: {}", size); // Random array let mut arr: Vec<i32> = (0..size).collect(); arr.shuffle(&mut rng); benchmark(heap_sort, &mut arr.clone(), "Heap Sort (random)"); benchmark(merge_sort, &mut arr.clone(), "Merge Sort (random)"); benchmark(quick_sort, &mut arr.clone(), "Quick Sort (random)"); // Sorted array arr.sort(); benchmark(heap_sort, &mut arr.clone(), "Heap Sort (sorted)"); benchmark(merge_sort, &mut arr.clone(), "Merge Sort (sorted)"); benchmark(quick_sort, &mut arr.clone(), "Quick Sort (sorted)"); // Reverse sorted array arr.reverse(); benchmark(heap_sort, &mut arr.clone(), "Heap Sort (reverse)"); benchmark(merge_sort, &mut arr.clone(), "Merge Sort (reverse)"); benchmark(quick_sort, &mut arr.clone(), "Quick Sort (reverse)"); } }
Results
Here are the results from the performance test:
Array Size | Heap Sort | Merge Sort | Quick Sort | Median-of-Three Quick Sort |
---|---|---|---|---|
1000 (Random) | 1.113 | 0.885 | 0.805 | 0.856 |
1000 (Sorted) | 1.133 | 0.807 | 37.360 | 0.520 |
1000 (Reverse) | 1.030 | 0.776 | 28.157 | 1.162 |
5000 (Random) | 5.233 | 3.995 | 3.368 | 3.779 |
5000 (Sorted) | 6.556 | 4.308 | 816.099 | 2.228 |
5000 (Reverse) | 3.156 | 2.431 | 369.294 | 3.064 |
10000 (Random) | 6.630 | 4.945 | 4.946 | 4.799 |
10000 (Sorted) | 7.636 | 4.832 | 2334.642 | 3.928 |
10000 (Reverse) | 5.865 | 4.108 | 1411.711 | 6.799 |
Analysis of Results
- Median-of-Three Quick Sort:
- Sorted Data: The median-of-three optimization significantly improves Quick Sort’s performance on sorted arrays. For example, it reduces the time for a 10,000-element sorted array from 2.33 seconds (Quick Sort) to 3.93 ms, making it faster than Merge Sort.
- Reverse-Sorted Data: While the optimization helps improve performance on reverse-sorted data, Median-of-Three Quick Sort is not necessarily the best choice here. It takes 6.8 ms for a 10,000-element reverse-sorted array, faster than unoptimized Quick Sort (1.41 seconds) but still slower than Merge Sort and slightly better than Heap Sort.
- Random Data: Median-of-Three Quick Sort performs comparably to regular Quick Sort on random data, with only slight differences in performance.
- Heap Sort:
- Consistency: Heap Sort remains consistent across random, sorted, and reverse-sorted arrays, providing predictable O(n log n) performance in all cases. Although it isn’t the fastest, it avoids the significant slowdowns seen with unoptimized Quick Sort on sorted and reverse-sorted data.
- Larger Arrays: For 10,000-element arrays, Heap Sort performs reliably but lags behind Merge Sort and Median-of-Three Quick Sort for both random and reverse-sorted arrays. However, it’s still much faster than unoptimized Quick Sort in these cases.
- Merge Sort:
- Best Overall Performance: Merge Sort continues to be the most reliable option, performing exceptionally well across all input types. It consistently beats Heap Sort and regular Quick Sort in most scenarios, especially for larger datasets.
- Reverse-Sorted Data: Merge Sort excels particularly well with reverse-sorted arrays, with a time of 4.11 ms for a 10,000-element array, the best among the tested algorithms.
- Quick Sort:
- Unoptimized Performance: Regular Quick Sort performs well on random arrays but suffers significantly on sorted and reverse-sorted data due to its O(n²) worst-case complexity. For instance, it takes 2.33 seconds on a sorted 10,000-element array and 1.41 seconds on a reverse-sorted array, which are dramatically slower than the other algorithms.
Conclusion
- Median-of-Three Quick Sort is a strong choice for sorted arrays but does not consistently outperform other algorithms for reverse-sorted data.
- Merge Sort remains the most well-rounded and stable sorting algorithm, performing exceptionally well across all input types.
- Heap Sort is a reliable in-place algorithm that avoids the dramatic slowdowns seen in unoptimized Quick Sort and offers solid performance across all input types.
- Unoptimized Quick Sort should be avoided for sorted or reverse-sorted data due to its significant performance degradation.
Pros and Cons of Heap Sort
Pros:
- Consistent O(n log n) Time Complexity: Heap Sort guarantees O(n log n) time complexity across all cases, regardless of the data’s initial order.
- In-Place Sorting: Heap Sort doesn’t require additional space, making it more memory-efficient than Merge Sort.
- Reliable Performance on Any Input: Whether the data is random, sorted, or reverse-sorted, Heap Sort performs consistently well without degradation.
Cons:
- Not Stable: Heap Sort does not guarantee that equal elements will maintain their relative order, unlike Merge Sort, which is stable.
- Slower in Practice: While Heap Sort has O(n log n) complexity, it tends to make more comparisons than Quick Sort or Merge Sort, making it slower in some practical scenarios.
- Not Cache-Friendly: The binary heap structure used by Heap Sort tends to access memory in a scattered pattern, leading to lower cache performance than algorithms like Quick Sort or Merge Sort that access memory sequentially.
When to Use Heap Sort
Heap Sort is particularly useful in the following scenarios:
- When you need consistent performance: Heap Sort’s worst-case complexity is O(n log n), making it a safe choice when performance predictability is crucial.
- Memory constraints: Heap Sort works in place, making it ideal when minimizing memory usage is essential. This contrasts with Merge Sort, which requires O(n) extra space.
- Data that is difficult to predict: If you’re sorting data that could be sorted, reverse-sorted, or random, Heap Sort is reliable across all types.
Conclusion
Heap Sort is a robust and reliable sorting algorithm that consistently offers O(n log n) performance, regardless of the input data’s initial ordering. Although it’s not as fast in practice as Quick Sort on random data and lacks the stability of Merge Sort, Heap Sort’s predictability and memory efficiency make it an excellent choice for specific use cases where stability is not a concern and additional memory overhead is undesirable.
By understanding how Heap Sort works and comparing its performance with other algorithms, you can better determine when to use it in your projects. If you need stable sorting or faster performance on random data, Merge Sort or Quick Sort may be better options, but if you value consistent performance and minimal space usage, Heap Sort is a solid choice.
Congratulations on reading to the end of the tutorial!
Read the following articles to learn how to implement Heap Sort:
In Python – How to do Heap Sort in Python
In C++ – How to Do Heap Sort in C++
In JavaScript – How to do Heap Sort in JavaScript
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.