Heap sort is an efficient, comparison-based sorting algorithm that relies on a binary heap data structure to sort elements in place. It’s known for its time complexity of O(n log n), making it a reliable choice for large datasets. This blog post will explain how heap sort works, provide a Python implementation, and compare it to other sorting algorithms like quick sort and merge sort.
What is Heap Sort?
Heap sort works by first building a max heap from the input data and then repeatedly extracting the maximum element (the root of the heap) to place it at the end of the array. This process ensures that the array is sorted in increasing order. The heap is continuously “reheapified” after each extraction, maintaining its structure.
Heap sort uses a max heap to sort the array in ascending order.
How Heap Sort Works:
- Build a max heap from the input data.
- Swap the root of the max heap (the largest value) with the last element of the heap.
- Reduce the heap size by one and heapify the root.
- Repeat the process until the heap is empty and the array is sorted.
Time Complexity:
- Best Case: O(n log n)
- Worst Case: O(n log n)
- Average Case: O(n log n)
Space Complexity:
- O(1) (heap sort is an in-place sorting algorithm, meaning it doesn’t require additional space).
Heap Sort Algorithm in Python
Here’s how you can implement heap sort in Python:
# Helper function to maintain the max heap property def heapify(arr, n, i): largest = i # Initialize the largest as root left = 2 * i + 1 # Left child index right = 2 * i + 2 # Right child index # If left child exists and is greater than root if left < n and arr[left] > arr[largest]: largest = left # If right child exists and is greater than the largest so far if right < n and arr[right] > arr[largest]: largest = right # If the largest element is not the root, swap and heapify again if largest != i: arr[i], arr[largest] = arr[largest], arr[i] heapify(arr, n, largest) # Main function to perform heap sort def heap_sort(arr): n = len(arr) # Step 1: Build a max heap for i in range(n // 2 - 1, -1, -1): heapify(arr, n, i) # Step 2: Extract elements one by one from the heap for i in range(n - 1, 0, -1): # Move the current root to the end arr[i], arr[0] = arr[0], arr[i] # Call heapify on the reduced heap heapify(arr, i, 0) # Example usage: data = [12, 11, 13, 5, 6, 7] heap_sort(data) print("Sorted array:", data)
Output:
Sorted array: [5, 6, 7, 11, 12, 13]
Step-by-Step Explanation of Heap Sort
Let’s break down the heap sort process step by step for the array [12, 11, 13, 5, 6, 7]
.
Step 1: Build a Max Heap
- The first step in heap sort is to build a max heap from the input array. A max heap ensures that the largest element is always at the root of the heap.
Max heap after heapify:
[13, 11, 12, 5, 6, 7]
Explanation: In this heap, the root element (13) is the largest, and each parent node is larger than its children.
Step 2: Extract Elements from the Heap
- Once the max heap is built, we repeatedly extract the maximum element (the root) by swapping it with the last element in the array. We then reheapify the heap to maintain the heap property.
For example:
- Swap the root (13) with the last element (7).
- Heapify the remaining elements.
Array after first extraction:
[12, 11, 7, 5, 6, 13] (13 is in its correct position)
Step 3: Repeat Until Sorted
- Repeat the extraction and heapification process until the array is fully sorted.
Final sorted array:
[5, 6, 7, 11, 12, 13]
Comparison to Other Sorting Algorithms
1. Heap Sort vs Quicksort
- Time Complexity: Both quicksort and heap sort have an average time complexity of O(n log n). However, quicksort has a worst-case time complexity of O(n²) if the pivot selection is poor (though this can be mitigated with good pivot selection strategies like randomization or median-of-three).
- Stability: Quicksort is not stable (unless modified), and heap sort is also not stable. Stability means that equal elements retain their relative order.
- Space Complexity: Heap sort has a space complexity of O(1), making it an in-place sort. Quicksort has a space complexity of O(log n) due to its recursive calls.
2. Heap Sort vs Merge Sort
- Time Complexity: Both merge sort and heap sort have a worst-case time complexity of O(n log n). However, merge sort requires additional space (O(n)) for its auxiliary array, while heap sort is an in-place algorithm with O(1) space complexity.
- Stability: Merge sort is stable, meaning that equal elements retain their original relative order. Heap sort is not stable.
- Application: Merge sort is typically preferred for sorting linked lists or datasets that require stability, while heap sort is ideal when memory usage is a concern.
3. Heap Sort vs Insertion Sort
- Time Complexity: Insertion sort has a time complexity of O(n²) in the average and worst case, while heap sort maintains O(n log n). Insertion sort, however, performs better on small datasets or nearly sorted arrays with O(n) best-case complexity.
- Space Complexity: Both heap sort and insertion sort are in-place algorithms with O(1) space complexity.
Advantages of Heap Sort
- In-Place Sorting: Heap sort requires only O(1) extra space, making it memory-efficient compared to algorithms like merge sort.
- Guaranteed O(n log n) Performance: Heap sort guarantees O(n log n) time complexity in both the average and worst cases, unlike quicksort, which can degrade to O(n²) in the worst case.
- Efficient for Large Datasets: Because heap sort uses a binary heap, it can efficiently handle large datasets with relatively low memory overhead.
Limitations of Heap Sort
- Not Stable: Heap sort is not a stable sorting algorithm. This means that if two elements are equal, their relative order in the original array might not be preserved in the sorted array.
- Slower in Practice Compared to Quicksort: Although heap sort has a guaranteed time complexity of O(n log n), quicksort typically outperforms it in practice due to lower constant factors in quicksort’s recursive calls and cache efficiency.
- Not Adaptive: Unlike insertion sort, heap sort does not perform better on nearly sorted data. Its time complexity remains O(n log n) regardless of the initial ordering of the elements.
Conclusion
Heap sort is a robust sorting algorithm with consistent time complexity across all cases, making it a good choice for large datasets. While it may not be as fast in practice as quick sort or space-efficient as merge sort for specific tasks, heap sort’s in-place sorting, and O(n log n) performance make it a valuable tool when memory is limited.
Congratulations on reading to the end of this tutorial!
For further reading on sorting algorithms in Python, go to the articles:
- How to do Insertion Sort in Python
- How to Do Bubble Sort in Python
- How to Do Selection Sort in Python
- How to Do Bucket Sort in Python
- How to Do Pigeonhole Sort in Python
- How To Do Comb Sort in Python
- How to Do Shell Sort in Python
- How To Do TimSort in Python
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.