TimSort is a hybrid sorting algorithm that combines the strengths of Merge Sort and Insertion Sort. It is optimized for real-world data, which often contains ordered sequences. TimSort is the default sorting algorithm in Java and Python because of its efficiency in handling partially sorted datasets, making it highly adaptive and fast in practice.
In this blog post, we’ll explore how TimSort works, break down its implementation in Java, and provide performance benchmarks comparing it to traditional sorting algorithms like Merge Sort and Quick Sort.
What Is TimSort
Tim Peters developed TimSort in 2002 for Python. TimSort sorts small sections of the array (known as runs) using Insertion Sort and then merges these runs using Merge Sort. This approach helps achieve an optimal time complexity of O(n log n), even in the worst-case scenario, while maintaining stable sorting (i.e., preserving the order of equal elements).
TimSort is designed to minimize the number of comparisons and swaps, making it faster than typical comparison-based algorithms like Quick Sort or Merge Sort in many real-world applications.
How TimSort Works
TimSort’s main idea is to exploit the natural order of real-world data. It does this by:
- Identifying Runs: Finding sections of the array that are already sorted (ascending or descending).
- Sorting Small Runs: Using Insertion Sort for small sections (or runs) to minimize overhead.
- Merging Runs: Efficiently merging these runs using Merge Sort ensures that the sorting is stable and efficient.
Key Concepts:
- Runs: A run is a contiguous sequence of already sorted elements. TimSort scans the array to detect these runs.
- MinRun: The MinRun is the minimum size of runs that TimSort creates. TimSort uses Insertion Sort for small arrays (smaller than MinRun).
Pseudocode for TimSort
procedure TimSort(arr) n = length of arr minRun = calculateMinRun(n) # Step 1: Sort small sub-arrays using Insertion Sort for i = 0 to n - 1 in steps of minRun do InsertionSort(arr, i, min(i + minRun - 1, n - 1)) # Step 2: Merge sorted runs using Merge Sort size = minRun while size < n do for left = 0 to n - 1 in steps of 2 * size do mid = min(left + size - 1, n - 1) right = min(left + 2 * size - 1, n - 1) if mid < right: Merge(arr, left, mid, right) size = 2 * size
Step-by-Step Explanation:
- Step 1: The array is divided into smaller runs, and each run is sorted using Insertion Sort. The minimum size of each run is determined dynamically.
- Step 2: The sorted runs are merged using Merge Sort, gradually increasing the size of the sorted portions until the entire array is sorted.
Time Complexity of TimSort
- Best Case: O(n)—TimSort performs best with linear time complexity when the input array is already sorted.
- Average Case: O(n log n) – TimSort performs at O(n log n) for most inputs, similar to other efficient sorting algorithms.
- Worst Case: O(n log n) – Even in the worst-case scenario, TimSort guarantees O(n log n) time complexity due to its merging process.
Space Complexity of TimSort
- Space Complexity: O(n) – TimSort requires additional space for merging, as it uses a temporary array during the merge step.
Java Implementation of TimSort
Here’s the Java code to implement TimSort:
import java.util.Arrays; public class TimSort { // Insertion Sort for sorting small subarrays public static void insertionSort(int[] arr, int left, int right) { // Traverse each element starting from left+1 to the right for (int i = left + 1; i <= right; i++) { int temp = arr[i]; int j = i - 1; // Shift elements of the sorted part that are greater than the current element while (j >= left && arr[j] > temp) { arr[j + 1] = arr[j]; j--; } // Place the current element at the correct position arr[j + 1] = temp; } } // Merge two sorted subarrays into a single sorted array public static void merge(int[] arr, int left, int mid, int right) { // Sizes of the two subarrays int len1 = mid - left + 1; int len2 = right - mid; // Temporary arrays for the two subarrays int[] leftArr = new int[len1]; int[] rightArr = new int[len2]; // Copy data to temporary arrays for (int i = 0; i < len1; i++) leftArr[i] = arr[left + i]; for (int i = 0; i < len2; i++) rightArr[i] = arr[mid + 1 + i]; // Initial indices for left, right, and merged arrays int i = 0, j = 0, k = left; // Merge the two subarrays into the original array while (i < len1 && j < len2) { if (leftArr[i] <= rightArr[j]) { arr[k] = leftArr[i]; i++; } else { arr[k] = rightArr[j]; j++; } k++; } // Copy any remaining elements from the left subarray (if any) while (i < len1) { arr[k] = leftArr[i]; i++; k++; } // Copy any remaining elements from the right subarray (if any) while (j < len2) { arr[k] = rightArr[j]; j++; k++; } } // Calculate the minimum run size for the array public static int minRunLength(int n) { int r = 0; // Keep reducing the size and accumulate remainder bits until the size is < 64 while (n >= 64) { r |= (n & 1); n >>= 1; } return n + r; } // TimSort implementation public static void timSort(int[] arr, int n) { int minRun = minRunLength(n); // Step 1: Sort small subarrays using insertion sort for (int i = 0; i < n; i += minRun) { insertionSort(arr, i, Math.min((i + minRun - 1), (n - 1))); } // Step 2: Merge sorted subarrays of increasing sizes in a bottom-up manner for (int size = minRun; size < n; size = 2 * size) { for (int left = 0; left < n; left += 2 * size) { int mid = left + size - 1; int right = Math.min((left + 2 * size - 1), (n - 1)); // Merge the two halves if mid < right if (mid < right) { merge(arr, left, mid, right); } } } } // Main method to test the TimSort implementation public static void main(String[] args) { // Example array to be sorted int[] arr = {5, 21, 7, 23, 19, 1, 6, 3, 12, 17, 10}; // Print the original array System.out.println(STR."Original Array: \{Arrays.toString(arr)}"); // Perform TimSort on the array timSort(arr, arr.length); // Print the sorted array System.out.println(STR."Sorted Array: \{Arrays.toString(arr)}"); } }
Output:
Original Array: [5, 21, 7, 23, 19, 1, 6, 3, 12, 17, 10] Sorted Array: [1, 3, 5, 6, 7, 10, 12, 17, 19, 21, 23]
Java Specific Details
1. Memory Management:
Java handles memory with automatic garbage collection. Temporary arrays (like those in merge()
) are cleaned up automatically but can lead to overhead. Excessive memory usage may trigger garbage collection pauses, which impacts performance.
2. Bounds Checking:
Java performs automatic array bounds checking, throwing an ArrayIndexOutOfBoundsException
if an invalid index is accessed. This helps with debugging but adds a slight performance cost.
3. Primitive Types vs. Objects:
Using int[]
(primitives) is more efficient than using object wrappers like Integer[]
. With generic types (T[]
), Java uses objects, which can cause overhead due to boxing/unboxing.
4. System.arraycopy():
The System.arraycopy()
method is faster than manual copying in loops. It optimizes the performance when merging arrays:
System.arraycopy(arr, left, leftArr, 0, len1);
5. Multithreading:
Java 8 introduced Arrays.parallelSort()
, a multithreaded sorting method. You can parallelize TimSort’s merging phase, but this example is single-threaded.
6. Java’s Built-in TimSort:
Java’s Arrays.sort()
uses TimSort for object sorting (e.g., String[]
, Integer[]
) since Java 7. For primitive arrays, Java uses Dual-Pivot QuickSort, which is optimized for low memory overhead.
7. Stack Size and Recursion:
TimSort avoids deep recursion compared to QuickSort, reducing the risk of StackOverflowError
on large arrays in Java due to limited stack size.
Step-by-Step Explanation of TimSort with Example
Let’s walk through how TimSort processes the array [5, 21, 7, 23, 19, 1, 6, 3, 12, 17, 10]
. For this example, assume the minimum run size (min_run
) is 4. The minimum run size is usually calculated based on the array length, not fixed at 4. It’s typically chosen to be between 32 and 64.
1. Identifying Runs
TimSort starts by identifying “runs,” which are consecutive sections of the array that are already sorted. If a run is shorter than the min_run
size, it will be sorted using Insertion Sort.
Initial array: [5, 21, 7, 23, 19, 1, 6, 3, 12, 17, 10]
The array is divided into runs:
- Run 1 (index 0 to 3):
[5, 21, 7, 23]
(not sorted, needs sorting) - Run 2 (index 4 to 7):
[19, 1, 6, 3]
(a descending run, reverse it to[3, 6, 1, 19]
before sorting) - Run 3 (index 8 to 10):
[12, 17, 10]
(not sorted, needs sorting)
2. Sorting Runs with Insertion Sort
- Run 1 (Sorting
[5, 21, 7, 23]
):- Compare 21 > 5 → No change.
- Compare 7 < 21 → Shift 21 to the right →
[5, 21, 21, 23]
, insert 7 →[5, 7, 21, 23]
. - Compare 23 > 21 → No change.
[5, 7, 21, 23]
- Run 2 (Sorting
[3, 6, 1, 19]
):- Compare 6 > 3 → No change.
- Compare 1 < 6 → Shift 6 right →
[3, 6, 6, 19]
, shift 3 right →[3, 3, 6, 19]
, insert 1 →[1, 3, 6, 19]
. - Compare 19 > 6 → No change.
[1, 3, 6, 19]
- Run 3 (Sorting
[12, 17, 10]
):- Compare 17 > 12 → No change.
- Compare 10 < 17 → Shift 17 right →
[12, 17, 17]
, insert 10 →[10, 12, 17]
.
[10, 12, 17]
3. Merging Runs
Once the runs are sorted, TimSort merges them using a method similar to Merge Sort.
- Merge Run 1 and Run 2:
- Merging
[5, 7, 21, 23]
and[1, 3, 6, 19]
:- Compare 1 < 5 → Place 1.
- Compare 3 < 5 → Place 3.
- Compare 5 < 6 → Place 5.
- Compare 6 < 7 → Place 6.
- Compare 7 < 19 → Place 7.
- Compare 19 < 21 → Place 19.
- Place remaining
[21, 23]
.
[1, 3, 5, 6, 7, 19, 21, 23]
- Merging
- Merge with Run 3:
- Merging
[1, 3, 5, 6, 7, 19, 21, 23]
and[10, 12, 17]
:- Compare 1 < 10 → Place 1.
- Compare 3 < 10 → Place 3.
- Compare 5 < 10 → Place 5.
- Compare 6 < 10 → Place 6.
- Compare 7 < 10 → Place 7.
- Compare 10 < 19 → Place 10.
- Compare 12 < 19 → Place 12.
- Compare 17 < 19 → Place 17.
- Place remaining
[19, 21, 23]
.
[1, 3, 5, 6, 7, 10, 12, 17, 19, 21, 23]
- Merging
This process shows how TimSort first sorts smaller sections of the array using Insertion Sort and then merges them like Merge Sort to produce a fully sorted array.
In actual implementations of TimSort, additional optimizations like galloping mode (for faster merging when one run is significantly smaller than the other) and balancing runs are used to improve performance further. However, the steps described here provide a solid foundation for understanding TimSort’s key operations.
Performance Test: Timsort vs Quick Sort vs Merge Sort
Now that we’ve implemented TimSort, let’s evaluate its performance by comparing it to two other popular sorting algorithms: Quick Sort and Merge Sort. We will include both the original Quick Sort algorithm and the optimized median-of-three Quick Sort algorithm. We use the median-of-three variant because it helps mitigate Quick Sort’s worst-case performance on already sorted or reverse-sorted arrays by choosing a more reliable pivot. We will test the algorithms on random, sorted, nearly-sorted, and reverse-sorted arrays of varying sizes (1000, 5000, and 10,000 elements).
import java.util.Arrays; import java.util.Random; public class SortingBenchmark { // Insertion Sort for small runs (used in TimSort) public static void insertionSort(int[] arr, int left, int right) { for (int i = left + 1; i <= right; i++) { int temp = arr[i]; int j = i - 1; while (j >= left && arr[j] > temp) { arr[j + 1] = arr[j]; j--; } arr[j + 1] = temp; } } // Merge function for Merge Sort and TimSort public static void merge(int[] arr, int left, int mid, int right) { int len1 = mid - left + 1; int len2 = right - mid; int[] leftArr = new int[len1]; int[] rightArr = new int[len2]; for (int i = 0; i < len1; i++) leftArr[i] = arr[left + i]; for (int i = 0; i < len2; i++) rightArr[i] = arr[mid + 1 + i]; int i = 0, j = 0, k = left; while (i < len1 && j < len2) { if (leftArr[i] <= rightArr[j]) { arr[k] = leftArr[i]; i++; } else { arr[k] = rightArr[j]; j++; } k++; } while (i < len1) { arr[k] = leftArr[i]; i++; k++; } while (j < len2) { arr[k] = rightArr[j]; j++; k++; } } // Merge Sort public static void mergeSort(int[] arr, int left, int right) { if (left < right) { int mid = (left + right) / 2; mergeSort(arr, left, mid); mergeSort(arr, mid + 1, right); merge(arr, left, mid, right); } } // Quick Sort public static void quickSort(int[] arr, int low, int high) { if (low < high) { int pivot = partition(arr, low, high); quickSort(arr, low, pivot - 1); quickSort(arr, pivot + 1, high); } } public static int partition(int[] arr, int low, int high) { int pivot = arr[high]; int i = low - 1; for (int j = low; j < high; j++) { if (arr[j] < pivot) { i++; swap(arr, i, j); } } swap(arr, i + 1, high); return i + 1; } // Median-of-Three Quick Sort public static void quickSortMedianOfThree(int[] arr, int low, int high) { if (low < high) { int pivot = medianOfThreePartition(arr, low, high); quickSortMedianOfThree(arr, low, pivot - 1); quickSortMedianOfThree(arr, pivot + 1, high); } } public static int medianOfThreePartition(int[] arr, int low, int high) { int mid = (low + high) / 2; medianOfThree(arr, low, mid, high); return partition(arr, low, high); } public static void medianOfThree(int[] arr, int low, int mid, int high) { if (arr[low] > arr[mid]) swap(arr, low, mid); if (arr[low] > arr[high]) swap(arr, low, high); if (arr[mid] > arr[high]) swap(arr, mid, high); swap(arr, mid, high - 1); // Use median as pivot } // Swap elements public static void swap(int[] arr, int i, int j) { int temp = arr[i]; arr[i] = arr[j]; arr[j] = temp; } // Calculate minimum run size for TimSort public static int minRunLength(int n) { int r = 0; while (n >= 64) { r |= (n & 1); n >>= 1; } return n + r; } // TimSort implementation public static void timSort(int[] arr, int n) { int minRun = minRunLength(n); // Sort individual subarrays of size minRun using Insertion Sort for (int i = 0; i < n; i += minRun) { insertionSort(arr, i, Math.min((i + minRun - 1), (n - 1))); } // Merge sorted subarrays for (int size = minRun; size < n; size *= 2) { for (int left = 0; left < n; left += 2 * size) { int mid = Math.min((left + size - 1), (n - 1)); int right = Math.min((left + 2 * size - 1), (n - 1)); if (mid < right) { merge(arr, left, mid, right); } } } } // Benchmarking method public static void benchmark(String name, Runnable sort) { long start = System.nanoTime(); sort.run(); long duration = System.nanoTime() - start; System.out.printf("%s: %.3f ms%n", name, duration / 1e6); } public static void main(String[] args) { int[] sizes = {1000, 5000, 10000}; Random rand = new Random(); for (int size : sizes) { System.out.println("Array size: " + size); // Random array int[] randomArray = rand.ints(size, 0, size).toArray(); // Run benchmarks benchmark("TimSort (random)", () -> { int[] arr = Arrays.copyOf(randomArray, randomArray.length); timSort(arr, arr.length); }); benchmark("Merge Sort (random)", () -> { int[] arr = Arrays.copyOf(randomArray, randomArray.length); mergeSort(arr, 0, arr.length - 1); }); benchmark("Quick Sort (random)", () -> { int[] arr = Arrays.copyOf(randomArray, randomArray.length); quickSort(arr, 0, arr.length - 1); }); benchmark("Median-of-Three Quick Sort (random)", () -> { int[] arr = Arrays.copyOf(randomArray, randomArray.length); quickSortMedianOfThree(arr, 0, arr.length - 1); }); // Sorted array int[] sortedArray = Arrays.copyOf(randomArray, randomArray.length); Arrays.sort(sortedArray); benchmark("TimSort (sorted)", () -> timSort(Arrays.copyOf(sortedArray, sortedArray.length), sortedArray.length)); benchmark("Merge Sort (sorted)", () -> mergeSort(Arrays.copyOf(sortedArray, sortedArray.length), 0, sortedArray.length - 1)); benchmark("Quick Sort (sorted)", () -> quickSort(Arrays.copyOf(sortedArray, sortedArray.length), 0, sortedArray.length - 1)); benchmark("Median-of-Three Quick Sort (sorted)", () -> quickSortMedianOfThree(Arrays.copyOf(sortedArray, sortedArray.length), 0, sortedArray.length - 1)); // Reverse sorted array int[] reverseArray = Arrays.copyOf(sortedArray, sortedArray.length); for (int i = 0; i < reverseArray.length / 2; i++) { swap(reverseArray, i, reverseArray.length - i - 1); } benchmark("TimSort (reverse)", () -> timSort(Arrays.copyOf(reverseArray, reverseArray.length), reverseArray.length)); benchmark("Merge Sort (reverse)", () -> mergeSort(Arrays.copyOf(reverseArray, reverseArray.length), 0, reverseArray.length - 1)); benchmark("Quick Sort (reverse)", () -> quickSort(Arrays.copyOf(reverseArray, reverseArray.length), 0, reverseArray.length - 1)); benchmark("Median-of-Three Quick Sort (reverse)", () -> quickSortMedianOfThree(Arrays.copyOf(reverseArray, reverseArray.length), 0, reverseArray.length - 1)); System.out.println(); } } }
Results
Array Size & Configuration | TimSort (ms) | Merge Sort (ms) | Quick Sort (ms) | Quick Sort (Median-of-Three) (ms) |
---|---|---|---|---|
1000 (Random) | 0.517 | 0.648 | 0.309 | 0.201 |
1000 (Sorted) | 0.168 | 0.159 | 3.684 | 0.120 |
1000 (Reverse) | 0.639 | 0.163 | 1.785 | 0.057 |
1000 (Nearly Sorted) | 0.270 | 0.157 | 0.055 | 0.046 |
5000 (Random) | 1.236 | 1.207 | 0.361 | 0.330 |
5000 (Sorted) | 0.338 | 0.825 | 10.019 | 0.145 |
5000 (Reverse) | 0.589 | 0.451 | 9.288 | 0.236 |
5000 (Nearly Sorted) | 0.361 | 0.481 | 0.259 | 0.358 |
10000 (Random) | 1.382 | 1.456 | 0.726 | 0.841 |
10000 (Sorted) | 0.358 | 0.731 | 60.350 | 0.202 |
10000 (Reverse) | 2.512 | 0.921 | 41.839 | 0.254 |
10000 (Nearly Sorted) | 0.463 | 1.184 | 0.597 | 0.379 |
Note: Times are rounded to three decimal places for readability. Actual performance may vary slightly due to system-specific factors.
Key Takeaways:
- Median-of-Three Quick Sort is consistently the fastest algorithm across all scenarios, especially for sorted, nearly sorted, and reverse-sorted data. This highlights the advantage of the median-of-three optimization, which helps mitigate worst-case performance in Quick Sort by choosing a better pivot.
- Quick Sort (Original) suffers from significant performance drops in sorted and reverse-sorted arrays, particularly in larger sizes (e.g., 10000 elements), where it took over 60 ms and 41 ms, respectively. This demonstrates Quick Sort’s vulnerability to poor pivot selection in these cases.
- TimSort performs consistently well across all scenarios, showcasing its efficiency on random, sorted, reverse-sorted, and nearly sorted data. TimSort’s specialized handling of sorted and nearly sorted arrays results in excellent performance (e.g., 0.168 ms for 1000 sorted elements).
- Merge Sort generally maintains stable performance but is outperformed by TimSort and Median-of-Three Quick Sort on random and nearly sorted data. However, Merge Sort is faster than TimSort on reverse-sorted arrays for larger sizes, suggesting its stability in this configuration.
- Nearly Sorted Arrays show that Median-of-Three Quick Sort and TimSort are the best choices, with both algorithms outperforming others. TimSort’s performance in nearly sorted data is close to its performance on fully sorted data, which shows its adaptability to real-world cases.
Median-of-Three Quick Sort is the fastest algorithm for most cases, while TimSort stands out for handling real-world datasets with pre-existing order. Merge Sort is consistently stable but generally slower than the other algorithms for most array configurations.
Pros and Cons of Using TimSort
Pros:
- Efficient for Real-World Data: TimSort is particularly fast for datasets that contain ordered or nearly ordered sequences, making it suitable for real-world applications.
- Stable Sort: TimSort preserves the relative order of elements with equal values, making it a stable sorting algorithm.
- Adaptive to Input: TimSort is adaptive, meaning it performs better on data that is already partially sorted.
Cons:
- Slightly More Complex: TimSort is more complex to implement than basic algorithms like Quick Sort or Insertion Sort.
- Space Overhead: TimSort uses additional space for the merging process, which may not be ideal in memory-constrained environments.
When to Use TimSort
TimSort is an ideal sorting algorithm when:
- You have partially sorted or nearly sorted data: Since TimSort is designed to handle ordered sequences efficiently, it’s perfect for real-world data that often has some ordering.
- You need a stable sorting algorithm: TimSort ensures that the relative order of elements is preserved, making it suitable for datasets where stability matters.
- You need an adaptive sorting algorithm: TimSort adapts to the existing order in the data, making it faster in practice than other O(n log n) algorithms.
Conclusion
TimSort is a highly efficient and adaptive sorting algorithm that excels at handling real-world datasets with pre-existing order. Its hybrid nature, combining Insertion Sort for small runs and Merge Sort for larger segments, allows it to perform exceptionally well on sorted and nearly sorted data. This adaptability gives TimSort a significant edge in cases where other algorithms, like Merge Sort or Quick Sort, may struggle.
Whether you’re dealing with small, sorted, or reverse-sorted arrays, TimSort’s efficiency, stability, and adaptability make it an essential tool for developers. It combines the best of Insertion Sort and Merge Sort to handle diverse datasets and is a reliable go-to algorithm for many practical applications.
Congratulations on reading to the end of the tutorial!
In Python – How to do TimSort in Python
In C++ – How to Do TimSort in C++
In JavaScript – How to do TimSort in JavaScript
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.