Heap Sort Algorithm

A comparison-based algorithm, that uses a binary heap data structure to sort an input.

Heap Sort is similar to selection sort as this algorithm is also based on the concept of finding the maximum (or minimum) value in the input and then placing it at its appropriate location.

But unlike selection sort, heap sort does not linearly iterate over the input to identify the relevant element. Instead, it uses binary heap data structure (a max heap) which always ensures that the largest element is always at the root of the heap. Of course, this feature comes at a cost that we’ll explore in the coming sections.

Before discussing the heap sort algorithm, let’s first review a few features of the binary heap data structure. It will help us understand heapify - a core concept of this algorithm:

1. A binary heap is a complete binary tree with all the levels filled other than the last level (i.e., the last level may contain 1 or 0 nodes, but the nodes should be as left as possible).
2. The min-heap tree will contain the smallest value on the root for all the subtrees. Similarly, a max-heap will hold the largest value on the root for all the subtrees.
3. When using an array as the underlying data structure for a heap, the left and right nodes of a node at index i in the array are presented as $$2*i + 1$$ and $$2*i + 2$$, respectively.

Consider the following illustration for a max heap:

As shown, all the nodes are higher in value than their descendants in their respective subtrees. Additionally, except for the last level, all the levels are filled. Any additional element (88 in this case) is to the left of its parent.

Now let’s take a look at how we can construct a heap from the elements of an array:

public void buildMaxHeap(int[] input){
int length = input.length - 1;
for(int i = length / 2; i >= 0; i--){
heapify(input, i, length);
}
}
private void heapify(int[] input, int start, int length){
int left = start * 2 + 1;
int right = start * 2 + 2;
int maxBetweenLeftAndRight = start;

if(left <= length && input[left] > input[start]){
maxBetweenLeftAndRight = left;
}

if(right <= length && input[right] > input[maxBetweenLeftAndRight]){
maxBetweenLeftAndRight = right;
}

if(maxBetweenLeftAndRight != start){
int temp = input[start];
input[start] = input[maxBetweenLeftAndRight];
input[maxBetweenLeftAndRight] = temp;
// as a swap might have broken the max heap property
// re-heapify
heapify(input, maxBetweenLeftAndRight, length);
}
}


The buildMaxHeap method starts from the bottom-most subtree and works toward the top by invoking the heapify process for all the elements from $$length \div 2$$ to index $$0$$.

The heapify method follows the below-mentioned steps to maintain the max-heap properties:

1. Calculate the left and right indices.
2. Mark the start index as the current max index.
3. If the left index is within the range and the value at the left index is more than the value at the max index, update the max index to left.
4. Similarly, if the right index is within the range and the value at the right index is more than the value at the max index, update the max index to right.
5. If either of these operations has updated the max index, swap the input[start] and input[max] values and call the heapify method again.

Once the max heap is prepared, it will have the maximum value at $$0^{th}$$ index. The heap-sort algorithm leverages this property and repeatedly swaps the array’s first and last element. Every swap is followed by a call to the heapify method with the size variable reduced by 1 (as the last element is now correctly sorted).

The heapify method thus will work on 1 less element in every iteration:

public void sort(int[] input){
// build max heap
buildMaxHeap(input);
// the max heap will have the largest element at
// 0th location, so let's swap the root element with
// tail to re-position it at the end
int size = input.length - 1;
while(size >= 0){
int temp = input[0];
input[0] = input[size];
input[size] = temp;
size--;
// the heapify method is called on one less element
// in every iteration
heapify(input, 0, size);
}
}


The overall complexity of the algorithm is $$O(n * log_2n)$$. Due to its complicated implementation, the heap sort algorithm is not used frequently, but the binary heap is often used as a priority queue implementation.

Be notified of new posts. Subscribe to the RSS feed.