Counting Sort algorithm

A non-comparison based sorting algorithm that sorts the input based on the unique occurrence of the individual keys.

Counting Sort is a non-comparison-based sorting algorithm that identifies the correct slot of an element in the input by identifying the number of elements that are less than the said element. It does so by counting the number of times each element is present in the input and then using this information to sort the original input.

Consider the following example: [1, 5, 3, 1, 2, 4]

Let’s create a frequency array for this input where each value denotes the number of times the corresponding “index” is present in the input array: [0, 2, 1, 1, 1, 1]. In other words, the frequency array suggests that in the original input:

  1. 0 will take 0 slots
  2. 1 will take 2 slots
  3. 2 will take 1 slot
  4. 3 will take 1 slot
  5. 4 will take 1 slot
  6. 5 will take 1 slot

From the above listing, we can conclude that 1 will take two slots (as its freq is 2) in the output, so 2 can come only at the 3rd slot (or index 2) to accommodate all occurrences of 1 before it to ensure sorted order.

Frequency Array

To consider all the slots for every element, we can add the values across the indices to prepare a running frequency. The final running frequency array highlights one key aspect:

every value (say runningFreq[i]) in the array represents the starting index in the sorted output where i+1 will go.

  1. Value 0 at index 0 represents the location of 1 in the sorted output.
  2. Value 2 at index 1 represents the location of 2 in the sorted output.
  3. Value 3 at index 2 represents the location of 3 in the sorted output.
  4. Value 4 at index 3 represents the location of 4 in the sorted output.
  5. Value 5 at index 4 represents the location of 5 in the sorted output.

To align the target indices with their values, we shift the updated frequency array to 1 position right and add 0 at the starting position.

Aligned Frequency

Now, using this, we can populate the sorted array accordingly. To handle duplicates, we increment the frequency count:

Sorted Result

Java implementation of this algorithm is as follows:

public void sort(int[] input){
    if(input.length > 0){
        int[] frequencies = populateFrequencies(input);
        for(int i = 1; i < frequencies.length; i++){
            frequencies[i] = frequencies[i] + frequencies[i - 1];
        }
        // to match the index with its value now, let's shift it to the right
        System.arraycopy(frequencies, 0, frequencies, 1, frequencies.length - 1);
        // and add 0 at 0th index
        frequencies[0] = 0;
        int[] result = new int[input.length];
        for(int i = 0; i < input.length; i++){
            result[frequencies[input[i]]] = input[i];
            // increment the frequency count by 1 to
            // accommodate next occurrence of same value input[i]
            frequencies[input[i]]++;
        }
        // copy result to input
        System.arraycopy(result, 0, input, 0, input.length);
    }
}

private int[] populateFrequencies(int[] input){
    int maxValue = Arrays.stream(input).max().getAsInt();
    int[] freq = new int[maxValue + 1];
    // freq[input[i] = freq[input[i] + 1
    for(int index : input){
        freq[index] = freq[index] + 1;
    }
    return freq;
}

The overall complexity of this algorithm is \(O(n+k)\), where n is the input size and k is the maximum value. This is because we iterate through the input items twice: once to populate counts and once to fill in the output array. Both iterations are \(O(n)\) time. Also, we iterate through the frequency array to prepare the running frequency array, which is \(O(k)\) time.

Also, due to the creation of an additional frequency array and a temp result array, an additional space of order \(O(n+k)\) is required. As count sort depends on the range of input elements, in case the range is very high (say more than the input size), we might use a lot of extra space.

Be notified of new posts. Subscribe to the RSS feed.