Informatique

What is radixsort ?

Disclaimer: This article is part of a serie of articles about the Rust Voracious Radix Sort Crate. This article is about the basic knowledge of the radix sort.

  • What is radixsort ?
  • Voracious sort
  • Diverting LSD sort – TBA –
  • Radix sort: The best of the two worlds – TBA –

In computer science field, one often needs to sort something. A list of unordered elements can be sorted to output a list of ordered elements. These operations applied to a list of unordered elements is called sorting. For example:

[7, 1, 3, 2, 8, 9] => [1, 2, 3, 7, 8, 9]

This can also be done on something else than numbers:

[elephant, whale, cat, dog, horse] => [cat, dog, elephant, horse, whale]

In these examples, an ordering is defined and applied to the lists: ascending ordering for numbers and lexicographical ordering for strings.

To sort a set of elements, a comparative function is used in the sorting algorithm. This function, given two elements can tell which one is smaller, which one is bigger or if there is a tie.

function compare(a, b) {
    if (a == b) {
        return Equal;
    } else if (a < b) {
        return Smaller;
    } else {
        return Greater;
    }
}

Sorting algorithms using this kind of comparative function are called « comparative sorting algorithms ».

Non comparative sorting algorithms

Obviously, if there are comparative sorting algorithms, there are also non comparative sorting algorithms. That is to say, sorting algorithms that do not use a comparative function.

How is it possible to sort elements without a comparative function ?

To achieve this, non comparative sorting algorithms use the binary expression of a number.

Does this mean that a non comparative sorting algorithm can only sort numbers ? No, but this algorithm will need to map each element to a number. However it is not the topic here.

What is the binary expression of a number ? It is the number expressed in base 2 instead of base 10. In base 2 there are only two symbols: {0, 1}. Whereas in base 10 there are 10 symbols: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Let’s count in base 2:

 0 => 0000 
 1 => 0001
 2 => 0010
 3 => 0011
 4 => 0100
 5 => 0101
 6 => 0110
 7 => 0111
 8 => 1000
 9 => 1001
10 => 1010

As it is true in base 10, in base 2 the left most symbol is the most significant for the ordering, and the right most symbol is the less significant for the ordering.

Non comparative sorting algorithms use this information to recursively sort a set of elements. Let’s do it through an example:

[0110, 0011, 0001, 1010, 1000, 0010, 1001, 0000, 0100, 0101, 0111]
[   6,    3,    1,   10,    8,    2,    9,    0,    4,    5,    7]

We start with the first binary number, let’s split numbers by moving numbers with a « 1 » as first binary number to the left position in the list, and the other to the right:

[1010, 1000, 1001] [0110, 0011, 0001, 0010, 0000, 0100, 0101, 0111]
[   10,   8,    9] [   6,    3,    1,    2,    0,    4,    5,    7]
[____bucket 1____] [___________________bucket 0___________________]

This creates two buckets. The left bucket whose first binary numbers is a « 1 » and the right bucket whose first binary numbers is a « 0 ».

We have to keep doing this for other binary number recursively in each bucket. Let’s continue with the second binary number:

[            ] [1010, 1000, 1001] [0110, 0100, 0101, 0111] [0011, 0001, 0010, 0000]
[            ] [   10,   8,    9] [   6,    4,    5,    7] [   3,    1,    2,    0]
[_bucket 1.1_] [___bucket 1.0___] [______bucket 0.1______] [______bucket 0.0______]

Now the third binary number:

[] [] [1010] [1000, 1001] [0110, 0111] [0100, 0101] [0011, 0010] [0001, 0000]
[] [] [  10] [   8,    9] [   6,    7] [   4,    5] [   3,    2] [   1,    0]

And now the fourth binary number:

[] [] [] [] [] [1010] [1010] [1000] [0111] [0110] [0101] [0100] [0011] [0010] [0001] [0000]
[] [] [] [] []   [10]    [9]    [8]    [7],   [6],   [5]    [4]    [3],   [2]    [1]    [0]

Marvelous, the list is fully sorted.

As you can see, each pass creates two buckets from a previous one. At each pass the number of buckets doubles and at each pass buckets are « sorted », that is to say: all elements in the first bucket are bigger than any element in the second bucket, all elements in the second bucket are bigger that any element in the third bucket, all elements in the third bucket are bigger that any element in the fourth bucket and so on. Once all binary numbers are processed, the list is sorted.

This is how a radix sort works.

It is possible to sort in reverse order by switching the bucket order.

What is the « radix » ?

In the previous example, each pass uses only one binary number at once to create a bucket. It is possible to use more than one binary number. Let’s do an example with two binary numbers at once on this list:

[0110, 0011, 0001, 1010, 1000, 0010, 1001, 0000, 0100, 0101, 0111]
[   6,    3,    1,   10,    8,    2,    9,    0,    4,    5,    7]

If two binary numbers are taken into account, it means that there are 4 possibilities: {11, 10, 01, 00}. For each possibility, we create a bucket. Let’s do it for the first and the second binary numbers:

[] [1010, 1000, 1001] [0110, 0100, 0101, 0111] [0011, 0001, 0010, 0000]
[] [  10,    8,    9] [   6,    4,    5,    7] [   3,    1,    2,    0]

Now the third and fourth binary numbers:

[1010] [1010] [1000] [0111] [0110] [0101] [0100] [0011] [0010] [0001] [0000]
[  10]    [9]    [8]    [7],   [6],   [5]    [4]    [3],   [2]    [1]    [0]

With only two passes, the list is sorted.

The radix is the number of binary numbers used at once to create buckets.

Intuitively, one can think that it is better to use a big radix because there is less passes to do. What if the list to sort looks like this:

[0, 1000000000000000000000000]

For only two elements, there are too many buckets – despite empty -, this algorithm requires too many memory.

It is a trade-off between the number of passes and the required memory.

To learn more

I explained here only the very basic of how radix sort works. It is also possible to use binary number in reverse order, that is to say, from right to left instead of left to right. If you want to know more, you can start by the Wikipedia page of the radix sort.