Disclaimer: This article is part of a serie of articles about the Rust Voracious Radix Sort Crate. This article is about the DLSD sort inside the Voracious Radix Sort Crate, not the crate itself. For each article, the prerequisite is to read the previous articles and to understand them.
In my previous article, I explained how the Voracious sort works. Voracious sort is a MSD radix sort. Now we will see the LSD radix sort and how it is possible to use diversion for LSD radix sort. Diversion is when we use a another sort as a fallback of the main sort because for small inputs radix sorts are known to perform poorly. Diversion is commonly used in MSD radix sorts, but it is not used with LSD radix sorts because of some constraints we are going to explain and then we will see how it is possible to actually use diversion.
LSD radix sort
In the first article: « What is Radixsort ? », I explained the basic of the MSD radix sort. Now let’s see the LSD radix sort. As you can guess, instead of iterating from the most significant digit to the least significant digit, we are going to iterate from the least significat digit to the most significant digit.
If you want you can read the Wikipedia article about radix sort, the LSD radix is explained through an example. I will also redo this explanation here with a small example too.
The problem when we order least significant digit first is that these very digits will be re ordered during the next least significant digit ordering. So, how is it possible to actually sort something if what is ordered in a first pass is reordered during the next pass ? The answer is stability. Because each pass performed by a LSD radix sort is stable, ordering done before is still hold after another stable ordering. So iterating from the least significant digit to the most significant digit will eventually sort the whole input array.
Let’s see through an example.
let mut array = vec![12, 8, 5, 15, 2, 3, 0, 6, 5];
// In binary it is: [1100, 1000, 0101, 1111, 0010, 0011, 0000, 0110, 0101]
Since we iterate from the least to the most significant digit, we iterate from right to left. Let’s choose 2 for the radix, so we take into account two bits at each pass. The first pass orders the array like this:
As you can see, the first pass orders elements according to the two least significant digits and elements are moved while ensuring the stability property. If we look at the first bucket, elements with **00 each element kept its relative order. 12 was before 8 which was before 0. Since these elements are in the same bucket, they could be ordered in another way, but the stability property would be violated. The stability property is the reason why this LSD radix sorts work.
As you can see, the second pass orders elements according to the two next least significant digits and elements are moved while ensuring the stability property. It is a bit magic, but the whole array is sorted. Since the array only contains small numbers, two passes are enough.
I cheated a bit, each time I told you that the elements are moved while ensuring the stability property, but how it is done ? If we swap elements like it is done for the MSD radix sort, we will violate the stability property. The trick is that we use a buffer array where the whole input array is copied and then each element, one by one, following the array order is moved to its target location. That way, stability property is ensured, but the sort needs an array buffer which is the same size than the input buffer. This is why LSD radix sorts are O(n) in space complexity whereas MSD radix sorts can be O(1) in space complexity if elements are moved using the swap method. It is a tradeoff between stability and space complexity.
Diverting LSD radix sort
If you have well understood how LSD radix sort works, you should be a tad suspicious about diverting LSD radix sort. If we wanted to do this like it is done with MSD radix sort, there is nowhere we could add a diversion since we need to iterate through all the levels to get a sorted array.
What about doing LSD passes on the most significant digits only ? The least significant digits would be unsorted, but the most significant digits would be sorted. It is a lot like a MSD radix sort except that we don’t have the buckets boundaries but there is something else very interesting we can do. It is to compute histograms for several passes at once instead of computing each histogram by each histogram at each pass. It is the same complexity, so theoretically, one solution is as efficient as the other one, but memory accesses are very important and should not be neglected. Actually, doing so, greatly improves the overall efficiency. However we still do not have the buckets boundaries. By iterating through the array, it is fairly easy to find buckets boundaries with a binary mask, but it is costly: O(n). One way to improve that is to expand exponantially the step while we iterate through the array and then doing a binary search to find buckets boundaries. It is a big foot small foot strategy. Once we have all the buckets boundaries, we need to sort the remaining least significant digits. One hypothesis is that a bucket is already almost sorted. One sort is very efficient to sort almost sorted array, it is the insertion sort. With small heuristics – not detailled in this article – we can achieve great results. If a bucket is not almost sorted, we detect it and we use another sort to sort it, for instance, we use the Voracious sort. Therefore Diverting LSD radix sort – DLSD – is not a stable sort anymore.
And that’s it, we have all the necessary logic to understand the DLSD sort. One can ask why is it interesting to do a DLSD radix sort. It is interesting for big key with a Uniform or Normal distribution of the binaries. In this case, the DLSD radix sort does not need to perform lot of passes to sort the array, few passes are enough because all the information in the first passes is enough to sort the array. Bits are informations to sort the array, with big keys, there are more information than what is required to sort the whole input array, therefore, only sorting the minimum number of bits is more efficient than sorting all the bits. The downside is that DLSD performs poorly on other distribution or on small keys. Again, one fits all does not exist.
Let’s do it through an example.
let mut array = vec![97, 57, 17, 2, 87, 56, 33, 30, 40, 21, 27, 71];
// In binary it is: [01100001, 00111001, 00010001, 00000010, 01010111, 00111000,
// 00100001, 00011110, 00101000, 00010101, 00011011, 01000111]
Let’s choose a radix equals 2, et let’s sort only the two leftmost levels in a LSD fashion way. The first pass outputs:
With this example, the second pass has only 2 buckets 00 and 01. Since all the most significant digits are sorted, we can be sure that buckets are correctly sorted. For instance, all elements in the first bucket 00 is smaller than all elements in the second bucket 01. But within a bucket, elements might not be sorted, but they are almost sorted.
If we swap 30 and 21, and then 30 with 27 and then 57 with 56, this bucket is sorted. So with only 3 swaps, all elements in this bucket is sorted. That’s what the insertion sort will do.
The second bucket is already sorted despite that these elements were in the reverse order at the beginning.
It is pretty clear that 3 swaps is less costly than 2 LSD radix sort passes. That is the DLSD’s strenght.
We have to search for buckets boundaries in case of a bucket would be badly unsorted and then it is preferable to call another sort on this bucket instead of using the insertion sort. The insertion sort is very efficient with small arrays or with almost sorted arrays.
The question is how many passes at least or at most should the DLSD perform to output an almost sorted array ? For this purpose, we have to make some assumption. Let’s assume that the most significant bits are uniformly distributed. So the number of bits we should take into account is b = ~log2(array.len()). How many passes is it ? nb_passes = ceil(b / radix).
nb_passes = ceil(log2(array.len()) / radix)
Since, in reality, arrays are rarely perfectly uniformly distributed, it is better to do one pass more than the theoretical nb_passes. So theoretically, nb_passes is roughly enough to almost sort the whole input array and then we ignore all the remaining bits. That is why the larger the key is, the better DLSD is and the more uniform the distribution is, the better DLSD is, because we wouldn’t need to call another sort and we can ignore all the remaining bits. The insertion sort would sort the almost sorted buckets very quickly since it is mostly checking that buckets are sorted.
If you have all understood this far, you should have guessed that the diversion is the call to insertion sort or the other fallback sort when the bucket is too badly unsorted. Therefore DLSD is an LSD radix unstable sort using a diversion.