A Confusion of Sorts
How many passes does it really take to sort a list?
Albert Nijenhuis
They say that you can't sort a list of N objects in less than N log N operations, but I'll show you how. Sound familiar? Not many years ago, there was a similar-sounding challenge: "You can't trisect an angle." Many man-years have been spent trying to disprove that one, and if the debate has ended, it is because interest has waned--not because the problem has been settled in the public's mind.
Let's look briefly at the angle-trisection problem. What does it mean?
"No angle can be trisected.' Well, of course, we know from geometry how to construct an angle of 30 degrees, which is one-third of 90 degrees. So, some angles can be trisected.
"I can trisect an angle so well no one can find any discrepancy.' Again, that is not the point. Assuming perfect tools, can you divide an angle into three parts of exactly equal measure?
"I have a gadget that will perform exact trisections.' Such tools (without moving parts) actually exist, and at this point it is necessary to use more precise language. The intended statement was not "You can't trisect an angle' but "You can't exactly trisect an arbitrary angle using just straightedge and compass.' And that can't be done. That's higher math--and it's time to get back to sorting.
Sorting Terminology
First, a piece of terminology. We shall denote by N the length of a list that we want to sort, and we want to measure the amount of labor (the number of tests, or displacements, or whatever) required to do this. The expression O(N) ("Oh of N") denotes a quantity which is basically proportional to N (it has the "order of magnitude" of N). It might be 3N or 17N or 12N + 300 or 7N + log N (note that log N is much smaller than N for large values of N). On the other hand, N^2 is not O(N), because when N increases, N^2 increases faster than any constant multiple of N. Implicit in the notation O(N) is some number (any of the coefficients 3, 17, 12, or 7 above) to multiply by N. Its value is not considered "interesting" (though most people would prefer a 2 to a 100 in a labor count), because it can be highly machine-dependent. The expression O(N log N) denotes a quantity which is basically proportional to N log N.
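To make the "order of magnitude" idea concrete, here is a tiny sketch (my own illustration, in Python, the language I will use for all the sketches in this article) comparing the sample count 12N + 300 with N^2:

for n in (10, 100, 1000, 10000):
    linear = 12 * n + 300              # an O(N) labor count
    # Per-item cost: the O(N) count settles toward its constant 12,
    # while N^2 per item is just N, which grows without bound.
    print(n, linear / n, n * n / n)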
The debate, then, is usually centered on the question: Do we need O(N log N) comparisons to sort a list, or can we do it in O(N) comparisons?
Suppose you are given a list of length N which consists of four-digit numbers and which must be sorted in increasing order. I will show that I can do this with no comparisons at all: Pass through the list and, as you do so, move every number with a leading digit 0 to the beginning. When done, do the same with the leading digit 1 in the remaining list, etc. When finished, the list is sorted as far as the first digit is concerned. Now, in each of the ten sublists do the same with respect to the second digit, and so on.
In this way we have sorted our list with no comparisons between members of the list at all. We have performed labor (testing of digits, displacements), and that is O(N), namely 40N, or if you are a clever programmer, only about 4N, give or take a small multiplier.
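Here is a minimal sketch of the digit-distribution method just described (my own rendering in Python; the original article gives no code). It assumes the keys are integers from 0 to 9999, treated as four decimal digits:

def digit_sort(numbers):
    # Sort four-digit numbers by distributing on each digit, most
    # significant first. No member of the list is ever compared
    # with another member.
    def distribute(sublist, power):
        if power == 0 or len(sublist) < 2:
            return sublist
        bins = [[] for _ in range(10)]        # one bin per digit value
        for x in sublist:
            bins[(x // power) % 10].append(x)
        out = []
        for b in bins:                        # recurse on the next digit
            out.extend(distribute(b, power // 10))
        return out
    return distribute(numbers, 1000)

print(digit_sort([4711, 42, 9999, 1234, 7]))  # [7, 42, 1234, 4711, 9999]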
A variation on the above method uses an array of size 10,000, one place for each possible value in the list, and uses it in some fashion (many variations exist) to sort the list in a relatively simple way, again in O(N) labor, with no comparisons at all.
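A minimal sketch of one such variation (again my own, assuming duplicates are allowed): use the value of each number directly as an address into the big array.

def address_sort(numbers):
    # An array of 10,000 counters, one per possible four-digit value.
    counts = [0] * 10000
    for x in numbers:            # O(N): tally each value at its address
        counts[x] += 1
    out = []
    for value in range(10000):   # fixed cost: sweep the table in order
        out.extend([value] * counts[value])
    return out

print(address_sort([4711, 42, 9999, 42, 7]))  # [7, 42, 42, 4711, 9999]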
These comments are intended to illustrate that if there is to be any meaning to the O(N log N) story, some more precise definitions are needed.
First of all, the statement concerns comparisons, not displacements--although many comparisons result in displacements, of course. Second, we mean honest comparisons with no tricks, such as testing digits or calculating an address from the value. To put it another way, the sorting method itself should not examine the objects to be sorted in any way, but should leave this to a "black box" which, when given two objects, say A and B, issues one of three pronouncements: "A precedes B," "B precedes A," or "A and B may be placed in either order."
The only requirement of the black box is that it be consistent. For example, if it has made the pronouncements "A precedes B" and "B precedes C," then, if confronted with A and C, it must respond with the pronouncement "A precedes C." (A "good" sorting method achieves its efficiency by not presenting A and C to the black box.)
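In code, the black box might look something like this (a sketch of my own; the call counter will be convenient once we start counting comparisons):

class BlackBox:
    # The only window a "fair" sorting method has onto the objects.
    def __init__(self):
        self.calls = 0
    def compare(self, a, b):
        # An honest three-way pronouncement about the pair (a, b).
        self.calls += 1
        if a < b:
            return "A precedes B"
        if b < a:
            return "B precedes A"
        return "A and B may be placed in either order"

box = BlackBox()
print(box.compare(7, 1))   # B precedes A
print(box.calls)           # 1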
A Precise Definition
Given, then, a list of N objects in arbitrary order and a black box that is capable of making pronouncements on the objects in the list, what is the minimum number of calls to the black box needed to sort the list in the most unfavorable (called "worst possible") case? In this context, that minimum is O(N log N).
To show why this is so, let the list-to-be-sorted be A(1), ..., A(N). Sorting this list means determining a permutation of the numbers 1, ..., N (i.e., a rearrangement of 1, ..., N), which we may denote p(1), ..., p(N), such that A(p(1)), ..., A(p(N)) are in the order defined by the black box.
For example, if N=3, A(1)=7, A(2)=1, A(3)=4, then the increasing order is A(2), A(3), A(1). That is, p(1)=2, p(2)=3, p(3)=1. Usually, sorting methods do not explicitly compute p(1), ..., p(N), but it is simple to change any method--without changing the comparisons--to produce the permutation.
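In Python, for instance, the permutation for this example can be recovered in one line (my illustration):

A = {1: 7, 2: 1, 3: 4}        # A(1)=7, A(2)=1, A(3)=4
p = sorted(A, key=A.get)      # the indices, ordered by the values they hold
print(p)                      # [2, 3, 1], i.e., p(1)=2, p(2)=3, p(3)=1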
Sorting a list, thus, means determining one specific permutation. (We assume that all objects in the list are distinct; otherwise the permutation is not unique.) That is, we are looking for one out of N! (N factorial) permutations, because that is how many permutations of N objects exist. Let S denote the set of them all.
The first comparison of the method will compare, say, A(I) and A(J). Suppose the black box responds with "A(I) precedes A(J)." Then we know that in the as-yet-undetermined permutation p(1), ..., p(N) the number I will have to precede J.
This one comparison has therefore, in effect, divided the set S into two sets of permutations: the set S1 of "good" ones, in which I precedes J, and the "bad" ones, in which J precedes I. We could make a list of the permutations that belong to S1 if we cared to do so.
Next, another comparison is made, and in the same way it retains a set S2 of permutations from S1 and rejects the others. This process continues until we are left with a set of just one permutation--the one we wanted.
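A small sketch (mine, for N=3 and the example list above) makes the winnowing visible:

from itertools import permutations

S = list(permutations([1, 2, 3]))   # all 3! = 6 candidate permutations

# Black box: "A(2) precedes A(1)" -- keep permutations with 2 before 1.
S1 = [p for p in S if p.index(2) < p.index(1)]

# Black box: "A(3) precedes A(1)" -- keep permutations with 3 before 1.
S2 = [p for p in S1 if p.index(3) < p.index(1)]

print(len(S), len(S1), len(S2))     # 6 3 2

One more pronouncement ("A(2) precedes A(3)") cuts S2 down to the single permutation (2, 3, 1) found earlier.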
The description we have just given says nothing about the strategy by which the objects that are to be compared are chosen. That is deliberate, because the description as it now stands covers all possible strategies.
A Best Sorting Strategy?
At this point we claim that a "best" strategy would be so designed that at the first comparison between A(I) and A(J), the set S1 and the set of permutations not in S1 are essentially equal in size. This may go against the notion that a small set S1 would more quickly lead to a single permutation. The point is, however, that another list would lead to a black box reply of "A(J) precedes A(I)," and its "good" list is the "bad" list of the case we considered. To do best in the "worst case," neither list should suffer at the hands of the other. The same argument applies to subsequent comparisons.
We see, therefore, that a "best" strategy--if such exists--will, for each comparison, cut the then-current list of "good" permutations in half, or close to half. The minimum number (k) of comparisons will therefore be that number for which N!/2^k is essentially 1. It cannot be exactly 1, because no factorial greater than 2! is a power of 2. That is, k is very close to log2(N!). Again, log2(N!) is not an integer, but the next higher integer will be the lowest value that k could have for a "worst case." We shall ignore this "next higher integer" business, because we do not worry in our count about one comparison more or less. Thus we simply conclude that any sorting method based on comparisons requires at least log2(N!) comparisons in the worst case.
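To see the bound in numbers (a sketch of my own; lgamma avoids computing the huge factorials directly):

import math

# ceil(log2(N!)) is the worst-case lower bound; compare it to N log2 N.
for n in (3, 10, 100, 1000):
    bound = math.ceil(math.lgamma(n + 1) / math.log(2))
    print(n, bound, round(n * math.log2(n)))

For N=3, for instance, log2(6) is about 2.58, so at least 3 comparisons are needed in the worst case--and indeed no method can always sort three objects with only 2.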
A formula, due to Stirling, gives a very good approximation to N!, namely

N! = O( N^(N+.5) / e^N )

where the constant hidden in the O() is, in fact, the square root of 2π, but that is irrelevant here. From this it follows that:

log2(N!) = N log2 N - N log2 e + .5 log2 N + O(1)

(the O(1) denotes some uninteresting, almost constant quantity). It certainly follows from this that:

log2(N!) = N log2 N + O(N)

which is what this article is all about. Observe that this is only a lower bound. All we have shown is that, if we bisect a set of N! permutations repeatedly, this many bisections will be needed to end up with a single permutation. We have not shown that such bisections can always be achieved by comparisons of pairs of objects in our list.
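Before moving on, a quick numerical check of the Stirling-based formula (again my own sketch) shows how good the approximation is:

import math

# Exact log2(N!) (via lgamma) against N log2 N - N log2 e + .5 log2 N;
# the difference should stay near the constant log2(sqrt(2π)), about 1.33.
for n in (10, 100, 1000):
    exact = math.lgamma(n + 1) / math.log(2)
    approx = n * math.log2(n) - n * math.log2(math.e) + 0.5 * math.log2(n)
    print(n, round(exact, 3), round(approx, 3), round(exact - approx, 3))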
The existing sorting methods of type N log N have "worst case" comparison counts of C·N log2 N + O(N), but the C is not always equal to 1, as in our lower bound estimate above. For example, a good Merge Sort has C=1, but Heapsort has C=2. In a sense, Heapsort "wastes" half the comparisons, because half of them are never followed by an interchange. (See "How Not to be Out of Sorts II," Creative Computing, September 1980.) Most other popular sorting methods, such as Shell Sort and Quicksort, have only average comparison counts of O(N log N); their "worst case" performance (relatively rarely encountered) is higher--O(N^1.5) or O(N^2), for example. Detailed mathematical analyses of these and other methods can be found in Volume III of The Art of Computer Programming by D. E. Knuth.
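As a rough empirical illustration (my own sketch; Python's built-in sort is a Merge Sort relative, not one of the methods analyzed above, and a shuffled list exercises an average case, not the worst case), we can route every comparison through a counting black box and set the tally beside N log2 N:

import math
import random
from functools import cmp_to_key

calls = 0
def black_box(a, b):
    # Honest three-way comparison; tallies every call.
    global calls
    calls += 1
    return (a > b) - (a < b)

n = 1000
data = list(range(n))
random.shuffle(data)
sorted(data, key=cmp_to_key(black_box))
print(calls, round(n * math.log2(n)))   # observed count vs. N log2 N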