Sorts and Partial Orderings
Yesterday, Kenny posted a interesting question on comp.lang.lisp. Essentially, the question comes down to the behavior of SORT when the predicate provided is only a partial ordering, rather than a complete ordering.
As a quick review, a total ordering is an operator O which is anti-symmetric (a O b -> NOT b O a), transitive (a O b and b O C -> a O C), and total (a O b OR b o a). An example is the > operator over the reals (multiple instances of a single number are “the same object”.) A partial ordering is still anti-symmetric and transitive, but may not be total. The classic example is the operator that checks for ancestry in a directed acyclic graph (DAG).
Kenny seemed to be experiencing a “bug” where he passed a partial ordering predicate to SORT, and the returned sequence didn’t respect the partial ordering—continuing the DAG analogy, there was an instance where A was ancestor of B but B occurred before A in the resulting sequence.
Is it a bug? Quoting the Hyperspec on SORT:
If the key and predicate always return, then the sorting operation will always terminate, producing a sequence containing the same elements as sequence (that is, the result is a permutation of sequence). This is guaranteed even if the predicate does not really consistently represent a total order (in which case the elements will be scrambled in some unpredictable way, but no element will be lost). If the key consistently returns meaningful keys, and the predicate does reflect some total ordering criterion on those keys, then the elements of the sorted-sequence will be properly sorted according to that ordering.We see that if we pass a predicate which doesn’t represent a total order, the only thing we’re guaranteed is that the returned sequence contains the same elements as the original. So the behavior Kenny’s seeing is “conformant.”
Is it a “bug” in the spec? Should SORT return sequences which respect partial orderings? I don’t think so. In the comparison model of sorting, the only thing the sorter can do is compare two elements of the sequence at once, and then shuffle elements around. Now imagine that we pass a partial order where the first element is bigger than the second, and the remaining elements are all incommensurate, with both each other and the first two elements. (As a graph, this is a single chain of length 2, and n-2 isolated nodes.) The only way we’re going to be able to guarantee we return a sequence that puts elements one and two in the right order is if we compare them, and to guarantee we compare them, we have to compare everything to everything else—O(n^2) comparisons.
A partial ordering is transitive over pairs on which its defined. But if we make a predicate that returns true if A is an ancestor of B, and false if B is an ancestor of A or they’re incommensurate, this predicate is not transitive. The O(n log n) sorting algorithms assume that the predicate is transitive, and the argument above shows that this will necessarily lead to wrong results for partial order predicates for at least some input sequences.
What’s the answer? In the comparison model, we’re basically screwed. If all we can do is compare two elements, it will take O(n^2) time to order a sequence to respect an arbitrary partial order. On the other hand, if our ordering is ancestors in a DAG, and we can enumerate the children of a node, we can sort into a partial order in O(n) time, using a topological sort. This is hopefully the answer to Kenny’s problem, although it’s not yet entirely clear.
Feb 27, 02:36 pm | common-lisp |
Comments
commenting closed for this article