Re: Verifying a list is alphabetized
On Sun, 4 Dec 2011, Martin Gregorie wrote:
On Sat, 03 Dec 2011 20:05:24 -0800, Roedy Green wrote:
On Sat, 03 Dec 2011 07:44:48 -0500, Eric Sosman
<esosman@ieee-dot-org.invalid> wrote, quoted or indirectly quoted
someone who said :
What is the big-Oh complexity of Collections.sort(List<String>)
on an already-sorted list?
IIRC QuickSort goes a bit bananas if the file is already sorted. ISTR
some sort doing some random swaps at the start to make sure that does
not happen. It does do a check first if things are in order.
I'm a little wary of quicksort's claim of superior speed in every
situation.
What claims? Quicksort, being an algorithm, makes no claims on its behalf.
Tony Hoare, being a scholar and a gentleman, made a strong but measured
claim about it (from the conclusion of [1]):
The number of cycles of the innermost comparison loop is close to the
theoretical minimum, and the loop may be made very fast. The amount of
data movement within the store is kept within very reasonable bounds.
Quicksort is therefore likely to recommend itself as the standard sorting
method on most computers with a large enough random-access store to make
internal sorting worth while.
Although it seems he may at one point have made rather more extravagant
claims which he came to regret - a footnote in the above paper tries to
cover them up:
The claim to a negative sorting time in the reference is, of course, due
to a misprint.
I very much doubt that anyone worth listening to has ever made a claim
that quicksort is the fastest in every situation. Pathological cases have
been known for such a long time that to do so would be to court honorary
membership of the Flat Earth Society.
However, then I got cute and replaced the simple comparison function,
which I used in all the algorithms, with a more complex one, capable of
multi-key sorting as well as natural, lexicographic, caseless, and
numeric ordering for each key. Much to my surprise, I discovered that
using this comparator, the quicksort was slower than the shell sort,
giving a performance ordering (slow to fast) of Bubble, Selection,
Quicksort and Shell.
Evidently there's an unstated assumption in Quicksort that record swaps
will always be slower and/or less costly than comparisons, so it was
optimised to minimise swaps and not to worry about the cost of
comparisons. This feature was emphasised in my test series by the
extremely low cost of record swaps.
It's possible that quicksort does fewer swaps than Shell sort. But i don't
think it was focused on that: Hoare's point in the quote above is that (my
emphasis) "the number of cycles of the innermost *comparison* loop is
close to the theoretical minimum". Sortologists have understood the
importance of minimising the number of comparisons for a very long time.
On the contrary, the number of comparisons made by Shell sort is,
according to the so-called experts at wikipedia, not terribly rigorously
known, but is generally something like O(n^k), with k > 1:
http://en.wikipedia.org/wiki/Shellsort
Which is worse than the O(n log n) you get with other sorts. There will be
particular cases where it beats quicksort, or some other sort, and perhaps
even particular applications covering many cases, but that is not a
general conclusion you can draw.
tom
[1] http://comjnl.oxfordjournals.org/content/5/1/10.full.pdf+html
tom
--
So the moon is approximately 24 toasters from Scunthorpe.