Re: SortedMap question?
On 11/24/2012 6:57 PM, Joerg Meier wrote:
On Sat, 24 Nov 2012 15:23:55 -0800, Knute Johnson wrote:
I have been operating under a mis-perception that a SortedMap would be
more quickly searched than an unsorted Map. More specifically that the
cost would come when adding an element to the SortedMap. I wrote a
little test program to measure the amount of time it took to look for
non-existent data in a Map and a SortedMap and it took somewhere between
twice and three times as long to look for the data in the SortedMap.
Considering Map is an Interface, I question your claim that you measured
its get(x) performance :p
Assuming you used HashMap, as far as I recall, its get(x) performance is
already at O(1) unless you mess up your hashcodes. Doesn't really go lower.
"HashMap is O(1)" is sort of true, but sweeps a lot of dirt
under the rug. Most obviously, there's the possibility of a bad
or unlucky hash distribution -- by no means a remote possibility,
as <http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
demonstrates. Even if the hash distribution is acceptably uniform,
there's also the matter of building the map in the first place: As
the insertions accumulate, HashMap may need to expand and re-insert
everything it already contains. With N=n1+n2+...+nk mappings, there
could be n1+(n1+n2)+...+(n1+n2+...+nk) = k*n1+(k-1)*n2+...+1*nk
insertions altogether (a good up-front estimate of N can help here).
Consider also the operation cost. Searching in a TreeMap of
N mappings takes about lg(N) comparisons -- but with String keys
the comparisons will be of varying difficulties. Comparisons near
the root will probably examine only the first few characters to
determine the result; only as the search approaches the leaves will
the Strings' tail characters come into play. On a successful search
HashMap must iterate over the key String's entire length twice: Once
to compute the hashCode(), and again to verify that a match has in
fact been found (comparisons against non-matching keys in the same
bucket will be quick, like those near the root of a TreeMap).
Finally, O() itself hides a lot of detail -- that is, after all,
its purpose. Given the choice between O(N*N) Algorithm X and O(1)
Algorithm Y, which would you choose? Might you change your mind
if you learned that X takes N*N nanoseconds while Y takes a constant
eighteen hours? Might you change your mind yet again if you learned
that N was ten million? O() alone is not enough to decide a question
of speed.
Still and all: HashMap makes a good "default" choice, and is
"likely" to outperform TreeMap over a "reasonable range" of inputs.
Turn to TreeMap when the order of the keys is significant -- for
example, when you need "closest match" for an absent key.
--
Eric Sosman
esosman@comcast-dot-net.invalid