Re: performance of HashSet with strings and integers
In article
<d8f3f9b8-ffef-424d-9f81-3c2fd2a658c5@a6g2000vbp.googlegroups.com>,
Frederik <landcglobal@gmail.com> wrote:
Hi,
I thought of replacing strings with integers in some code I wrote,
because I believed it would benefit performance. But before doing so,
I made a small test class:
public class testStringSetVsIntSet {
public static void main(String[] args) {
long time;
boolean b;
Set set;
Integer I1 = new Integer(100), I2 = new Integer(500);
set = new HashSet();
set.add(I1);
set.add(900);
time = System.currentTimeMillis();
for (int i=0; i<50000000; i++) {
b = set.contains(I1);
b = set.contains(I2);
}
time = System.currentTimeMillis() - time;
System.out.println("Time 1: " + time);
String headStr = "Head";
String agentStr = "Agent";
String qualifStr = "Qualif";
set = new HashSet();
set.add(headStr);
set.add(agentStr);
time = System.currentTimeMillis();
for (int i=0; i<50000000; i++) {
b = set.contains(headStr);
b = set.contains(qualifStr);
}
time = System.currentTimeMillis() - time;
System.out.println("Time 2: " + time);
}
}
But to my surprise, the second loop with the strings appeared to be
twice as fast as the first one with the integers! (first loop 3
seconds, second 1.5 seconds)
I didn't expect this because calculating the hashcode for a string is
normally slower than for an integer (every string character is taken
into account) and I thought the "equals" method for a string should be
slower than for an Integer as well.
Can anybody explain this to me?
A few things:
- Strings cache their hash value in recent JVMs.
- The performance of hashing varies for individual keys based on
collisions and locations within the collision chain.
- All of your map hits can use the fast '==' test rather than equals(),
making collisions an unusually large factor of performance.
- Hotspot was compiling during loop 1. Call the test method twice and
the second results will probably be different.
nitpick: Integer.valueOf(n) and autoboxing returns pooled objects for a
range of values so it's sometimes much more efficient than the Integer
constructor. It's not in the loop so it doesn't matter here.
--
I won't see Goolge Groups replies because I must filter them as spam