Re: C++ Speed Vs. Java

From:

"James Kanze" <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++.moderated

Date:

Thu, 25 Jan 2007 09:17:32 CST

Message-ID:

<1169733182.253973.156130@a75g2000cwd.googlegroups.com>

Lourens Veen wrote:

James Kanze wrote:

[...]

A couple of quick tests on my machine (Sun Sparc, but using g++
4.1.0) gave some surprising results. First, as expected,
running through the values in order was a lot faster (about five
times) than inverting the rows and the columns. Other than
that, strangely enough, using powers of 2 (1024x1024) was
significantly slower than using a power of 10
(1000x1000)---given that the machine I'm on doesn't have
hardware multiply and divide, I really cannot explain it.

I must admit I'm not quite sure I understand what you've tested (could
you post the programme?)

#define LOOP
      \
    double total = 0.0 ;
      \
    for ( size_t i = 0 ; i < dim2 ; ++ i ) {
      \
        for ( size_t j = 0 ; j < dim1 ; ++ j ) {
      \
            total += myArray[ j ][ i ] ;
      \
        }
      \
    }
      \
    myResult = total

Invoked over a double[N][M], a double (*)[M], and an
std::vector< std::vector< double > >( N, M ) (all members of
their respective, class which performed the benchmark.

but my bet would be a caching effect.

I suspect as much. Note, however, that I've had some very
unexpected timings in the past, for other things, on a Sparc.

In a typical cache design, cache line n maps to memory "lines" n +
m*c, where c is the size of the cache in lines and m some integer. It
is very likely that c is a power of 2 less than or not much more than
the size of one of your rows.

This means that a[0][0] maps to the same line as a[1][0] and in fact
as any a[n][0]. If you traverse column-first, you get a cache miss on
every access.

If the size isn't a power of two, the cache lines will map to the
array in a staggered fashion, so that e.g. a[0][0] and a[1][6] now
have a cache line in common. If the cache is large enough to hold an
entire column, then for a number of columns after the first, there
won't be any cache misses.

That sounds like an likely explination.

This situation is less likely to occur with std::vector I think.

Definitly. On the other hand, supposing that my inner dimension
corresponds to the size of a page, I'm pretty much guaranteed
that each index results in a cache miss with std::vector, no?
So how come it performed significantly better when the inner
dimension was a power of two?

I'm intreged, and I'd like to investigate more, but I'm afraid I
don't have the time. So about all I can really conclude is:
don't take any pre-conceived assumptions concerning performance
for granted. (And don't suppose that one particular measurement
is typical of anything.)

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientie objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Simard, 78210 St.-Cyr-l'Icole, France, +33 (0)1 30 23 00 34

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]