Re: instr pipelines and loop unrolling
andrew_nuss@yahoo.com wrote:
Hi,
What's the fastest way to do block moves when the arrays<T> are known
to be non-overlapping:
1) Use a library (overhead unknown).
2) Use a simple for (int i = 0; i < N; i++) {dst[i] = src[i];} loop.
   This is recommended by the Intel compiler folks.
3) Use a switch statement with loop unrolling to avoid the i<N check.
   A little more overhead than 2). The Intel compiler folks recommend
   against it.
4) Combine 2) or 3) with a reinterpret_cast to the largest integral
   type that the pointers are aligned on. (Works only with raw pointer
   arrays.) Again, more overhead.
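
For concreteness, options 2) and 3) might look roughly like the sketch below. The names copy_simple and copy_unrolled and the unroll factor of 4 are assumptions made for illustration, not anything taken from the Intel documentation.

```cpp
#include <cstddef>

// Option 2: plain indexed loop. The compiler remains free to unroll or
// vectorize it on its own.
template <typename T>
void copy_simple(T* dst, const T* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Option 3: manual 4x unrolling, so the loop condition is tested once per
// four elements. A switch handles the 0 to 3 leftover elements.
template <typename T>
void copy_unrolled(T* dst, const T* src, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    switch (n - i) {
        case 3: dst[i + 2] = src[i + 2];  // falls through
        case 2: dst[i + 1] = src[i + 1];  // falls through
        case 1: dst[i] = src[i];
                break;
        default: break;                   // nothing left to copy
    }
}
```

Both functions assume the two arrays do not overlap, as stated in the question.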
Does anyone have experience?
Andy
I answered my own question. The simple for loop that increments the
index variable is the best by far. However, an important optimization
when the pointer type is char* or short* is to reinterpret as int* when
possible based on alignment. For char*, it's 4.5 times faster to copy
as int*, and for short* it is 2 times faster to copy as int*.
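
A minimal sketch of the widened copy described above, under some assumptions: the helper names is_word_aligned and copy_bytes_wide are hypothetical, and a 32-bit std::uint32_t stands in for int. Instead of dereferencing a reinterpret_cast'ed int*, which can run afoul of the aliasing rules, the sketch moves each word through a local variable with std::memcpy; compilers typically reduce those fixed-size calls to a single load and store, though that is worth checking on the target platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when p is suitably aligned for a 32-bit word.
inline bool is_word_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0;
}

// Copies n bytes between non-overlapping buffers. When both pointers are
// word aligned, the bulk is moved one 32-bit word at a time; any remaining
// bytes (or the whole buffer, if unaligned) are copied byte by byte.
inline void copy_bytes_wide(char* dst, const char* src, std::size_t n) {
    std::size_t i = 0;
    if (is_word_aligned(dst) && is_word_aligned(src)) {
        for (; i + sizeof(std::uint32_t) <= n; i += sizeof(std::uint32_t)) {
            std::uint32_t w;
            std::memcpy(&w, src + i, sizeof w);  // one word-sized load
            std::memcpy(dst + i, &w, sizeof w);  // one word-sized store
        }
    }
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

The speedups quoted above are the poster's own measurements; this sketch makes no claim to reproduce them.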
If the data being copied from one memory location to another consists
of POD ("plain old data") types (such as pointers) then the program
would have no reason not to call a standard library routine such as
memcpy() to perform the transfer. After all, a standard library
implementation makes two implicit promises about its routines: that
they are correct and that they are highly efficient.
A routine like memcpy() after all is certain to be highly optimized -
often in ways that would either be too impractical, too technical, or
too unportable for the program to implement on its own. For example,
memcpy() might load and store the copied bytes through floating point -
or vector - registers. And for large copies, it's conceivable on some
platforms that memcpy would resort to virtual memory to perform a
virtual "copy". In short, unless the purpose of the program is to copy
memory efficiently, it makes sense for the program to delegate that
kind of task to a routine written expressly for the very purpose of
copying memory efficiently. Doing so frees up programming and testing
resources that can then be directed toward implementing the true
'value' of the program under development.
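
As a rough illustration of delegating the copy, here is a small wrapper around memcpy(). The name copy_pod is hypothetical, and the static_assert is an added safeguard (C++11 or later) so that the wrapper is only instantiated for types that are safe to copy bytewise.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: copies n elements between non-overlapping arrays
// by handing the whole job to memcpy().
template <typename T>
void copy_pod(T* dst, const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");
    std::memcpy(dst, src, n * sizeof(T));
}
```

For element types that are not trivially copyable, std::copy from <algorithm> is the usual alternative, and implementations commonly lower it to a memmove-style copy when the element type allows.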
Greg
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]