Re: Array optimizing problem in C++?
jason.cipriani@gmail.com wrote:
If you are going for blinding speed be sure to use proper optimization
flags; otherwise you are only calculating benchmarks for poorly
optimized code, which, like most of your benchmarks, is meaningless:
$ g++ -O0 smooth.cpp
9884.867 ms
$ g++ -O3 smooth.cpp
8791.123 ms
$ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math smooth.cpp
1207.944 ms
GCC's -ffast-math option breaks semantics, so it is not a valid
optimization. I would not use it in a comparison against Java unless you
know that accuracy is not required for the given task.
$ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math -funroll-
loops smooth.cpp
1084.385 ms
This is your code slightly modified to use QueryPerformanceCounter for
timing (using Windows). Most of the speed up is a result of -ffast-
math.
Based on the quality and rigor of the other tests I've seen you do,
I'd guess that you missed a few optimization flags on the Java side as
well.
I have altered all implementations of this benchmark to perform progressive
blurring from one array to the other and back rather than overwriting the
destination array repeatedly. This slightly improved performance in all
cases.
On my Athlon 64 X2 4400+ (2.2Ghz) running AMD64 Debian Linux, I get
best-of-three results:
g++: 4.080 g++ -O3 bench.cpp -o bench; time ./bench
g++2: 4.080 g++ -O3 bench2.cpp -o bench2; time ./bench2
ocaml: 4.048 ocamlopt bench.ml -o bench; time ./bench
java: 4.016 javac Arrays.java; time java Arrays
On the same hardware running 32-bit Windows XP Pro, I get:
F#: 3.970 fsc -O3
This shows an insignificant difference in performance which, if nothing
else, is interesting because the second C++ program uses bounds checking. I
was also surprised to see OCaml do so well on this benchmark because it
makes no attempt to hoist bounds checks. F# on .NET typically beats all
open source compilers on Linux on this kind of benchmark (float array
intensive). C# presumably would too.
The bench2 program is the following, more idiomatic, C++ version that uses
bounds checking:
#include <vector>
#include <iostream>
#include <cstdlib>
#include <ctime>
void smooth (std::vector<double> &dest, std::vector<double> &src)
{
for (int i = 1; i < src.size() - 1; i++)
dest.at(i - 1) = (src.at(i - 1) + src.at(i) + src.at(i + 1)) / 3 ;
}
void fill (std::vector<double> &src)
{
srand((unsigned)std::time(0));
for (int i = 0; i < src.size(); ++ i )
src.at(i) = rand();
}
int main()
{
const int len = 50000;
std::vector<double> src_array(len);
std::vector<double> dest_array(len);
fill(src_array);
clock_t start=clock();
for (int i = 0; i < 5000; i++)
{
smooth(dest_array, src_array);
smooth(src_array, dest_array);
}
clock_t endt=clock();
std::cout <<"Time smooth(): " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";
return 0;
}
Here is my OCaml equivalent:
let smooth dst src =
let n = Array.length dst in
for i=1 to n-2 do
dst.(i-1) <- (src.(i-1) +. src.(i) +. src.(i+1)) /. 3.
done
let () =
let n = 50000 in
let src = Array.init n (fun _ -> Random.float 1.) in
let dst = Array.create n 0. in
let t = Sys.time() in
for i=1 to 5000 do
smooth dst src;
smooth src dst;
done;
Printf.printf "Time smooth(): %fs\n" (Sys.time() -. t)
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?u