Floating point is slow, you say?

And Bresenham’s algorithm is fast because it uses only integer operations?

Wake up. Modern processors can crunch floating-point numbers even faster than integers. Don’t believe me? Read on.

Here’s a small test application, in good ol’ C:

void f1()
{
    double d = 0;
    int i;

    for (i=0; i<10000; ++i)
        d = d * i + 1;
}

void f2()
{
    int d = 0;
    int i;

    for (i=0; i<10000; ++i)
        d = d * i + 1;
}

int main()
{
    f1();
    f2();
    return 0;
}

Let’s compile this with the Microsoft VC 7.1 compiler, using this command line:

cl -Zi tm.c

The -Zi option makes the compiler generate .pdb (program database) files, which contain symbol information that the profiler uses to instrument the binaries. My profiler of choice is AQTime. Rather cool one, that. So anyway, let’s run this in AQTime and see what it reports:

f1: 6639.89 μs, 18545395 machine cycles
f2: 196.82 μs, 549727 machine cycles

[The machine is a 2.8GHz Intel Pentium with HT.]

Hrmph. Nowhere near. Let’s try again with a different compiler option.

cl -Zi -arch:SSE2 tm.c

The “-arch:SSE2” option instructs the compiler to generate SSE2 instructions. (Not sure what SSE2 is? Read the Intel developer’s manual.)

OK, what does the profiler say now?

f1: 224.12 μs, 625973 machine cycles
f2: 278.28 μs, 777237 machine cycles

Hah! Surprised? Well don’t be. These are the new rules of the old game.

Let’s try again, with one more option:

cl -Zi -G7 -arch:SSE2 tm.c

The “-G7” option tells the compiler to “optimize for Pentium 4 or Athlon”. The results:

f1: 159.31 μs, 444965 machine cycles
f2: 265.11 μs, 740473 machine cycles

Surprised? Sweating? There’s-something-wrong-with-his-profiler?

Know your hardware. It pays.

3 Comments

  1. Lars Gregersen said,

    Oct 19, 2007 at 3:22 am

    I would check what instructions (native or CLR) are created.

    A clever optimizer would see that the result isn’t really needed and simply skip the entire loop.

  2. antipattern said,

    Oct 19, 2007 at 7:36 pm

    Well yes, that was my first reaction too. But in all three cases the iteration is still there in the generated assembly listing.

    The instructions generated are native. CLR requires the “/clr” switch. Even in CLR the iteration is there.

  3. Wrong said,

    Nov 26, 2008 at 9:01 pm

    This is generating overflow in no time! Hence, the only thing that has to be done for the floating point version after a few iterations is to add 1 to NaN (==NaN), which the CPU should be able to find out very fast. The integer version is having the same problem, but the resulting value is still a valid integer.

