October 28, 2004

Bithacks operation counting update

In response to Paul Messmer's email, I revised the operation counting methodology on my bithacks page to not count large constants as operations.  It probably takes more time to load the value from a variable than a large immediate, even when multiple instructions are required to load it, because fetching values from memory has grow slower with faster CPUs clockrates.  Since I don't count variables as operations, I decided not to count constants.  My initial policy was due to the small number of registers on the popular x86 CPUs and the mulitple instructions required for loading large immediates on MIPS CPUs.

He also recommeded that I count the length of computational dependency chains rather than total operations.  For example, in the following, x should take less time to compute than y:

x = a + b + c + d + e + f;
y = ((((a + b) | c) + d) | e) + f;
While (a+b) is added, a modern superscalar processor could add (c+d) and (e+f) simultaneously.   Unfortunately IA32 processors are limited in the number of registers they can use, and compilers for them typically don't bother to emit instructions that can be executed in parallel.  I tested this with g++ 3.4 on a Pentium III, an Athlon XP, and an Athlon 64.  Only after I tweeked the assembly output for the Athlon 64 (with it running in 64-bit mode) could I get the expected 40% speed-up.  

So before updating the bithacks page to reflect the length of the computational dependency chains, I'll wait until at least half of the CPUs sold are capable of x86-64 (or EM64T, as Intel puts it), and people are typically running them in 64-bit mode (on Windows 64 or Linux 64), and compilers are capable of emitting code that can be executed in parallel.

/* The following uses the Gnu Assembler (gas) syntax, which has the destination register last */
/* Compiler's a+b+c+d+e+f: */
addl %esi, %edi
addl %edx, %edi
addl %ecx, %edi
addl %r8d, %edi
leal (%rdi,%r9), %eax
/* My hand-tweaked a+b+c+d+e+f: */
addl %esi, %edi
addl %ecx, %r8d
addl %edx, %r9d
addl %edi, %r8d
leal (%r8,%r9), %eax
/* Compiler's ((((a+b)|c)+d)|e)+f: */
addl %esi, %edi
orl %edx, %edi
addl %ecx, %edi
orl %r8d, %edi
leal (%rdi,%r9), %eax

Posted by seander at October 28, 2004 04:17 PM
Comments