October 28, 2004

Bithacks operation counting update

In response to Paul Messmer's email, I revised the operation counting methodology on my bithacks page to not count large constants as operations.  It probably takes more time to load the value from a variable than a large immediate, even when multiple instructions are required to load it, because fetching values from memory has grow slower with faster CPUs clockrates.  Since I don't count variables as operations, I decided not to count constants.  My initial policy was due to the small number of registers on the popular x86 CPUs and the mulitple instructions required for loading large immediates on MIPS CPUs.

He also recommeded that I count the length of computational dependency chains rather than total operations.  For example, in the following, x should take less time to compute than y:

x = a + b + c + d + e + f;
y = ((((a + b) | c) + d) | e) + f;
While (a+b) is added, a modern superscalar processor could add (c+d) and (e+f) simultaneously.   Unfortunately IA32 processors are limited in the number of registers they can use, and compilers for them typically don't bother to emit instructions that can be executed in parallel.  I tested this with g++ 3.4 on a Pentium III, an Athlon XP, and an Athlon 64.  Only after I tweeked the assembly output for the Athlon 64 (with it running in 64-bit mode) could I get the expected 40% speed-up.  

So before updating the bithacks page to reflect the length of the computational dependency chains, I'll wait until at least half of the CPUs sold are capable of x86-64 (or EM64T, as Intel puts it), and people are typically running them in 64-bit mode (on Windows 64 or Linux 64), and compilers are capable of emitting code that can be executed in parallel.

/* The following uses the Gnu Assembler (gas) syntax, which has the destination register last */
/* Compiler's a+b+c+d+e+f: */
addl %esi, %edi
addl %edx, %edi
addl %ecx, %edi
addl %r8d, %edi
leal (%rdi,%r9), %eax
/* My hand-tweaked a+b+c+d+e+f: */
addl %esi, %edi
addl %ecx, %r8d
addl %edx, %r9d
addl %edi, %r8d
leal (%r8,%r9), %eax
/* Compiler's ((((a+b)|c)+d)|e)+f: */
addl %esi, %edi
orl %edx, %edi
addl %ecx, %edi
orl %r8d, %edi
leal (%rdi,%r9), %eax

Posted by seander at 04:17 PM | Comments (0)

Improvement to finding a zero byte in a word

Paul Messmer sent me an improvement to my bithack for determining if a word has a zero byte.  It performs a quick pretest, which produces false positives on the rare occasions that the first byte is 0x80.  All positives are checked using the more reliable method.  Because it only takes 3 operations for the fast pretest, it runs faster.

bool mayHaveZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100;
if (mayHaveZeroByte) // or may just have 0x80 in the high byte
  {
  bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
  }
Posted by seander at 03:41 PM | Comments (0)

Arithmetically interleaving bits

I added a technique to my bithacks page for interleaving the bits of two 8-bit bytes to form a 16-bit value.  I had been meaning to create such a arithmetic interleaving technique ever since I made the ontological tableau showing it to be a glaring hole, but I hadn't got around to it.  Holger Bettag emailed me the suggestion.
// Return bits from x in even positions and bits from y in the odd
unsigned short
InterleaveBitsFromBytes(unsigned char const x, unsigned char const y)
  {
  return ((x * 0x0101010101010101ULL & 0x8040201008040201ULL) * 
          0x0102040810204081ULL >> 49) & 0x5555 |
         ((y * 0x0101010101010101ULL & 0x8040201008040201ULL) * 
          0x0102040810204081ULL >> 48) & 0xAAAA;
  }
This technique works by using two instances of the same basic pattern transformation, which spreads the bits of x and y out, requiring only an OR operation.  The spreading is described below.
We want bit pattern abcdefgh to become a0b0c0d0e0f0g0h
                                                          abcdefgh
* 0000000100000001000000010000000100000001000000010000000100000001 = 0x0101010101010101ULL
==================================================================
  abcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefgh
& 1000000001000000001000000001000000001000000001000000001000000001 = 0x8040201008040201ULL
==================================================================
  a00000000b00000000c00000000d00000000e00000000f00000000g00000000h
* 0000000100000010000001000000100000010000001000000100000010000001 = 0x0102040810204081ULL
------------------------------------------------------------------
  a00000000b00000000c00000000d00000000e00000000f00000000g00000000h
+ 00b00000000c00000000d00000000e00000000f00000000g00000000h0000000
+ 0000c00000000d00000000e00000000f00000000g00000000h00000000000000
+ 000000d00000000e00000000f00000000g00000000h000000000000000000000
+ 00000000e00000000f00000000g00000000h0000000000000000000000000000
+ 0e00000000f00000000g00000000h00000000000000000000000000000000000
+ 000f00000000g00000000h000000000000000000000000000000000000000000
+ 00000g00000000h0000000000000000000000000000000000000000000000000
+ 0000000h00000000000000000000000000000000000000000000000000000000
==================================================================
  aebfcgdhebfcgdhe0fcgdhe0f00dhe0f0g0he0f0g0h00f0g0h0000g0h000000h
>>1001001                                                          = 49
==================================================================
  ?????????????????????????????????????????????????aebfcgdhebfcgdh
& 0000000000000000000000000000000000000000000000000101010101010101 = 0x5555
==================================================================
  0000000000000000000000000000000000000000000000000a0b0c0d0e0f0g0h


Posted by seander at 03:27 PM | Comments (0)

June 23, 2004

Integer log of subnormal IEEE float

Last Sunday, Sean A. Irvine suggested that I add code for handling subnormals in my method to find the integer log base 2 of a 32-bit IEEE float.  He suggested the following, which doesn't terminate when the input, v, is 0:
const float v; // compute c = int(log2(v))
int x = *(int *)&v;
int c = x >> 23;
if (c == 0) {
   // handle subnormals
   while ((x & 0x400000) == 0) {
     x <<= 1;
     c--;
   }
}
return c - 127;
I didn't like the idea that the code would loop indefinately, so I replaced the loop, which computes the integer log base 2 of the mantissa, with a version that computes it via table lookup; it is slightly modified from the general case, since we need only deal with 23 bits.  It should be faster than the code above.
const float v;   // find int(log2(v)), where v > 0.0 && finite(v)
const int x = *(int *) &v; 
int c = x >> 23; // c gets the result;

if (c)
{
  c -= 127;
}
else
{ // subnormal, so recompute using mantissa: c = intlog2(x) - 149;
  register unsigned int t; // temporary
  // Note that LogTable256 was defined earlier
  if (t = x >> 16)
  {
    c = LogTable256[t] - 133;
  }
  else
  {
    c = (t = x >> 8) ? LogTable256[t] - 141 : LogTable256[x] - 149;
  }
}
Posted by seander at 10:43 PM | Comments (0)

Another non-branching integer abs

Thorbjørn Willoch suggested that I add r = (v ^ (v >> (sizeof(int) * 8 - 1))) + ((unsigned) v >> (sizeof(int) * 8 - 1)) for anyone interested in avoiding the patented method for computing the integer absolute value without branching, though his takes another operation.   I added my own variation of it, r = (v ^ (v >> (sizeof(int) * 8 - 1))) + (v < 0), which takes as many operations, though gcc generated fewer x86 instructions for it (4) than his (5).  The patented version is still the best at 3. Another variation came to mind, but I didn't think it sufficiently portable:
union {long long ll; int i[2];} t;
t.ll = v; // sign-extend
r = (t.i[0] ^ t.i[1]) - t.i[ BYTE_ORDER == LITTLE_ENDIAN];
Posted by seander at 10:25 PM | Comments (0)

June 16, 2004

Sign extension bit hacks

I received an email from a Sean A. Irvine in New Zealand earlier this week.  He was reading my bithacks page and suggested that I mention sign extending.  I guess he uses Java, which doesn't have the bit fields feature that C has.  I wrote three entries for sign extending, depending on if you have the bit fields of C and a fixed bit width or want something standard for variable bit widths or want something quick and dirty again for variable bit widths.

Posted by seander at 11:35 AM | Comments (0)

April 08, 2004

Photojournal script

I was working on a script that aids in creating photojournal webpages.  I imagine there are plenty out there already, but I was in the mood to write it, and I learned a thing or two.  It is simple and allows a list of image filenames (one per line) to be passed to it as input.  Thumbnails are made from these image files if necessary.  There are some limitations on the filenames:  they cannot be of the form <H1> or <H2> ... <H9> and they cannot start with a pound sign (#).  The pound sign indicates that the line is a comment.  The <H?> means that it is a heading, so by following the <H?> by a tab and then a string, you can create an html heading of the desired level.  The image filenames may be also followed by a tab and a string, which will be used to describe the thumbnails.  The size of the thumbnails may be specified by using another tab character followed by a string, such as "20%" or "x100" or "200x300!".   See the convert command for details.  Additionally, the images are titled with their filename, size  in KB, resolution, and date of creation.

#!/bin/awk -f

# makethumbs
# Copyright (c) 2004 Sean Eron Anderson

# Take images filenames input and create thumbnails and output html to show them.
# Stdout will be a HTML table with the thumbnails and links to the full images.
# Each input line will have the image filename followed by an optional
# description, followed by an optional thumbnail size.

# Lines that start with # are comments
# Lines that start with <H followed by a digit 1-9 followed by > are headings, so
# So:
# <H2>\tExample Title 

# Simple example:  Go to a directory with images and type:
# ls -1 [^_]*.JPG | makethumbs >index.html

BEGIN {
  FS = "\t";
  intable = 0;
  }

/<[Hh][1-9]>\t(.*)/ {
  if (intable) {
    print "</table>";
    }
  match($1, /<(.*)>/, a);
  printf("<%s>%s</%s>\n\n", a[1], $2, a[1]);
  if (intable) {
    print "<table>"
    }
  }

!/^<[Hh][1-9]>\t/ && !/^[#]/ && /.+/ {
  if (!intable) {
    printf("<table>\n");
    intable = 1;
    }
  size = ($3 == "") ? "x100" : $3;
  resizecmd = "test _" $1 " -nt " $1 " || convert -size " size " " $1 " -resize x100 +profile '*' _" $1;
  system(resizecmd);

  geomcmd = "identify -size 1x1 -verbose "$1" | grep 'Base geom' | sed 's/  Base geometry: //'";
  geomcmd | getline geom;

  sizecmd = "identify -size 1x1 -verbose "$1" | grep 'Filesize' | sed 's/  Filesize: //'";
  sizecmd | getline size;

  datecmd = "date -r " $1;
  datecmd | getline date;

  info = $1 "  " size "  " geom "  " date;
  printf("  <tr>\n    <td><a href='%s'><img src='_%s' alt='' title='%s'></a></td>\n", 
    $1, $1, info);
  printf("    <td>%s</td>\n  </tr>\n", $2);
  }

END {
  if (intable) {
    printf("</table>\n");
    }
  }
Posted by seander at 07:36 PM | Comments (0)

December 14, 2003

Another bithack and a correction

The other day a person named Glenn Slayden sent me a new hack for my bithacks page as well as a correction.  The correction was for finding the log base 2 of an N-bit integer in O(lg(N)) operations where the number is known to be a power of 2 (special case below).  Some months ago I had optimized the array of constants for the general code above, but forgot to keep the constants for the special case code below, which caused it to fail.  It is fixed now.  The new hack is for conditionally setting or clearing bits in a word.  Suppose you want to set bits, defined by mask m, in a word, w, if flag f is true; if flag f is false you want to clear the bits in w that are defined in m.  The hack is to do this without branching: w ^= (-f ^ w) & m.  This seems to involve more operations than the branching version: if (f) w |= m; else w &= m.  Yet, I did some speed tests on an Athlon XP 2100+ using g++ -O3, and sure enough it does provide a 5-10% improvement.

Posted by seander at 02:25 PM | Comments (0)

July 19, 2003

Bit Hack Attack!

Last Friday around 1:18 a.m. I received an email in slightly broken English from some random person who had read my bithack page and wanted to know how to swap two bits in a byte.  I assume he was a novice programmer, since the obvious solution seemed straight-forward.  I emailed him the answer but continued thinking about it for about 5 more hours, and came up with a couple improvements, so I added Swapping individual bits with XOR.

Later I created an ontological tableau of the various operations and solution methods to highlight where the holes are.  I should figure out more precisely how many operations things take and add that to the table, and then format the table as a webpage, with links from each cell to the code.

Posted by seander at 04:33 PM | Comments (0)