Some bugs in the paper:

  Figure 3, explicit bounds check should compute the size like this
  (the table entry holds log2 of the allocation size):
    size = 1 << table[p >> log_of_slot_size]

  Figure 3, optimized bounds check should probably be
    (p^p') >> table[p >> log_of_slot_size] == 0
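
  A minimal sketch of why the two corrected checks agree, under assumptions
  that are not taken from the paper's figures: a toy 1 KiB address space,
  16-byte slots, and allocations that are power-of-two sized and aligned to
  their size. The names mark_alloc, explicit_check, optimized_check, NUM_SLOTS
  and LOG_OF_SLOT_SIZE are made up for illustration; q stands in for p'.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_OF_SLOT_SIZE 4   /* assumption: 16-byte slots */
#define NUM_SLOTS 64         /* assumption: toy 1 KiB address space */

/* One entry per slot: log2 of the size of the allocation covering it. */
static uint8_t table[NUM_SLOTS];

/* Record a power-of-two-sized, size-aligned allocation in the table. */
static void mark_alloc(uintptr_t base, unsigned log_size) {
    uintptr_t size = (uintptr_t)1 << log_size;
    assert((base & (size - 1)) == 0);                  /* aligned to its size */
    assert(base + size <= ((uintptr_t)NUM_SLOTS << LOG_OF_SLOT_SIZE));
    for (uintptr_t a = base; a < base + size; a += (uintptr_t)1 << LOG_OF_SLOT_SIZE)
        table[a >> LOG_OF_SLOT_SIZE] = (uint8_t)log_size;
}

/* Explicit check: recover size and base from p, then test q against them. */
static int explicit_check(uintptr_t p, uintptr_t q) {
    uintptr_t size = (uintptr_t)1 << table[p >> LOG_OF_SLOT_SIZE];
    uintptr_t base = p & ~(size - 1);
    return q >= base && q - base < size;
}

/* Optimized check: p and q are in the same allocation exactly when they
   agree on every bit above the low log2(size) bits. */
static int optimized_check(uintptr_t p, uintptr_t q) {
    return ((p ^ q) >> table[p >> LOG_OF_SLOT_SIZE]) == 0;
}
```

  For example, after mark_alloc(64, 5) (a 32-byte object at address 64), both
  checks accept p = 70, q = 95 and reject q = 96 and q = 63.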

  Figures 5 and 18, pointer arithmetic code should probably be
    char *p = &buf[i];
  or
    char *p = buf + i;
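
  The two suggested forms are equivalent in C, since buf[i] is defined as
  *(buf + i). A small sketch, with hypothetical helper names and an added
  len parameter that does not appear in the paper:

```c
#include <assert.h>
#include <stddef.h>

/* Both helpers return the address of element i of buf. The assert reflects
   the C rule that a pointer may point at most one past the end of its
   array; len is the array length and is an assumption of this sketch. */
static char *elem_index(char *buf, size_t len, size_t i) {
    assert(i <= len);
    return &buf[i];
}

static char *elem_add(char *buf, size_t len, size_t i) {
    assert(i <= len);
    return buf + i;
}
```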