Friday, 30 August 2013

Epiphany bit test

Whilst working on the assembly version of the object detector code I came across the need for a bit test. Unfortunately the Parallella's Epiphany doesn't have one built in, and, also unfortunately, there is no literal (immediate) form of the AND instruction, which is what one normally uses for such things.

Fortunately, however, there is still a way to encode a single-bit test into a single instruction.

CPUs generally track the sign bit in the status register and it's easy to put any bit there: just shift it up to bit 31. The only question was how to extract it, since there is no Bcc that maps directly to the negative flag.

Starting with the LSL instruction and its flag updates:

  LSL <RD>, <RN>, #IMM5

    RD = RN << <OP2>
    AN = RD[31]
    AV = 0
    AC = 0    (pity AC doesn't follow the last out-shifted bit)
    if (RD[31:0]==0) { AZ=1 } else { AZ=0}

And then looking at the Bcc operations that utilise the negative flag:

  0110 Greater Than (Signed)           BGT     ~AZ & (AV == AN)
  0111 Greater Than or Equal (Signed)  BGTE    AV == AN
  1000 Less Than (Signed)              BLT     AV != AN
  1001 Less Than or Equal (Signed)     BLTE    AZ | (AV != AN)

Since AV is always 0 after a shift, this leads to a fairly straightforward pair of tests:

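        ;; Test bit X is set (r0 is only a scratch destination)
        lsl     r0,r1,#(31-X)
        blt     .bit_is_set         ;; AN=1, AV=0: bit X was 1
        bgte    .bit_is_not_set     ;; AN=0, AV=0: bit X was 0

And the same conditions work for the MOVcc instruction.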
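
To make the reasoning concrete, here is a small C model of the flag logic. It is only a sketch: the struct and function names are made up for illustration, and the flag behaviour is taken directly from the LSL listing and the condition table above rather than from running anything on hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four flags LSL writes (names are illustrative). */
struct flags { bool an, av, ac, az; };

/* Model of LSL RD, RN, #IMM5 following the flag rules listed above. */
static struct flags lsl_flags(uint32_t rn, unsigned imm5, uint32_t *rd)
{
    struct flags f;
    *rd  = rn << (imm5 & 31);   /* RD = RN << IMM5            */
    f.an = (*rd >> 31) & 1;     /* AN = RD[31]                */
    f.av = false;               /* AV = 0                     */
    f.ac = false;               /* AC = 0                     */
    f.az = (*rd == 0);          /* AZ = 1 if the result is 0  */
    return f;
}

/* Branch conditions from the Bcc table. */
static bool cond_blt(struct flags f)  { return f.av != f.an; }
static bool cond_bgte(struct flags f) { return f.av == f.an; }

/* Bit test for x in 0..31: shift bit x into bit 31, then use "less than". */
static bool bit_is_set(uint32_t value, unsigned x)
{
    uint32_t scratch;
    struct flags f = lsl_flags(value, 31u - x, &scratch);
    return cond_blt(f);
}
```

Because AV is always cleared, cond_blt() reduces to AN and cond_bgte() to its inverse, which is why the two branches cover both outcomes.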

Having this be just as efficient as a dedicated bit-test instruction is very handy. Bits are a compact way to represent information, so using them saves memory, and anything that saves memory is a big plus on Epiphany.

The C compiler just follows the C idiom one uses to implement bit tests, loading a mask and ANDing with it:

int main(int argc, char **argv) {
        return (argc & (1<<5)) ? 6 : 9;
}

00000000 <_main>:
   0:   2403            mov r1,0x20
   2:   00da            and r0,r0,r1
   4:   00c3            mov r0,0x6
   6:   2123            mov r1,0x9
   8:   0402            moveq r0,r1
   a:   194f 0402       rts

But it can be done literally "one better":

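_main:  lsl     r0,r0,#(31-5)   ; bit 5 -> bit 31, so AN = bit 5
        mov     r0,#6           ; result if the bit is set
        mov     r1,#9           ; result if the bit is clear
        movgte  r0,r1           ; AN == AV (= 0): bit clear
        rts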
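
For anyone who wants to convince themselves that the two versions agree, the following C sketch puts the mask-and-AND form next to a shift-to-the-top-bit form. The function names are made up for illustration, and this says nothing about what any particular compiler will emit for either function.

```c
#include <stdint.h>

enum { BIT = 5 };   /* the bit tested in the example above */

/* Mask-and-AND form, as in the C source the compiler was given. */
static int select_and(uint32_t x)
{
    return (x & (UINT32_C(1) << BIT)) ? 6 : 9;
}

/* Shift form, mirroring the lsl: move the bit to position 31 and test it. */
static int select_shift(uint32_t x)
{
    uint32_t shifted = x << (31 - BIT);
    return (shifted >> 31) ? 6 : 9;
}
```

Both functions return 6 exactly when bit 5 of x is set; the lower bits fall off the top during the shift and the higher bits never reach bit 31.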

Update: Actually a better example would be a higher bit. C falls further behind ...

int main(int argc, char **argv) {
        return (argc & (1<<21)) ? 6 : 9;
}

00000000 <_main>:
   0:   200b 0002       mov r1,0x0
   4:   240b 1002       movt r1,0x20
   8:   00da            and r0,r0,r1
   a:   00c3            mov r0,0x6
   c:   2123            mov r1,0x9
   e:   0402            moveq r0,r1
  10:   194f 0402       rts

Also note the code-size expansion from 2 bytes to 10 for the low-register source operand: the single 16-bit lsl becomes mov, movt and and. I don't know why the compiler is using the 32-bit form of mov r1,0x0, though, since the 16-bit form with its 8-bit immediate would suffice and the sequence would then only need 8 bytes.

For high registers this would be 4 bytes vs 12.
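
The byte counts can be tallied as below. This is only a bookkeeping sketch: the constant and function names are made up, the low-register sizes are read off the disassembly above, and the high-register sizes assume every instruction takes its 4-byte encoding, which is what the figures in this post imply.

```c
/* Instruction sizes in bytes (names are illustrative). */
enum {
    MOV_SHORT = 2,   /* 16-bit mov with an 8-bit immediate */
    MOV_LONG  = 4,   /* 32-bit mov, as the compiler emitted */
    MOVT_LONG = 4,   /* movt appears only in 32-bit form above */
    AND_SHORT = 2,
    AND_LONG  = 4,
    LSL_SHORT = 2,
    LSL_LONG  = 4
};

/* mov + movt + and, exactly as in the compiled output. */
static int and_test_as_compiled(void) { return MOV_LONG + MOVT_LONG + AND_SHORT; }

/* The same sequence if the short mov had been used. */
static int and_test_short_mov(void)   { return MOV_SHORT + MOVT_LONG + AND_SHORT; }

/* All-32-bit encodings, assumed for high registers. */
static int and_test_high_regs(void)   { return MOV_LONG + MOVT_LONG + AND_LONG; }

/* The shift-based test is a single instruction in either case. */
static int lsl_test_low_regs(void)    { return LSL_SHORT; }
static int lsl_test_high_regs(void)   { return LSL_LONG; }
```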
