Thursday, 4 September 2014

"Sentinel Saves Single Cycle Shocker!"

Whilst writing the last post I was playing with a tiny fragment to see how just testing the edge equations separate to the zbuffer loop would fare. It's a bit poor actually as even the simplest of loops will still require 9 cycles and so doesn't save anything - although one wouldn't be testing every location like this so it's pretty much irrelevant.

Irrelevant or not I did see an opportunity to save a single cycle ...

If one looks at this loop, it is performing 3 edge equations positive tests and if that fails it then has to perform a loop-bounds test.

0000015e:       orr.l     r3,r18,r19      /|                  1                                             |
00000162:       fadd.l    r18,r18,r24     \|                  1234                                          |
00000166:       add.s     r0,r0,#0x0001   /|                   1                                            |
00000168:       fadd.l    r19,r19,r25     \|                   1234                                         |
0000016c:       orr.l     r3,r3,r20       /|                    1                                           |
00000170:       fadd.l    r20,r20,r26     \|                    1234                                        |
00000174:       bgte.s    0x0000017a       |                     1                                          |
00000176:       sub.s     r2,r0,r2         |                      1                                         |
00000178:       bne.s     0x0000015e       |                       1                                        |

On in C.

  for (int x=x0; x < x1; x++) {
     if ((v0 >= 0) & (v1 >= 0) & (v2 >= 0))
       return x;
     v0 += v0x;
     v1 += v1x;
     v2 += v2x;
  }

Problem is it needs two branches and a specific comparison check.

Can a cycle be saved somehow?

No doubt, don't need Captain Obvious and the Rhetorical Brigadettes to work that one out.

Just as with the edge tests the sign bit of the loop counter can be used too: it can be combined with these so only one test-and-branch is needed in the inner loop. After the loop is finished the actual cause of loop termination can be tested separately and the required x value recovered.

It's not a sentinel it's just combining logic that needs to be uncombined and tested post-loop in a similar way to a sentinel.

000001ba:       orr.l     r3,r18,r19      /|                  1                                             |
000001be:       fadd.l    r18,r18,r24     \|                  1234                                          |
000001c2:       sub.s     r0,r0,#0x0001   /|                   1                                            |
000001c4:       fadd.l    r19,r19,r25     \|                   1234                                         |
000001c8:       orr.l     r3,r3,r20       /|                    1                                           |
000001cc:       fadd.l    r20,r20,r26     \|                    1234                                        |
000001d0:       orr.s     r1,r0,r3         |                     1                                          |
000001d2:       blt.s     0x000001ba       |                      1                                         |

Or something more or less the same in C with the pro/epilogues and probably broken edge cases:

  int ix = x1 - x0 - 1;
  while ((v0 >= 0) & (v1 >= 0) & (v2 >= 0) & (ix >= 0)) {
    ix -= 1;
    v0 += v0x;
    v1 += v1x;
    v2 += v2x;
  }
  if (ix >= 0) {
    return x0 + ix;
  }

I suppose it's more a case of "Pretty Perverse Post Pontificates Pointlessly!"

Or perhaps it's just another pointless end to a another pointless day.

Might go read till I fall asleep, hopefully it doesn't take long.

No comments: