I went about coding up that atomic counter idea from the previous post but had a hard time getting it to work. After giving up for a bit I had another go (couldn't let it rest as it was so close) and discovered that the register-saving preamble I'd copied from the async DMA ISR was simply broken.
_some_isr:
        sub     sp,sp,#24
        strd    r0,[sp]
        strd    r2,[sp,#1]
        strd    r4,[sp,#2]
        movfs   r5,status
        ... routine
        movts   status,r5
        ldrd    r4,[sp,#2]
        ldrd    r2,[sp,#1]
        ldrd    r0,[sp],#3
        rti
Well it certainly looked the part, but it clobbers the status register in the first instruction: the sub updates the flags before movfs gets a chance to save them. Oops.
        strd    r4,[sp,#-1]
        movfs   r5,status
        sub     sp,sp,#24
        strd    r0,[sp]
        strd    r2,[sp,#1]
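The reordered preamble only works if the early store of r4 lands in the same slot that the epilogue later reloads from. A quick sanity check of that arithmetic, sketched in C rather than assembly: it assumes the doubleword displacement is scaled by 8 bytes (which is what the 24-byte frame with offsets #1 and #2 implies), and the names dword_slot and r4_slot_matches are made up for the illustration.

```c
#include <assert.h>

enum { DWORD_BYTES = 8, FRAME_BYTES = 24 };

/* Byte address touched by strd/ldrd rN,[base,#imm].
 * Assumption: imm counts doublewords, i.e. it is scaled by 8. */
static long dword_slot(long base, int imm)
{
    return base + (long)imm * DWORD_BYTES;
}

/* Returns 1 if the early save of r4 and the later restore of r4
 * refer to the same address, 0 otherwise. */
static int r4_slot_matches(long sp_on_entry)
{
    assert(FRAME_BYTES % DWORD_BYTES == 0);

    long saved_at      = dword_slot(sp_on_entry, -1);  /* strd r4,[sp,#-1] */
    long sp_in_isr     = sp_on_entry - FRAME_BYTES;    /* sub  sp,sp,#24   */
    long restored_from = dword_slot(sp_in_isr, 2);     /* ldrd r4,[sp,#2]  */

    return saved_at == restored_from;
}
```

With an entry stack pointer of 0x1000 the save goes to 0x0ff8, and after the 24-byte adjustment [sp,#2] is 0x0fe8 + 16, which is the same place, so the unchanged epilogue still finds r4 where it expects it.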
There goes the symmetry. Actually it could probably just leave the stack pointer where it is because the routine isn't allowing other interrupts to run at the same time.
With that fixed, the atomic counter worked ok. It shows some ... interesting ... behaviour when things get busy. It's quite good at being fair to all the cores except the core on which the atomic counter resides. Actually it gets so busy that the core has no time left to run any other code. The interrupt routine is around 200 cycles for 16 possible clients.
So ... I might have to try a mutex implementation and do some profiling. Under a lot of contention the interrupt version may still be better due to its fixed bandwidth requirements, but it's probably worth finding out where the crossover point is and how each scales. A mutex implementation won't support host access either ... always trade-offs. Definitely a candidate for FPGA I guess.
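For comparison, the mutex flavour would look roughly like the sketch below. This is portable C11, not Epiphany code: atomic_flag is only a stand-in for whatever test-and-set primitive the hardware provides, and the mutex_counter type and counter_next function are hypothetical names for the illustration. The point is the shape of the thing: each client spins on the lock itself, so the traffic grows with the number of clients retrying rather than staying fixed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_flag lock;   /* stand-in for a hardware test-and-set word */
    unsigned value;     /* next value to hand out */
} mutex_counter;

/* Take the lock, fetch and increment the counter, release the lock.
 * Returns the value the counter held before the increment. */
static unsigned counter_next(mutex_counter *c)
{
    assert(c != NULL);

    /* Spin until test-and-set reports the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire)) {
        /* busy-wait: every failed attempt is another access to the lock */
    }

    unsigned v = c->value++;

    atomic_flag_clear_explicit(&c->lock, memory_order_release);
    return v;
}
```

A counter would be set up as mutex_counter c = { ATOMIC_FLAG_INIT, 0 }; and every client then calls counter_next(&c). Nothing here runs on the core that owns the counter, which is the attraction, but any client has to be able to perform the test-and-set.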