if statement of 2900 lines. I didn't think it would be something that would map to OpenCL particularly well, but I was pleasantly surprised.
I simply took the if statement and wrapped it in a kernel which I call for every pixel, and added a simple list append function at the end (more on that below). With a bit of playing with the kernel work size I got it down to about 75 µs for a 1024x768 frame at around 1000 output points.
I still haven't done non-maximum suppression or the like, but it certainly lives up to its name: it's damn FAST. I've been playing with SURF and others, and even a partial implementation is hitting 2000 µs/frame. FAST seems to be very sensitive to noise and camera focus though, so I'm not sure I can use it; hopefully the non-maximum suppression will help.
GPU List Append
One problem with GPU coding is that it particularly likes having large, well-defined data sets to work with, and what I needed to do was just generate the points beyond a threshold. In the past I've just had a separate post-process which 'reduces' the data, but that input had already been reduced and wasn't a whole frame's worth. So I came up with something very simple based on atomics. I don't know whether it's the best solution, but it seems to work OK in this case.
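// atom_inc on global memory needs this extension on OpenCL 1.0
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

kernel void somekernel(..., global uint *indexp, global float2 *posp) {
    // do stuff
    if (result > threshold) {
        // atomically reserve the next slot; atom_inc returns the old value
        uint index = atom_inc(&indexp[0]);
        // points beyond the list capacity are dropped
        if (index < 1024) {
            posp[index] = (float2)(x, y);
        }
    }
}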
Anything that then uses the count in indexp[0] just has to limit it to the maximum (e.g. 1024), since the counter keeps incrementing even once the list is full, and away it goes.
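To make the counting and clamping concrete, here is a minimal CPU-side sketch in C11 that emulates the same append logic. It is only an illustration under stated assumptions, not the code from this post: `atomic_fetch_add` stands in for `atom_inc`, and the names `point2`, `list_append`, `list_count`, `detect` and `MAX_POINTS` are all hypothetical. The loop here is serial, whereas on the GPU the order of the points in the list is not defined.

```c
#include <stdatomic.h>

#define MAX_POINTS 1024u                 /* list capacity, matching the kernel */

typedef struct { float x, y; } point2;   /* hypothetical stand-in for float2 */

/* Reserve the next slot and store the point if it fits.
   atomic_fetch_add returns the old value, as atom_inc does. */
void list_append(atomic_uint *indexp, point2 *posp, float x, float y)
{
    unsigned index = atomic_fetch_add(indexp, 1u);
    if (index < MAX_POINTS) {
        posp[index].x = x;
        posp[index].y = y;
    }
}

/* The counter keeps counting past the capacity, so clamp it before use. */
unsigned list_count(atomic_uint *indexp)
{
    unsigned n = atomic_load(indexp);
    return n < MAX_POINTS ? n : MAX_POINTS;
}

/* Hypothetical driver: appends every pixel whose response exceeds the
   threshold and returns the number of usable entries in posp. */
unsigned detect(const float *response, unsigned width, unsigned height,
                float threshold, point2 *posp)
{
    atomic_uint index;
    atomic_init(&index, 0u);             /* counter must start at zero per frame */
    for (unsigned y = 0; y < height; y++)
        for (unsigned x = 0; x < width; x++)
            if (response[y * width + x] > threshold)
                list_append(&index, posp, (float)x, (float)y);
    return list_count(&index);
}
```

Calling `detect` with a buffer of at least `MAX_POINTS` entries returns how many of them are valid. The unclamped counter value could also be used to tell whether any points were dropped.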