Thursday, 16 December 2010

Branchy code

This week I was looking at feature detectors - and one of those I was trying is FAST. This is pretty much the definition of branchy code - a function which is a single if statement of 2900 lines.

I didn't think it would be something that would map to OpenCL particularly well, but I was pleasantly surprised.

I simply took the if statement and wrapped it in a kernel which I call for every pixel, and added a simple list append function at the end (more on that below). With a bit of playing with the kernel work size I got it down to about 75μS for a 1024x768 frame at around 1000 output points.

I still haven't done non-maximum suppression or the like but it certainly lives up to its name - it's damn FAST. I've been playing with SURF and others and even a partial implementation is licking 2000μS/frame. FAST seems to be very sensitive to noise and camera focus though, so i'm not sure I can use it - hopefully the non-maximum suppression will help.

GPU List Append

One problem with GPU coding is that it particularly likes having large well-defined data-sets to work with, and what I needed to do was just generate points beyond a threshold. In the past i've just had a separate post-process which 'reduces' the data, but that input had already been reduced and wasn't just a whole frame's-worth.

So I came up with something very simple based on atomics. I don't know whether it's the best solution but it seems to work ok in this case.
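
    #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

    kernel void somekernel(..., global uint *indexp, global float2 *posp) {

        // do stuff

        if (result > threshold) {
            // atomically reserve the next free slot in the output list
            uint index = atom_inc(&indexp[0]);

            // the counter keeps counting past the end, so guard the write
            if (index < 1024) {
                posp[index] = (float2)(x, y);
            }
        }
    }
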
Anything that then uses the 'index' count just has to limit it to the maximum (e.g. 1024) and away it goes.
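
To make the counting behaviour concrete, here is a minimal sketch in plain Python (not OpenCL) that simulates the same idea on the CPU. The `AtomicCounter` class, the `detect` and `valid_count` helpers and the 64x32 test frame are all made up for illustration; the lock only stands in for what `atom_inc` does in hardware.

```python
import threading

CAPACITY = 1024  # maximum number of output points, as in the kernel above


class AtomicCounter:
    """Hypothetical stand-in for the global uint that atom_inc() works on."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def inc(self):
        # Like atom_inc: add one and return the value held before the add.
        with self._lock:
            old = self._value
            self._value += 1
            return old

    @property
    def value(self):
        return self._value


def detect(scores, width, threshold, counter, points):
    """Play the role of one work-item per pixel appending to a shared list."""
    for i, score in enumerate(scores):
        if score > threshold:
            index = counter.inc()
            if index < CAPACITY:  # points that do not fit are dropped
                points[index] = (i % width, i // width)


def valid_count(counter):
    # The counter records every hit, including dropped ones, so clamp it.
    return min(counter.value, CAPACITY)


# Made-up 64x32 "frame" in which every pixel passes the threshold.
width, height = 64, 32
scores = [1.0] * (width * height)
counter = AtomicCounter()
points = [None] * CAPACITY
detect(scores, width, 0.5, counter, points)
n = valid_count(counter)
```

Here the counter ends up at 2048 while only 1024 points were stored, which is why the consumer has to clamp. On the real device the counter would presumably also need resetting to zero before each frame, and the order of the points in the list is not deterministic.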


As of about an hour ago i'm now on leave until next year. Yay. I've been hanging out for it for the last few days and it's been pretty difficult keeping the motivation level up (despite working on some interesting stuff). Yard should keep me busy and i'll probably get into some hacking before long too. Maybe i'll finally play GT5 and 'upgrade' the PS3 firmware. But for the rest of the day it's time for SFA. And maybe a brewski or twoski. I've got one more wort busy in the cellar, and once i've bottled that next week i'll have over 100 longnecks - and even if that isn't enough for the whole summer it should be a good start.

Saturday, 11 December 2010



When I get some spare time and a bit of inspiration I hack bits and pieces and work on making this usable for what I need it for. I'm using it now for screenshots - i really like the save requesters and simplicity of the interface. I added a 'resize image' function which works fine, but I was having trouble with getting the scrolled pane to recognise its content had changed size. I have something that mostly works but I got sick of fighting with it so left it for the moment.

I also thought about the tool mechanics - the affine tool doesn't really work as a per-application tool, and i'm thinking of making it, and all of the tools, per-window instead. I don't have a good feel as to whether this is a good, bad, or insignificant move. At the least it might help me clean up some messy input routing - although there's a few hours of menial work getting that sorted out.


Not much progress here, still thinking of what to do. I set up a webcam input using V4L4J - although i'm not terribly happy with the way it passes data around (it takes 3 copies to get it onto the graphics card). I also started looking into some OpenGL 3.x stuff (no fixed function pipeline) but my heart wasn't really in it and I didn't get very far - although for the most part it looks pretty straightforward. I'm thinking I will just try to get a couple of video input streams going and work on some simple cross fading and swipe functions to start with, and that will let me play with ideas such as timebase correction and chroma keying and the like.

But I just haven't gone anywhere near sound - if sound on linux is fucked, it's even more fucked in java. That will be a hassle.


Well good to see at last Rudd stating the obvious - the leaks are ultimately the yanks' fault for not being able to handle security properly. Although I really just think he's playing the astute politician here - partly blow back against the US diplomats for being a bit blunt about him. And also I would suggest to play up to the local feel and to contrast with the total nonsense statements from the PM and the AG. But she deserves all she gets in that regard, what an utter idiot. Still, his statements are the only clear ones by a member of the ruling party suggesting that they might have a purpose in being in power other than serving the interests of the USA.

Booze n Fat

Well the experiment continues, although i'm drinking in moderation again (work bbq today aside!), and eating in moderation too. I was surprised at how quickly I got used to eating basically half or less of what I was eating before - and feeling less hungry than I had been. My scales are a bit crap and it varies so much depending on when one weighs oneself but i'm shedding in the order of a kilogramme a week. Mood has been all over the shop though - and when I do drink a bit it gets pretty low - which is another reason to slow it down. And sleep hasn't been great either. Although I think in general, even though i'm getting less sleep, i'm normally a bit more refreshed - but that's so subjective it's almost impossible to tell.


I have a good batch of leave coming up so i'm starting to think about the yard. I should be doing the shed but i'm putting that off ... but in the meantime I have some garden beds, lawn, and paving to work out. The more I think about it the more vegetable growing areas I want - they are just so much more useful than grass or flowers. Although I'm also aiming to get another citrus in - a tart lime like an african lime would be my choice. I have one decent spot left for that but i'm growing a tomato there at the moment ;-)

I had a good think about it yesterday and think i've settled on a plan - need to move a few tons of dirt and a pile of metal out the way to lay the wall foundations though (which is why I ended up hacking yesterday rather than going further). My chilli plants are going pretty well (already 10cm fruit on the cayenne pepper plants), tomatoes are looking good, i'm using the mints daily in dinner and last year's basil has started growing again. I also pruned most of the roses right back since they have finished flowering for now and had too many long stems which were falling over. Citrus is also sprouting like crazy - maybe i'll finally get some fruit again this year. I also planted a few seeds in pots for a bit more variety - although it's a bit late in the year to start with seeds so i'm not expecting a lot. If the rain keeps up this year though it should be a great growing season.

Tuesday, 7 December 2010

A bit wet.

Got out with an umbrella during a lull in the downpour and took a couple of shots of the road outside.

My small rainwater tank is already overflowing too - it was well under 1/2 full this morning (it's 2100L).

Well that looks like about it for a couple more hours anyway. Fortunately the forecast hail held off - at least for now.

Update: We ended up with 70mm of rain in about 18 hours, which although not unheard of isn't a typical spring storm around here.

Pretty clear cut ...

From the 1st of december, a statement which shocked me somewhat:
“I absolutely condemn the placement of this information on the WikiLeaks website,” Gillard said today. “It is a grossly irresponsible thing to do and an illegal thing to do.”

From today:
But asked directly what Australian laws had been broken by either WikiLeaks or Assange, Ms Gillard said the Australian Federal Police were investigating.

"The foundation stone of it is an illegal act," Ms Gillard said today.

But the "foundation stone" was the leaking of the documents to the website, not the publishing of the cables.

"It would not happen, information would not be on WikiLeaks, if there had not been an illegal act undertaken," Ms Gillard said.

Obviously the spinmeisters told this lawyer (who should have known better) that calling things illegal without basis probably wasn't terribly wise. Now of course there will be the plausible deniability that she was only ever talking about the 'foundation stone'(?) and not any action of wikileaks ... although to suggest that from her first statement above really beggars belief.

Well at least the opposition - deceitful hypocrites though they are - are making some noise about this, finally spotting what an outrageous statement it was for the country's leader to make. The greens could be a bit more vocal mind you (although that may simply be the media's tendency to ignore them).

Update: The ABC has just published an open letter from more than a few influential people who aren't too happy with the bend-over behaviour of the supposedly sovereign and free democratic government of ours.

Monday, 6 December 2010

The green, green grass of home.

We've had some spring weather this year which seems more like the spring weather of 20 odd years ago than it has been of late. Warm thunderstorms, short but heavy downpours, a bit of humidity and warmth. So perhaps it wasn't just nostalgia for what it used to be like making it feel like the weather has been more dreary the last few years.

The heat and damp has sent the grass totally boonta. I mowed that 2 days ago and it's almost ready again for another haircut.

Saturday, 4 December 2010

The weekend.

Another weekend upon us. And it's a scorcher, headed for 36 today. The evap is getting a good workout and it's really nice inside. I had plans to go for a ride to a mate's near the beach and other sorts of things but after last night I don't feel up to much so I might just sit inside and do sfa. Spent the morning watering the pot plants and the garden (after a massive down-pour on Wednesday it's amazing how quickly things dry out in the heat), mowing the lawn, fertilizing, and constructing a frame to support the tomatoes. I also had the bright and in hindsight totally obvious idea of keeping the mint plants out of the full sun so they develop large soft leaves rather than the small hard ones.

And so to some thoughts of the moment ...

Just because I like to see microsoft suffer i've been watching what happened with their latest phone release ... and by what limited data is available it sounds like it was a bit of a flop. Ahh well and good. Over a few beers last night I was chatting to a mate about their weird-arse advertising campaign and I think I worked out what they were trying to do. Someone looked at some charts - one showing the total population of the world, and the other showing how many people already owned a jesus phone or alternative. Then they decided they'd try to sell to the biggest slice of the pie by making fun of the minority rather than trying to steal those customers. At best, the adverts seem to be aimed at those of us who think jesus phone users are wankers - but we're just not interested in the technology itself, so you're not going to sell anything to us (and incidentally I don't see any problem with fishing a phone out of a pissing trough - these things get dropped all the time and it's not like you're going to leave it there). Then a big section of the market will never be interested in such devices because they're simply too complex and although they might appreciate the adverts (if they understand them at all) they are not potential sales. So that leaves the other section of the market - call them the 'aspirational smart-phone' buyers - who want to `grow up' and become jesus phone owning wankers themselves one day. And you're not going to sell to them by pointing out they want to be jesus phone owning wankers. But apart from all that if people are going to spend a bit of their hard-earned on a 'cool gadget' they want it to be 'cool'. And that has never been microsoft apart from a pretty small band of retards who don't know any better.

Despite efforts to shut it down, wikileaks seems to be soldiering on. Good thing. The media coverage of their latest spill, particularly here in Australia where we expect things to be a bit better than the US, has been utterly appalling. Apart from a couple of alternate viewpoints published on the ABC blogging site (aka 'the drum') the media (and all TV channels) have been in one voice - roundly condemning wikileaks and running full fisted with the US state department line of how terrible it all is and how illegal it has to be. And the australian pollies have all but given up the idea of australian sovereignty with their limp wristed kow-towing to the USA, basically offering an australian citizen up for whatever the yanks want to do to him. Disgraceful gormless arseholes - I can't imagine any of the actual cables that might mention Australia could paint a worse picture of their gutlessness than they're doing by themselves on prime-time TV. At most they're probably worried that Australians might realise how irrelevant their country is in the wider scheme of things and that we don't actually 'punch above our weight' in any sense.

Oh and the whole 'wikileaks is just a CIA/mossad conspiracy' thing is exactly the sort of 'grass roots' conspiracy you'd expect the CIA to use to discredit them. It does sound plausible at first, but ... then a bit of common-sense prevails.

And apparently the fishing fleets have run out of new fisheries to plunder. That is utterly mind-blowing when one considers just how big the oceans are - they are 2/3 of the surface of the planet. Even with centuries of logging there are still areas of un-touched rainforest - which has only ever covered a fraction of the land surface.

Wednesday, 1 December 2010

OpenCL Images and Arrays

Curiosity got the better of me and I ran a bunch of tests on separable convolution filters using opencl image types compared to float types.

Not surprisingly perhaps, the card seems to be designed for graphics workloads more than computational workloads.


The test is running a 31x31 separable convolution kernel over a 1024x768 image. Implemented using two passes - a horizontal and then vertical convolution.
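
As a rough illustration of why the two-pass approach is used, here is a small plain-Python sketch (not the OpenCL code that was timed) showing that a horizontal pass followed by a vertical pass gives the same result as the full 2D convolution. The function names, the 3-tap kernel and the tiny test image are made up for the example, and clamping at the edges is assumed.

```python
def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def convolve_x(img, kernel):
    """Horizontal pass: each output pixel mixes neighbours along its row."""
    h, w = len(img), len(img[0])
    n = len(kernel)
    r = n // 2
    return [[sum(kernel[k] * img[y][clamp(x + k - r, 0, w - 1)]
                 for k in range(n))
             for x in range(w)]
            for y in range(h)]


def convolve_y(img, kernel):
    """Vertical pass: each output pixel mixes neighbours along its column."""
    h, w = len(img), len(img[0])
    n = len(kernel)
    r = n // 2
    return [[sum(kernel[k] * img[clamp(y + k - r, 0, h - 1)][x]
                 for k in range(n))
             for x in range(w)]
            for y in range(h)]


def convolve_2d(img, kernel):
    """Direct 2D convolution with the outer product of the 1D kernel."""
    h, w = len(img), len(img[0])
    n = len(kernel)
    r = n // 2
    return [[sum(kernel[j] * kernel[k]
                 * img[clamp(y + j - r, 0, h - 1)][clamp(x + k - r, 0, w - 1)]
                 for j in range(n) for k in range(n))
             for x in range(w)]
            for y in range(h)]


# Tiny made-up image and kernel, just to compare the two approaches.
image = [[float((3 * x + 5 * y) % 11) for x in range(5)] for y in range(4)]
kernel = [0.25, 0.5, 0.25]

two_pass = convolve_y(convolve_x(image, kernel), kernel)
direct = convolve_2d(image, kernel)
max_error = max(abs(a - b)
                for row_a, row_b in zip(two_pass, direct)
                for a, b in zip(row_a, row_b))

# Multiply-adds per output pixel for the 31x31 case used in the test.
taps_2d = 31 * 31
taps_separable = 31 + 31
```

For the 31x31 case that is 961 multiply-adds per pixel done directly against 62 for the two passes, at the cost of an intermediate image.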

The image version is also executed over normalised unsigned byte data as well as float data (4x channel). The array version only uses single-channel float planes.

In both cases a single thread calculates each output pixel. Timings are from the NVidia Compute Visual Profiler and the card is an NVidia GTX 480.

Array version

For the X convolution it copies the kernel and 128 elements of the source array to local memory - which is then shared amongst the 64 threads in the work unit.

For the Y convolution this makes things slower because of the way it accesses memory, so it just relies on memory coalescing for the accesses and also for the memory accesses to be interleaved with processing to hide the latency.

The code must manually handle the edges - it just clips to the boundary.
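
The staging scheme for the X pass can be sketched roughly like this in plain Python (a CPU simulation, not the kernel that was timed). It assumes each of the 64 work-items fetches two of the 128 staged elements and that out-of-range reads are clamped to the nearest edge; the function names and the 200-element test row are made up.

```python
GROUP = 64         # work-items per work-group, as in the post
TILE = 128         # source elements staged per work-group, as in the post
TAPS = 31          # width of the 1D convolution kernel
RADIUS = TAPS // 2


def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def convolve_row_direct(row, kernel):
    """Reference: every output reads straight from the source row."""
    w = len(row)
    return [sum(kernel[k] * row[clamp(x + k - RADIUS, 0, w - 1)]
                for k in range(TAPS))
            for x in range(w)]


def convolve_row_tiled(row, kernel):
    """Simulate work-groups that stage a tile before convolving from it."""
    w = len(row)
    out = [0.0] * w
    for gx in range(0, w, GROUP):
        # Cooperative load (assumed: two elements per work-item); the
        # clamp is the manual edge handling the array version needs.
        tile = [row[clamp(gx - RADIUS + i, 0, w - 1)] for i in range(TILE)]
        # A real kernel would need barrier(CLK_LOCAL_MEM_FENCE) here.
        for t in range(GROUP):
            x = gx + t
            if x < w:  # the last group may hang over the end of the row
                out[x] = sum(kernel[k] * tile[t + k] for k in range(TAPS))
    return out


# Made-up test row whose width is deliberately not a multiple of 64.
row = [float((7 * i) % 13) for i in range(200)]
kernel = [1.0 / TAPS] * TAPS
direct = convolve_row_direct(row, kernel)
tiled = convolve_row_tiled(row, kernel)
needed = GROUP + TAPS - 1  # elements one group actually touches
```

Only 94 of the 128 staged elements are strictly needed by a group of 64 with a 31-tap kernel; rounding up to 128 presumably just keeps the load pattern simple.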

Timings: X=192μS Y=400μS Total=592μS (per plane) 1776μS (3x planes).

I also tried changing the array types to float4 and processing 4 packed planes at once. This pretty much scaled linearly - i'd expected it to scale better than linearly.

Timings: X=820μS Y=1460μS Total=2280μS (4x planes) 570μS (per plane)

Image version

The first image version was a very simple implementation that just reads pixels directly from the source image. Although the data is stored in UBYTE RGBA format it only calculates 3 channels (4 channels can be done for <10% extra time). The X and Y convolution code is more or less identical save for the direction it works in.

Timings: X=618μS Y=618μS Total=1236μS (3x channels) 1269μS (4x channels)

A pretty clear win - but this is only with octet data.

I then tried using floating point as the storage, and things weren't so rosy for the image version.

Timings: X=1824μS Y=2541μS Total=4365μS (3x channels)

So I started moving some of the optimisations required for the array version into the image version. First I just copied the kernel to local memory in both X and Y versions. Pretty major improvement.

Timings: X=1176μS Y=2117μS Total=3293μS

And finally I added the code which copies 128 elements of the data to local memory. To do this for the Y convolution I also had to change the local work size to be 64 in Y rather than X - and this probably explains why it ran faster since it creates more work groups.

Timings: X=770μS Y=732μS Total=1502μS

What is strange though is that this version is slower than the simple one on the byte data. I guess the extra complication and overhead of copying stuff locally slows it down too much.

Timings: X=712μS Y=731μS Total=1444μS

And if I remove the local copy of the image data the timings improve further.

Timings: X=677μS Y=725μS Total=1402μS

But they are still behind the naive version for BYTE data.


Storing data in array buffers, with properly written code, can achieve similar performance to image storage - even though they have radically different data paths and cache characteristics. Array types can process individual planes separately - but can also process vector/multi-channel types fairly easily too.

Although a trivial implementation worked well for 32-bit packed pixel types, non-byte image types require almost identical treatment to the array based implementation in order to gain good performance.

Even though it might not be the most efficient, the same code can also be executed for different image storage types - the image read/write methods just use floating point values in registers which is the most convenient for the arithmetic (and tuned for the GPU). For the array code it would require completely different code for each data type - e.g. normalising to float or using fixed point arithmetic.
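
To show what those two options for byte data look like, here is a small plain-Python sketch of one weighted sum done both ways. The function names, the 8-bit fixed-point shift and the sample values are made up; the float path mimics the way reading a normalised unsigned byte image yields values in the range 0 to 1.

```python
def blur_float(samples, weights):
    """Normalise bytes to 0..1, accumulate in float, convert back."""
    acc = sum(w * (s / 255.0) for s, w in zip(samples, weights))
    return min(255, int(acc * 255.0 + 0.5))


def blur_fixed(samples, weights, shift=8):
    """Integer-only path: weights are pre-scaled by 2**shift."""
    scale = 1 << shift
    int_weights = [int(w * scale + 0.5) for w in weights]
    acc = sum(w * s for s, w in zip(samples, int_weights))
    # Add half the scale before shifting so the result is rounded.
    return min(255, (acc + scale // 2) >> shift)


# Made-up byte samples and a 3-tap kernel.
samples = [10, 200, 30]
weights = [0.25, 0.5, 0.25]
as_float = blur_float(samples, weights)
as_fixed = blur_fixed(samples, weights)
```

The two paths should agree to within one grey level for weights like these; with 31 taps the fixed-point weights would likely want more fractional bits than the 8 used here.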

In short, the NVidia GPU seems optimised for accessing data through image types. And particularly for typically screen-sized images stored in 32 bit packed format. Not so surprising for a graphics card.

It would be interesting to compare to the ATI card I have - I suspect it would be pretty much a similar result and perhaps even more so, since it doesn't have any L1 cache for array accesses. But profiling that is somewhat more work and I can't be bothered right now. I have also yet to try it with single-channel images.

Update: Actually I need to know about single-channel images so I tried that and it was a bit disappointing for BYTE data: X=593μS Y=600μS Total=1193μS. The texture cache probably stores all channels anyway and for all I know the image is being stored in memory at 32 bits per pixel. For the float data using the optimised version things are somewhat better - X=263μS Y=301μS Total=564μS. And bizarrely now the optimised version is faster for the BYTE data as well - X=242μS Y=295μS Total=537μS. Presumably this is because the smaller amount of processing isn't able to hide the memory latency but the manual caching is (and the smaller local array sizes are less of a limitation for concurrency - the minuscule local memory is the main bottleneck for optimising OpenCL).

I'm running into some memory stress for work and if the byte data were stored packed it might be a big benefit here - right now i'm using float arrays. Using images might simplify some of the code too, although it looks like the more memory heavy stuff will still need to use local memory - although at least in this example that extra work would make it run faster than array types.

Hacking stuff


Saturday was a bit cold and dreary here so I spent the day hacking on some ideas - actually I ended up spending about 10 hours hacking solidly. In short I hooked up Xuggle (lib-`ffmpeg') with OpenCL and OpenGL. Using OpenCL to do some image processing and then dumping it to the display with OpenGL. Eventually I plan to hook up other video sources such as web cameras and capture cards and fortunately this stuff is pretty easy with Linux.

There's nothing i'm doing with OpenCL that couldn't just be done with pixel shaders but I need to test the interoperability and stability of that combination, and maybe i'll eventually come up with some ideas that need more complex processing. And I've only ever done a little bit of OpenGL and certainly no shaders. I tried a gaussian blur (or 1d separable convolution), sobel filter, yuv conversion, and a simple de-interlacer which worked surprisingly well (take every long field as given, and interpolate the short field by 1/2 a pixel vertically and then double the frame-rate).
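
For the de-interlacer, a rough plain-Python sketch of the general idea follows. It is a simple 'bob' style variant and only one reading of the description above, not the actual kernel: each field becomes its own full-height frame, with the missing lines filled in by averaging the kept lines above and below. The function names and the 4-line test frame are made up, and at least two lines per frame are assumed.

```python
def field_to_frame(frame, parity):
    """Build a full-height frame from one field of an interlaced frame.

    Lines whose index matches parity (0 = even, 1 = odd) are kept as
    given; the others are the average of the kept lines above and below,
    or a copy of the single kept neighbour at the top or bottom edge.
    """
    h = len(frame)
    out = []
    for y in range(h):
        if y % 2 == parity:
            out.append(list(frame[y]))
        else:
            above = y - 1 if y - 1 >= 0 else y + 1
            below = y + 1 if y + 1 < h else y - 1
            out.append([(a + b) / 2.0
                        for a, b in zip(frame[above], frame[below])])
    return out


def deinterlace(frame):
    """One interlaced frame in, two progressive frames out."""
    return [field_to_frame(frame, 0), field_to_frame(frame, 1)]


# Made-up 4-line, 2-pixel-wide interlaced frame.
interlaced = [[0, 0], [10, 10], [20, 20], [30, 30]]
first, second = deinterlace(interlaced)
```

Emitting two frames per input frame is what doubles the frame-rate; on the GPU each output pixel would simply be one work-item.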

This is the first time i've been working with OpenCL images as a data store as well, and that went pretty smoothly. Very different performance characteristics to using arrays, and I hope to do a little timing at some point. The gaussian blur for instance didn't make any difference if I did it vertically or horizontally - using arrays the performance is very different. Incidentally the GPU profiler appears to be working since the last release, and/or I fixed the code problems I had (threading issues) so I can profile stuff again.

And finally I wrapped the video on a sphere ... err just because. Coming from the C64 era this is just something one must try, even if it looks a bit pants.

Then I got a little excited about the prospect of a hardware/software project, creating a mixing panel and building a simple digital video effects processor. Possibly using an arduino as an I/O controller for the console. But that might be a more long-term idea and there's a few other directions I might head in.


Yesterday was a bit weird - I haven't been sleeping a lot (too hot I think, the cat, other noises) so I was tired as hell all day and even resorted to a nap around 5pm. Couldn't really concentrate on work. But then I got bored about 10pm and ended up hacking on ImageZ again into the very wee hours. Looks like today is a bit of a wreck too but maybe I can concentrate enough to get some work done.

I'm not planning to spend a lot of time on it but I am thinking of cleaning it up a bit, getting what's there working reasonably well and `releasing' it. I ran it the other day to make screen-shots and it just felt nice to use so maybe if I polish it a little i'll use it myself. My needs are pretty modest - mostly just resizing gigantic camera pictures for the web and the odd screenshot.

So I added a 'resize layer' function, and a new undo type to handle that. Cleaned up a bit of dead code. And started working out how to make the 'super affine tool' apply its changes (a little tricky). This showed up some issues with the tool being stored in the toolbar - e.g. how do you 'apply' it, and once applied what next?

Although it's been a couple of months at least since I worked on it I was surprised how quickly I got back up to speed on it and remembered various details of the code-base. I can barely remember what I did yesterday, let alone 3 months ago but somehow that doesn't seem to apply as directly to hacking.