Tuesday, 29 May 2012

ARGH drivers! Code!

So, the AMD HD7970 arrived at last.


I knew it was probably not a wise idea just yet ... but i put it in ...

The 12.4 drivers weren't too much hassle to get going on Fedora 15, I just downloaded the Fedora 16 SRPM and built that. Had a couple of library path issues/conflicts with the SDK but I got that sorted.

The "second monitor is slow as shit" problem is back again unfortunately. I had that with some other previous version, then it went away with 11.11, but now it's back again. Very annoying but not the end of the world.

Unfortunately though, lots of code crashes, even crashes the driver and system lock-ups. I haven't had that happen for a while :'(. But mostly this seems to be bugs in my code. e.g. I spotted an issue with the SURF hessian det routine - it expects the images to be multiple of 16 wide for instance otherwise it can end up over-writing the bounds of images. I guess GCN or the driver is a lot thingier about such things now.

Also a compiler crash, but i suspect that is some bad barrier() usage in my new SVD stuff - i always get caught out when trying to have sub-groups of work-items work together on a problem and i just noticed the outer loops wont be synchronised across all work-items. That's a bit of a bummer as fixing it might be a bit of a performance hit (i'll see if simply removing the barriers works first! For a CPU backend the design is wrong anyway), but it doesn't matter how fast it runs, invalid code is useless.

Ho hum. Well I knew I should've waited ... for now i've put back the old card and will have a look when i have more time next week. Thankfully everything still works even with the updated driver - i very much doubt the cayman code-base is getting much love with the GCN stuff around so it probably is the same code as in the older driver.

Sunday, 27 May 2012


Over the weekend I ported the clsurf code to socles. I had a bit of a play with a few other things as well although they didn't make much of a difference to the execution time (well, maybe 10% maximum kernel time). I guess one thing to try will be to change it so that it doesn't need to communicate with the host at all: it should be possible with some 'persistent' kernels and forcing a hard-limit on the feature point count.

I put a few more details over on the project page.

Friday, 25 May 2012

oooh fast

Well I had a frustrating yet productive day today following on from yesterday's progress.

I ported the Jacobi SVD code to OpenCL and managed to paralleise most of it. I'm using 9 threads per SVD column which lets me paralleise all of the rotations. The code itself is somewhat simpler than the LU decomposition I worked on earlier, and it parallelises more easily. Actually having 9 threads also allows me to implement a 3x3 matrix multiply in 4 lines of code as well so it all fits nicely. I'm using 63 threads per local work group so the few times I do need barriers are just compiled out anyway. I also removed some of the code I don't need: e.g. fixing up the singular values and the U matrix at the end. The only bit which isn't totally parallel is the error calculation of the columns, but even there I was able to utilise a parallel sum to help.

I'm not really sure of the numerical stability; it works for my test cases.

So yeah, it's pretty fast - I'm currently doing batch sizes of 2000 homographies, and the SVD calculation and generation of the homography matrix is taking about 3.5-4ms - somewhat better than the 200ms or so from the CUDA RANSAC code (although that does a bit more work as well). I'm still doing the randomising and setup of the input matrix on the CPU, but that's the next candidate to move to the GPU. The GPU code to check for inliers is in the range of a 140uS as well. Once I re-arrange the code a little I will be able to keep all the data on the GPU - e.g. from SURF generation all the way through to final homography matrix.

I had thought real-time was going to be impossible, but now i'm not so sure.

The 3x3 matrix multiply fell out very simply - having 9 threads already assigned to each column of the SVD meant I had 1 thread per 3x3 matrix cell as well. I haven't had to look too much into matrix stuff on the GPU so far, so it was satisfying that such a neat solution presented itself when I did.

void mult3x3_gl(uint lid, uint cid, uint ar, uint ac,
    global float *a11, local float *b11, local float *lwork, local float *r11) {
 lwork[lid + LSTRIDE * 0] = a11[lid - cid + ac + 0] * b11[lid];
 lwork[lid + LSTRIDE * 1] = a11[lid - cid + ac + 3] * b11[lid];
 lwork[lid + LSTRIDE * 2] = a11[lid - cid + ac + 6] * b11[lid];
 r11[lid] = lwork[lid - ac + LSTRIDE * ac + 0]
  + lwork[lid - ac + LSTRIDE * ac + 1]
  + lwork[lid - ac + LSTRIDE * ac + 2];
Here lid = local id, ac = lid % 3, cid = lid % 9 (base of matrix for this work item), LSTRIDE=64. With a worksize of 63 it means this does 7 3x3 multiplies at once and the barriers vanish.

It's nice when everything totally parallelises and there are no special cases.

The most frustrating part in all of this was getting the addressing right. For the SVD I came up with a complex addressing scheme where 9x9 matrices are stored in groups of 7 side-by-side. A few more grey hairs and a hoarse voice over trying to get that to work.

Also spotted a strange problem with the AMD driver: If I use printf, then I cannot use a cached binary or it simply crashes. If I just add a space to the code to force a re-compile (my cache uses the whole source as the key) it works ok.

Thursday, 24 May 2012


With the prototype out of the way I'm back to looking at algorithms again, at least for a couple of weeks. It seemed a good opportunity to finally try and nut-out RANSAC for building homographies for mapping between images. I've previously looked into it but got no-where because it was all totally matlabotomised code and difficult to decipher.

I thought i'd start with this CUDA RANSAC implementation since at least it had all the right structure in place.

Of course, then i needed to get SURF going ... which I had previously played with (half-heartedly) and failed, but now with clsurf around, I just ported that to JOCL (which was thankfully a simple task as the code structure is similar to the way I write stuff). And that pretty much 'just worked', so all good so far.

Porting the CUDA RANSAC code wasn't much work, but I wasn't getting great results, nor great preformance so I ported that to Java to gain a deeper understanding. After a bit of searching I found EJML which I used for the SVD this time just to make sure.

Another implementation ...

I still wasn't grokking it - although by now RANSAC itself was starting to make some sense - so I kept searching. I found this interesting homework solution which had a complete, readable, and well explained RANSAC algorithm. At last. So finally i was getting reliable results - a bit slow mind you - but reliable if I let it run long enough.

One issue I had is that I seemed to need a great number of iterations to get a quality result - several 10s of thousands. i.e. a bit slow. Something like 20s for one of the data sets I had. I found out that much of the time was taken up with in-lier check, which is quite simple and although it processes many points should've been faster. It seems the E of efficient in EJML doesn't quite stretch to small matrices. Hand-coding that, and simplifying the test to only check the forward transform sped it up considerably. With a bit more tweaking the test-case is down to around 1s - for single-threaded java code.

I also found this site which has a bunch of matlab scripts and one is for a more complex version of the homography calculator which uses less FLOPs (I started with the paper 'Computation of Homographies' by Harker and O'Leary - but for whatever reason the site I downloaded it from seems to have removed it since). That seemed worth looking into, so I created an implementation of that as well. It didn't seem much faster at first but on further investigation it seemed worth looking into. It gets most of it's speed from running two smaller SVD's (4x2 and 8x3) rather than one larger one (8x9).

I was thinking at this point I might just forget about the SVD stuff on a GPU - i can find other work for it to do and the SVD code I was looking at (in EJML) is very complex. But with the fixes to the inlier check now it was taking about 50% of the time; which meant that even with multi-threading I probably couldn't keep the GPU very busy with in-lier checks. My first attempt at this generated a single result on the CPU then double-buffered the matrices to the GPU for it to perform the inlier check. Unfortunately the driver overhead was too great; I got it down to about 4s (for 10K iterations), but most of that time was in the driver synchronisation. After thinking about it a bit more I was going to do a whole batch instead, but then I did some more profiling and realised it was only going to give me at best a 2x performance boost, so I thought i'd go for some higher-hanging fruit first.

So I revisited the thought of creating a fully-GPU implementation. So as it happens I ended up back with the GSL version of the SVD code which was in the CUDA_RANSAC stuff. I dropped that back to Java and got it working. I noticed the code almost always does operations on columns, and indeed, copies the columns around a couple of times just for good measure. So I transposed the data and operate on rows instead: which means I can parallelise some of the internal loops as well. With a few more tweaks this is now the fastest version, 50K iterations take about 0.85s.

I'm still working on implementing a version of the 'optimised' homography algorithm at the moment, as it needs some more work - e.g. an implementation of pinv. But given I have the basic version working I should probably try to port that to OpenCL first, to see if I will get much benefit.

I probably need to investigate trying to reduce the iterations required. At the moment I am doing a simple nearest-neighbour match using an L2 measure, so this could probably be improved (but given results as below, it is doing a fair job).

This is a display of the inliers. There are 249 inliers from 571 putative matching feature points.

This is the result after running clsurf to find feature points, running a sum of squares match against the feature points to find the putative matches, and then running RANSAC for 50K iterations to find the homography.

SURF and the nearest neighbour match are running on the GPU (in ~50ms total) with the RANSAC entirely on the CPU in single-threaded java - in about 0.95s for the 'unoptimised' homography calculator.

This is the image after the homography is applied to each corresponding picture. i.e. the source right-hand image has the inverse homography applied to generate the output left-hand image. And visa versa.

Wednesday, 23 May 2012

lalalala i can't hear you lalalala

I came across a pretty bizarre bit of information on some opencl forums yesterday and the more I think about it the more it explains quite a few things about how just strangely m$ evangelists and fanbois behave. It's not a new thought, but a timely reminder and it got me thinking further about it.

It was a question regarding some building options when using one of their developer products: presumably by someone who has a paid-for or otherwise valid license for this product. Yet this is what their response was:

BTW, I originally posted this under Microsoft‘s Visual Studio forum and had it modded off-topic and (re)moved: “Moved by DanielMoth Microsoft Employee 1 hour 38 minutes ago We do not support OpenCL questions in the Microsoft forums. (From:Parallel Computing in C++ and Native Code)“

Presumably this is only because OpenCL is not a m$ endorsed platform, but I think there is a bit more to it than just that. I think m$ are shit-scared that anyone in their protected information sphere might be made aware that there are practical alternatives to the products they push. Or worse - that anyone might actually be using them. One always notices that they don't seem to be aware that there is a whole world of developers who don't need their snot to get real work done - and might be more productive without it. Obviously this is intentional. Pretty much on-par with how the main-stream-media continually pushes a message controlled by the monied classes to ensure people behave as they want.

In my recent work with Android I had a co-worker join me on the project, but unfortunately he is a big-time m$ fanboi. He even has a m$ phone 7, uses bing as his only search engine - on IE (I mean, ... "really"?), has never rarely used anything but microsoft IDEs - since circa 1990, etc. The thing is, he was totally lost outside of that environment, and any failings of the android sdk or documentation (of which there are plenty) prompted comments on how wonderful c-hash and whatever-wpf-is-called-this-month is. Forgetting that all toolkits pretty much suck and are a huge pain to work out when starting from scratch. I spent ~3 years on c-hash bullshit because of this guy, so i know he's so full of shit when he claims otherwise - and call him out on it.

But what most surprised me is that he didn't get a google account to access the shop, nor did he download a SINGLE android tablet application to find out how they work, and what sort of user-interface features work well vs which ones don't. I don't think he even looked at the bundled ones when I wasn't driving the machine.

Why would one intentionally handi-cap your ability to do a good job? Strange ... and I don't think it was to intentionally make the result look bad - it was just too much of a culture shock for him to be outside of his protective bubble. This is why we wasted a few weeks on an aborted web front-end in the first place as well.

Anyway, I think this childish insular attitude helps to explain much of the weirdness with m$ phone and nokia's scandalous fall from the market driven by a CEO of this mould. I've been reading a lot of Tomi Ahonen's blog of late, and some of the information about m$ phone shows they are being so insular in their rejection of all possible alternatives that it is causing them to make many mistakes and create a shit product that nobody wants.

BTW the fact that Tomi had high PR reps from Nokia trying to disparage him on twatter and even on his blog shows they have totally lost the plot. Totally insane.

Tuesday, 22 May 2012

Ubuntu is still snot.

Some business was throwing out a couple of old laptops and my flatmate ended up with one: which means I get to try to make it work. After waiting about 8 hours for it to book the ubuntu live cd a couple of times I thought it wasn't going to be much use: until I opened the SODIMM cover and saw the 512MB module had fallen out. At least 768MB makes it almost usable ...

But yeah so I ended up installing xubuntu 12.04 (I didn't just go with fedora because that's a pain if you're using XFCE as well, not to mention all that systemd and other crap). I can't say I have much good to say about it on first impressions:

  • The default theme is whack. It's hard to tell if a window is focused or not, and there's no scroll up/down buttons which makes it a pain to use. Nice to finally see someone realise black on white writing sucks though, and at least the terminals are grey. I kinda wish XFCE had window title colours other than "windows 95 blue" though (all but one).
  • Way too much shit is attached to the touchpad 'scroll area'. WTF would you ever want to change desktops this way, or shade your windows?
  • The "ubuntu software centre" is a complete joke. Ugly, slow, hard to use. My first impression was "must be some python crap"; confirmed. I only ran it accidentally as I was looking for the real thing, i.e. synaptic. Ok, sure if it was a few weeks work on a prototype, but if this is supposed to be a production-ready bit of code it is a total embarrassment.
  • Notwork-Manager still ... doesn't work properly. I gave up on the on-board wireless which refuses to connect for whatever reason, but I have a very old PCMCIA card as well. I got that working easily by running a few commands manually but Notwork-Manager just kept retrying and failing, retrying and failing ...

All it really needs to do is play videos and visit youtube, so i'm not asking for much. If I can get it that far along I will leave it.

Friday, 18 May 2012

Citrus Day

Handed off the prototype i've been working on for the last 8 weeks to the project manager last night; fairly pleased with the result in the end. But given it was a bit of a slog and I will switch gears next week back onto algorithms (yay, and OpenCL), today seemed like a good day for a break ... and to catch up on some preserving.

The citrus has all gone crazy this year.

And this is after I've picked about 40 kaffir limes (actually, they fell off), and the bowl of kumquats in the next photo. I juiced the kaffir limes last week and ended up with about 700ml (and a whole lot of rind in the freezer), although about all i've found to do with the juice is as a scalp/hair treatment. The lemon tree in particular has been so loaded the tree itself is in fear of falling apart. But they're not quite ready yet - they're only meyers anyway, so are fairly sweet - I've already tried some by peeling them, separating the segments and sprinkling with sugar. Tastes pretty close to grapefruit actually, although not so bitter and thus a bit nicer.

The mandarine tree still has plenty on it too - we've been eating them every day and giving away bags of them but you'd barely tell looking at the tree. Most have them have been nice and tart and juicy too.

So what to do with these - they surprisingly go a long way for such a small fruit. I had previously done some brandied kumquats which a friend particularly liked, so I thought i'd start with that - and give it to him as a birthday present.

I used a recipe I found some time ago but haven't been able to re-find: equal parts by weight of kumquats, sugar, and brandy. This is a 2800ml jar, so i've got about 900g kumquats, 900g sugar, and 900ml of brandy. I got some cheap glass jars at a $2 shop, and they worked pretty well apart from some pretty cruddy seals (but they'll do). The sugar will take a few days to fully dissolve. After about 6 months you get something that takes pretty much like a citrus version of a tawny port - obviously quite sweet but with a fairly balanced flavour of tart & bitter as well.

So I still had a few left over from the initial picking so I scoured the net for some other ideas - last time I had a decent crop I made some marmalade, but I still have plenty of that left! (from 4 or 5 years ago) I thought i'd try preserving them like lemons, so a good cup-or-two of salt later, a bit of lemon and lime juice, and some stinging fingers and there you have it. I really have no idea if these will work at all, but apparently they're good for something after a few months. Based on something approaching this recipe (although "some salt" is hardly a useful fucking measure ...) together with this one for lemons from The Cook and The Chef. Who knows if i have too much salt, but if nothing else it looks nice.

And finally my lime tree has been dropping limes enough that cooking can't keep up with them, so I wanted to make some lime cordial ('syrup' for you yanks). Again this is something that goes a long way, with 7 limes I had 2l of cordial (I usually mix it about 6:1), but it's a really nice summer drink. Pity it's a long way from summer. As there are more ready to drop I will probably make ice-cubes of it next time. I used this recipe although I only used 1kg of sugar, and bottled it whilst hot without straining. The old lady had a great recipe from her Green and Gold recipe book but she threw the book away years ago ...

And yes, it's yellow, but it's still all lime juice, which was a nice shade of green going in. Although I added a bit of lemon zest, and most of the lime zest was also fairly yellow as the limes are proper-ripe. I normally leave all the zest in as the sugar more than makes up for any bitterness they might add, and it looks nice.

I also have a very small number of west-indian limes - the very small ones with seeds in them - so i'm not sure what to do with them. They are super-tart though.

Bombay Sapphire bottles with the labels removed make very nice storage bottles too. Must get some more since I have all these limes handy (for the gin that is ;-), although it's been a bit of a dangerous drink for me lately (it's just too easy to sink them down).

Monday, 14 May 2012

driver bugs?

Over the weekend and Friday I spent a good 3 days worth of full-time hours trying to debug the player in jjmpeg (and still did nothing Sunday).

At least along the way I discovered the profiler, overheads of using a BlockingQueue(), memcheck in the emulator, enabling GLES2 in the emulator, hooked up the decode_audio4 api, and a few other odds and sods.

But, the more I look, the more it looks like i'm hitting a bug in the driver. Given how new the tegra3 stuff is, and given it's nvidia, it wouldn't surprise me. I still consider labelling it a driver bug a 'last resort', since it's such a basic feature, but i'm running out of other possibilities here.

I've tried java side or c side glTexSubImage2D(), i've tried copying the AVFrames, or copying the AVPlane's to pre-allocated or dynamically allocated bytebuffers, i've tried using a single AVFrame for decoding or a cycling buffer of them. Although sometimes one mechanism seems a bit more reliable (and i even managed a full run-through the hour-ish video i've been using to test) eventually they all crash. If all I do is remove the texture loading (glTexImage2D and glTexSubImage2D calls); then it doesn't.

This has been just a huge waste of time now i'm getting pissed off with it.

I could try writing a simpler bit of code to isolate the driver code, but given it's crashing inside FFmpeg it probably wouldn't provide enough logic to cause problems if I did that. Which probably means returning to software colour conversion and resorting to a Bitmap as the image surface ...

I wrote the above Saturday morning but kept poking - now i'm not so sure. Maybe it's something to do with the audio decode. Early on in the piece I had a packet-checker checking that the packets were padded properly - and it showed they weren't. But then I moved to the decode_audio4() api and haven't re-checked. Still, using gles2 with no audio stuff at all still causes problems as well, so it can't be just that. And the solid valgrind results from the emulator show that it's not suffering from memory nastiness as I originally thought. Assuming they can be trusted.

Sunday, 13 May 2012

Maker? WTF is a maker?

By happenstance - as I just don't bother reading the news much these days much less specific papers - I came across this interesting article about the 'hacker culture' this morning in the age.

First i've ever heard hackers wanting to be known as 'makers'. Sounds a bit faffy and overly-pc if you ask me. I had seen 'maker' sites, but I thought that was just a brand-name, and not a movement ...

I have no problem calling myself a hacker, but then again it's not like i'm talking to anyone but myself ;-)

George finishes with the question: could Australia develop it's own 'Silicon Valley'? And I think the answer is simple: No, no it couldn't.


First, there is the education system. Computers are now just mechanisms for running software from Microsoft, Apple, Adobe, or other large US companies. One doesn't start with an 'introduction to computers' covering 'word processing, spreadsheeds, and programming', all you get is: Microsoft Excel, Microsoft Word. If one progresses ever past that point it is merely to 'Photoshop', or maybe avoiding stalkers on the intertubes.

In primary school we had one Commodore 64 in the whole school and as a `bonus' for finishing class work early we sometimes got to type in computer programmes (often from Compute!, which used it's own checksummed number entry system - so it was pages of hexadecimal numbers). High school had a pool of crappy Apple ]['s, but even then word-processing, 'desktop publishing' and spread-sheets were only a small part of the course, which ended with programming. And this was in year 8.

Gone are the days when education was about educating people, now it's just low-level training for stuff anyone who will ever work in the field has already done before they start school: using a mouse, using a crappy 'word processor' (which sadly, is still just as crap as they were back then: i.e. a fancy type-writer), and how to run an internet search.

Hackers of course have always gone beyond the education system, but if you're repeatedly told that computer software systems are locked up private applications which no user serviceable parts, it has to have some impact.

Also this notion of training for specific products that 'industry wants' is utter nonsense. First, you will never cover enough in high school (or even TAFE) to avoid on-the-job training specific to the business needs. And second you are always much better off being exposed to as many different systems as possible. The human brain is an incredible organ, and even more-so during developmental years: like any adaptive system, the more you expose it to, the stronger and more robust it becomes. Intentionally crippling its development by limiting it's exposure to alternatives borders on criminal.


Systems are so much more complex than they were back when Silicon Valley got it's start. Not only to write code for, but expectations are much higher.

Back in the day I could spend a whole weekend typing in a bit of BASIC to get an animated lo-res horse-sprite running across the screen and I was impressed enough just with that. And even then; the actual code required just wasn't that much and you were talking directly to well designed simple hardware. You literally could design a whole computer itself on the back of a napkin, let alone the software.

Today you need a bulky IDE to get very far, to learn some gigantic and usually poorly documented toolkit, which goes through another toolkit, through an operating system, then a driver, and if you're lucky to the screen. And you don't feel satisfied until you've come up with the next billionaire-making 'killer app'. And to make matters worse the hardware is devilishly complex; so complex it never really gets working properly and needs to be hidden behind layers of drivers and high level apis - which it necessarily must be for an operating system to function - but makes it that much more challenging to understand what is going on (which as a hacker: you need to know sometimes).

The barrier to entry is huge, and for all intents beyond the reach of most individuals. This leads them to believe there's no point in trying, and creating software is exclusively for the great wizards who doth do such things (and must be paid accordingly).

This complexity affects every level: so beyond the individual then you have large complex systems which take many man-years to create.

Silicon Valley also had a unique combination of hardware and software mixed together allowing both to leap ahead in bounds together - something which would only be possible in China or Taiwan at the moment. But they're not interested in the software or the hardware design, just in making throw-away junk cheaper than everyone else.

Fucked up Legal system

And then we get to patents and so on. It is now impossible for anyone to create any piece of useful hardware or software without stomping on someone's stupid patent in the USA. Which means effectively everywhere in the world as all of our local politicians have kowtowed to the almighty US dollar and wont dare to argue for their or our own interests.

It would simply be impossible for Silicon Valley to become Silicon Valley in today's legal environment. The whole notion was based on a solid foundation of sharing of ideas and people, and a free flow of money. This latter point alone rules out Australia ever being more than a bit-player in the game, things just aren't done that way here.

The USA wouldn't let us

So this is related to the previous point, but the legal system is just one tool they use for this.

The USA is the dominant military, political, cultural, and financial force of the day. They do not let anything get in the way of that dominance. Anyone who thinks otherwise just hasn't been paying attention.

e.g. The H1B visa programme in the USA isn't about US companies not being able to get enough talent at home; it's about preventing any of the talent from being used by competitors.

If Australia ever had a Silicon Valley it would only be at the divine blessing of the USA. It would almost certainly be in the form of a 'technological partnership' with an IBM, or a university. And these are just talent syphoning systems.

Pretty unlikely for that matter - just not enough people, too high wages, that's why they've set-up shop in India and elsewhere.

The tyranny of distance, money, etc

Australia is just too big and too thinly populated to create a high enough density of a specific type of people needed for such a thing to occur. Not to mention too parochial - although a lot of people would travel interstate for work, plenty wouldn't.

The NBN has the potential to remove the distance factor, but even if that ever gets done properly - it just doesn't create enough connectedness for it to work effectively. If people aren't excitedly talking about their latest idea in the pub with competitors the ideas just aren't sharing quickly enough (this will be silicon valley's downfall - already a lot of the exiting stuff is happening elsewhere).

And you need all that excess money to throw at random ideas as well. Which basically means you have some source of free money - imperialism, oil, the world's trading currency, or somesuch.

We will never have that here. And even the great source of free money here at the moment - mining - only seems to produce overweight greedy arseholes (an unusually number of whom are remarkably fugly to boot) who have no interests beyond their own fat arses.

Friday, 11 May 2012

The World Wide Web

So M$ are up to their old tricks again and restricting competitors from accessing their operating systems properly. Nothing ever changes apart from the date. Does it really matter much anyway? (who in their right mind is even going to use that shit to start with?). Mozilla are still pushing their it's-really-an-operating-system angle, but is that really ideal for me as a customer? Or even as a developer for that matter. One only has to look at any application on a tablet and compare it to it's browser counterpart. Even running on a monster desktop workstation the tablet usually beats the pants of it. e.g. google maps or youtube. For all it's "lowly" specifications (well, they aren't really), the tablet i've been using shits all over anything i've ever seen in a web browser, HTML5, or Flash, on much much faster machines. I would really like to see browsers eschew all that heavy-client crap which has turned them into bloated power wasters, and return to a world of delivering information in an open and linked way. This is something they actually do quite well, and then they can leave the heavier stuff to native applications which always do a better job. And with the range of devices exploding - from workstations to tv's to pocket computers - doing anything more than text with a few pictures, the odd form, is only going to become more difficult. The right tool for the right job, and all that.

Abject failure

I spent an inordinate amount of time last night trying to lick the decoder stability issues.

I ported the jjmpegdemo MediaPlayer over to android, and re-arranged it so it was shipping AVPacket's around instead of AVFrames and decoding on a separate thread (this is how ffplay does it). I changed the rendering to copy to a texture synchronously with the decoding (i figured sharing EGL context work can wait). I used multiple textures to implement some buffering.

So when I finally got it running (after wasting a good hour on a silly place-holder mistake when i started), it still suffers from the same problem. Along the way netbeans decided it had too many projects open, and the debug cycle time on the tablet seemed to get longer and longer.

Ho hum.

This morning I poked at using shared contexts, and although it seems to use less CPU time, it still crashes and it's doing something weird with the frame rendering order as well.

So yeah, might have to sit on this for a bit and brood.

Update: As one does ... I persevered. Current guess is that it's something to do with the texture load: if I isolate that out things seem to run fine (but with no repeatability this is hard to judge). More perseverance required.

Saturday, 5 May 2012

Speaker-busting screech

So yeah, don't forget to set your endian-ness on your direct ByteBuffers when you're playing with sound files. Poor speaker. This morning I got sound going with jjmpeg/android. This was mostly fixing up the way I was using AVPacket's - I removed the the AVAudioPacket class and added the functionality to AVPacket. I need to look at the new audio_decode_4 api, as that might simplify the usage anyway and remove the need for the special case. I also hit some surround-sound files - i've just never played with audio much (don't even have speakers on my pc) - so have some more api to think about binding. Shouldn't be too hard to fit this into the video player, AudioTrack reports it's position which is enough.

Friday, 4 May 2012


So yesterday I checked in the work I did last week for the android jjmpeg port. Not heavily tested, but it appears to work.

Then I spent far too long poking around trying to find out about the EGL OES_external_images extension, but finally came to the conclusion that it just isn't exposed publicly (yet?) so it's way more effort than it's worth to me to get it going.

So I suppose it's just glTexSubImage2D for me then.


I had a long work week this week and although I had intended to get a couple more things out of the way this morning I let it slide, and eventually got sucked in to playing with jjmpeg. I rigged up a version using GL, but still for now performing CPU-side colour conversion. I at least added a separate thread to do the colour conversion, and hooked it all up so that nothing needs to wait for anything else unless absolutely necessary.

The display stuff is a little bit faster - but nothing to worry about TBH. The colour conversion is only about 20% of the time, so it isn't a real big difference either.

I also tried with/without VFP: which made stuff all difference too, barely 5%. I started with 32-bit RGB textures simply because I had that already, and changed those to RGB565 which made another small difference. But all these small differences just weren't adding up to anything but a lot of small differences.

So anyway ... I went and looked at some of the options for skipping frames, multi-threaded decoding and what have you, and I noticed that threading wasn't enabled in the build.

Cut a long story short - beware of copies of copies of scripts one finds on the net: my compile script was basically shit. I also turned on NEON this time ...


I'm not sure if it's just the NEON, or the threads - or simply compiling it properly - but boy what a difference.

(actually it's all those not-so-little-bits now adding up, even single-threaded it's now much faster, in performance mode 2 threads can nearly handle 720p).

So it's now decoding 720p MP4 fine (taken from the on-board camera), even in 'balanced power saving' mode. Before it was struggling with this, under 10fps. And now the colour conversion is more like 50% of the time, so I will have to investigate using GLES for this since it should be a lot better at it.

Anyway, i'm quite chuffed at this now - I was starting to think it was pretty much pointless apart from perhaps encoding or more control over decoding. But this level of performance opens up a lot of possibilities. I'm also still only using ffmpeg 0.10.0 release, so there might be more there now too.

Update: So I kept going and added the GL colour conversion. Quite a bit better and 720p is fine with 2 threads, but not able to handle 1080P from the built-in camera.