<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-9054609445298212106</id><updated>2012-02-10T20:29:01.008+10:30</updated><category term='images'/><category term='linux'/><category term='cooking'/><category term='gsoc'/><category term='business'/><category term='horticulture'/><category term='ps3'/><category term='java'/><category term='puppybits'/><category term='os'/><category term='opencl'/><category term='politics'/><category term='graphics'/><category term='humour'/><category term='rants'/><category term='readerz'/><category term='games'/><category term='hacking'/><category term='gnu'/><category term='philosophy'/><category term='forth'/><category term='mediaz'/><category term='socles'/><category term='haiku'/><category term='jjmpeg'/><category term='kobo'/><category term='biographical'/><category term='gloat'/><category term='house'/><category term='videoz'/><category term='pdfz'/><category term='beagle'/><category term='imagez'/><title type='text'>A Hacker's Craic</title><subtitle type='html'>Good and bad craic about Atheism, BeagleBoard, C, Free Software, GNU, Java, OpenCL, and the sorry life of yet another jaded software engineer.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' 
href='http://www.blogger.com/feeds/9054609445298212106/posts/default?start-index=101&amp;max-results=100'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>286</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-3930222879838805689</id><published>2012-02-10T07:28:00.003+10:30</published><updated>2012-02-10T14:08:54.734+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Tuning ...</title><content type='html'>Had a poke at some performance tuning of jjmpeg.&lt;br /&gt;&lt;br /&gt;I took two videos:&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;PAL&lt;dd&gt;A PAL DVD, half hour show.&lt;br /&gt;&lt;dt&gt;1080p&lt;dd&gt;A half hour show recorded directly with a DVB-T receiver.  1440x1080p, ~30fps, 10MB/s.&lt;/dl&gt;&lt;br /&gt;I then used JJMediaReader to scan the files and decode the video frames to their native format.  I then took each frame and converted it to an RGB format using one of the tests below.&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;ByteBuffer&lt;dd&gt;Code uses libswscale to write to an avcodec allocated frame in BGR24 format.  The frame is not accessed from Java: this is the baseline performance of using a ByteBuffer, and it could be the end point if then passing the data to JOGL or JOCL.&lt;br /&gt;&lt;dt&gt;ByteBuffer to Array&lt;dd&gt;Perform the above, then use nio to copy the content to a Java byte array.&lt;br /&gt;&lt;dt&gt;IntBuffer&lt;dd&gt;Code uses libswscale to write to an avcodec allocated frame in ABGR format.  
Similar to the first test, but a baseline for ABGR conversion.&lt;br /&gt;&lt;dt&gt;IntBuffer to Array&lt;dd&gt;Perform the above, then use nio to copy the content to a Java int array.&lt;br /&gt;&lt;dt&gt;int array&lt;dd&gt;Use the JNI function GetPrimitiveArrayCritical, form a dummy image that points to it, and write to it directly using libswscale to ABGR format.  This gives the Java end an integer array to work with directly.&lt;/dl&gt;&lt;br /&gt;In all cases the GC load was zero for reading all frames (i.e. no per-frame objects were allocated).  I'm using JDK 1.7.  The machine is an Intel i7x980.  I'm using a fairly old build of ffmpeg (version 52 of libavcodec/libavformat).&lt;br /&gt;&lt;br /&gt;The timing results (in seconds):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Test \ Video          PAL    1440x1080p&lt;br /&gt;&lt;br /&gt;ByteBuffer            81.5   237&lt;br /&gt;ByteBuffer to byte[]  86.0   279&lt;br /&gt;&lt;br /&gt;IntBuffer             81.3   242&lt;br /&gt;IntBuffer to int[]    86     297&lt;br /&gt;int[]                 81.9   242&lt;/pre&gt;&lt;br /&gt;&lt;h2&gt;Discussion&lt;/h2&gt;So ... using GetPrimitiveArrayCritical is the same speed as using a direct ByteBuffer - but the data is faster to then access from Java as it can just be indexed.&lt;br /&gt;&lt;br /&gt;Using RGB and ByteBuffers is a bit quicker than using RGBA.  Apart from the differences down to libswscale there seems to be some overhead using an IntBuffer (derived from a ByteBuffer) to write to an int array.&lt;br /&gt;&lt;br /&gt;Using RGB is marginally quicker than using RGBA - although that's mostly down to libswscale, and for my build nothing is accelerated.  When I move to ffmpeg 0.10 I will re-check that the default formats I'm using are the quick(?) ones.&lt;br /&gt;&lt;br /&gt;When using a direct buffer and then copying the whole array to a corresponding Java array, the overhead is fairly small until the video size increases to HD resolutions.  
At 23% for 1440x1080xABGR, it is approaching a significant amount: but this application does nothing with the data.  Any real processing of the frames will quickly shrink this as a proportion of the total.  At PAL resolution it's only about 5%.&lt;br /&gt;&lt;h2&gt;Conclusions&lt;/h2&gt;For modern desktop hardware, it probably doesn't really matter: the machine is fast enough that a redundant copy isn't much overhead, even at HD resolution.&lt;br /&gt;&lt;br /&gt;Possibly of more interest is how the rest of the pipeline copes.  Obviously with JOGL or JOCL the work is already done when using ByteBuffers, or ideally you'd process the YUV data yourself.  I'm not sure about Java2D though: a previous post suggests that an integer BufferedImage is the fastest.&lt;br /&gt;&lt;br /&gt;However, there are possibly cases where writing directly to an int array would be beneficial, and for Java image processing it is probably easier to use anyway: so I will add this new interface to jjmpeg after confirming it actually works.&lt;br /&gt;&lt;br /&gt;I also found a bug in AVPlane where I wasn't setting the JNI-allocated ByteBuffer to native byte order.  
This made a big difference to the IntBuffer to int[] version (well, 44% over no array copy for PAL), but wouldn't have been hit with my existing code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-3930222879838805689?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/3930222879838805689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=3930222879838805689' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3930222879838805689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3930222879838805689'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/02/tuning.html' title='Tuning ...'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-813566399054560622</id><published>2012-02-10T01:14:00.003+10:30</published><updated>2012-02-10T07:26:38.913+10:30</updated><title type='text'>Sleep n Whinge</title><content type='html'>Ugh, what a crappy day.  I hit the grog a bit hard last night (sister dropped by for a couple of hours on her way to the airport), and subsequently had very little sleep; and the neighbours decided today was a good day to re-start the work on the extensions next door.  Had a nap about 5, at least until some dodgy scam out of India rang up about 7:30.  
Blah.&lt;br /&gt;&lt;br /&gt;But I played a bit with some code during the day.  I poked around with my slideshow creator, working on some more transition wipes - worked out a 'clock' transition which seemed to take much longer than it should have (for lack of inspiration I'm looking at the SMIL stuff for ideas).  I was going to write a very simple front-end GUI for it, but just didn't have the motivation for that today.&lt;br /&gt;&lt;br /&gt;Then I got totally side-tracked with some other stuff: I noticed JavaFX builds are finally available for GNU/Linux, then looked at the SwingX demo (there are a couple of things that look interesting) and the &lt;a href="http://www.jhlabs.com/ip/filters/"&gt;image filters&lt;/a&gt; it uses.  Mr Huxtable also has an interesting article about BufferedImage stuff (which I'm sure I've read before but must have forgotten about): and that got me thinking about changing the way jjmpeg's helpers work with images as it uses 3BYTE_BGR types and direct DataBuffer access.  And that got me thinking about JNIEnv.GetPrimitiveArrayCritical (to avoid 2 copies), and, well, by this time I was too hung-over and tired to do anything useful.&lt;br /&gt;&lt;br /&gt;I also noticed the neighbours were building a really big verandah which will block most of the direct light into my bathroom, and they over-cut a bit of a tree that hangs over the boundary.  And I got a letter from my insurance company whining about an over-charge they shouldn't have been making in the first place.  All that together with the severe lack of sleep put me in a terrible mood and made me feel really rather miserable.  
And now it's 3am and they'll be at it at 7am again next to my bedroom window so tomorrow probably won't be much better ...&lt;br /&gt;&lt;br /&gt;Update: Oh fun, 7:25am, shit radio station was bad enough, now they're at it with the jack-hammer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-813566399054560622?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/813566399054560622/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=813566399054560622' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/813566399054560622'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/813566399054560622'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/02/sleep-n-whinge.html' title='Sleep n Whinge'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7247765649260135539</id><published>2012-02-08T14:22:00.002+10:30</published><updated>2012-02-08T16:45:37.986+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><category scheme='http://www.blogger.com/atom/ns#' term='videoz'/><title type='text'>VideoZ</title><content type='html'>I had a go at writing a 
simple 'media mixer' today.  So far it's only video, but i'm already thinking about how to do the sound (hence some work on JOAL yesterday, I'm planning on using OpenAL-Soft to do the mixing, which gives me '3d sound' for free as well).  Sound is a bit more difficult than video ...&lt;br /&gt;&lt;br /&gt;As output it generates an encoded video file; using jjmpeg of course.&lt;br /&gt;&lt;br /&gt;With a small amount of code i've got a slideshow generator, together with affine transforms, opacity, and video or still pictures.  I'm just using Java2D for all the rendering: so the compositor is fairly slow, but it's workable.&lt;br /&gt;&lt;br /&gt;But, the biggest part of any real application such as this is the user interface for setting up the animation parameters ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7247765649260135539?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7247765649260135539/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7247765649260135539' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7247765649260135539'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7247765649260135539'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/02/videoz.html' title='VideoZ'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-858573493921686043</id><published>2012-02-07T11:44:00.003+10:30</published><updated>2012-02-07T13:58:47.497+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='readerz'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Paged layout, busy dot.</title><content type='html'>After poking around at jjmpeg a bit this morning, I played a bit more with ReaderZ.  First I added an animated 'busy' icon for when the reader is busy, and moved the epub HTML loader to another thread so it animates.  It's ugly, but it works.  I simplified the use of the event manager as well.&lt;br /&gt;&lt;br /&gt;Then I redesigned the BlockLayout code in CSZ so that I could sub-class it to create a paged media layout.  It isn't 'conformant' by any stretch, and has a bug with tall images, but at least it forces lines to align to a new page once they've overflowed the viewport.&lt;br /&gt;&lt;br /&gt;During this I realised I probably won't be able to get away with a single pass for the layout.  For example, if you have an auto-sized box, its size depends on things like the size of floats and the lineboxes inside of it.  But you have to lay these all out before you can determine what that size is, and then must lay them out again afterwards once you've determined the real size you're working with (also required for things like text-align).  
I might have to lay out individual words instead so then the second layout can be fast as well as letting the layout be handled separately from the text object.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-858573493921686043?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/858573493921686043/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=858573493921686043' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/858573493921686043'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/858573493921686043'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/02/paged-layout-busy-dot.html' title='Paged layout, busy dot.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1466892300469800055</id><published>2012-02-06T13:08:00.004+10:30</published><updated>2012-02-07T10:27:35.901+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>jjmpeg transcoding</title><content type='html'>Well I had a go at transcoding using jjmpeg.  
I added the binding required to get it to work and added a new JJMediaWriter class to handle some of the details.&lt;br /&gt;&lt;br /&gt;It doesn't work very well - many formats just crash.  But at least AVI with a few formats works.  I presume I have some problems with the buffer sizes or some such.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;: A misunderstanding of the JNI API means I was getting a ByteBuffer pointing to 0, rather than a null ByteBuffer.  I've fixed that up and now the transcode demo works a bit better.  I'm still not flushing the decoders on close, so it isn't complete yet.&lt;br /&gt;&lt;br /&gt;Sources:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://code.google.com/p/jjmpeg/source/browse/trunk/jjmpegdemos/src/au/notzed/jjmpeg/streamwriter/Transcode.java"&gt;Transcode.java&lt;/a&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://code.google.com/p/jjmpeg/source/browse/trunk/jjmpeg/src/au/notzed/jjmpeg/io/JJMediaWriter.java"&gt;JJMediaWriter.java&lt;/a&gt;&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1466892300469800055?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1466892300469800055/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1466892300469800055' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1466892300469800055'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1466892300469800055'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/02/jjmpeg-transcoding.html' title='jjmpeg 
transcoding'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6010673991275048916</id><published>2012-02-04T15:57:00.004+10:30</published><updated>2012-02-04T16:28:52.677+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='readerz'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>e-reader, epub</title><content type='html'>Had a mini hack-fest today, and whipped up an e-pub backend for ReaderZ based on CSZ.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-gjS0uVC-qPw/TyzC8Hd5g4I/AAAAAAAAAp8/M16cWxgnjKU/s1600/kobo-8.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://4.bp.blogspot.com/-gjS0uVC-qPw/TyzC8Hd5g4I/AAAAAAAAAp8/M16cWxgnjKU/s400/kobo-8.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5705149166023967618" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I added very basic img tag support as well, as is obvious.&lt;br /&gt;&lt;br /&gt;Apart from the code to parse the content.opf file from the .epub archive, which was fairly small, I spent the most time trying to work out a URL handler for a made-up 'epub:' protocol.  I copied the way the jar: protocol handler distinguishes between the base archive and the filename using "!/" - this is so that the normal URL resolution mechanism works.  
But I also wanted to resolve by the manifest ID and I use the url fragment for that (although in hindsight I probably don't need it).  But anyway in the end it wasn't much code, and having it there made everything 'just work', which was nice.&lt;br /&gt;&lt;br /&gt;I also had to deal with all the crap XML brings along: i.e. dtd resolution.&lt;br /&gt;&lt;br /&gt;The actual viewer is a bit unwieldy as it works as a set of html pages.  So you need to pan around to read each 'page' (i.e. chapter, or whole book), and changing pages flips between the items in the spine (i.e. chapters or whole book).  To do better than that I really need a paginating layout engine: which is something for later.&lt;br /&gt;&lt;br /&gt;I have no svg support not surprisingly, so title pages which are pure svg come up a re-assuring blank.&lt;br /&gt;&lt;br /&gt;Still a bit slow opening new chapters, but what can you do eh?&lt;br /&gt;&lt;br /&gt;It's all been checked in to ReaderZ and CSZ.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6010673991275048916?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6010673991275048916/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6010673991275048916' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6010673991275048916'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6010673991275048916'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/02/e-reader-epub.html' title='e-reader, 
epub'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-gjS0uVC-qPw/TyzC8Hd5g4I/AAAAAAAAAp8/M16cWxgnjKU/s72-c/kobo-8.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5940233970564097078</id><published>2012-02-02T12:32:00.006+10:30</published><updated>2012-02-02T12:47:58.826+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>floats n stuff</title><content type='html'>I made some more progress on CSZ.  The latest thing I have sort-of working are floats.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-IwQZXer2v2g/TynvDy0lcZI/AAAAAAAAApw/lsGzKncAIao/s1600/csz-1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 333px;" src="http://4.bp.blogspot.com/-IwQZXer2v2g/TynvDy0lcZI/AAAAAAAAApw/lsGzKncAIao/s400/csz-1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5704353251502420370" /&gt;&lt;/a&gt;&lt;br /&gt;I think i'm interpreting the bits i've implemented correctly: floats are quite limited so the layout logic isn't terribly complex.  
I still have no borders or padding (and I removed the fudge factor I had in before) so it looks a bit cramped.&lt;br /&gt;&lt;br /&gt;It's still sad just how much crap you need to get to even this point ...&lt;br /&gt;&lt;br /&gt;I just got a call from work and they want me back in a couple of weeks, so I might turn down the effort a bit so I can psych myself up for that.  Maybe i'll finally use that kobo as a reader of books too ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5940233970564097078?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5940233970564097078/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5940233970564097078' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5940233970564097078'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5940233970564097078'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/02/floats-n-stuff.html' title='floats n stuff'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-IwQZXer2v2g/TynvDy0lcZI/AAAAAAAAApw/lsGzKncAIao/s72-c/csz-1.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4205541813714603170</id><published>2012-01-31T17:58:00.004+10:30</published><updated>2012-01-31T19:31:45.549+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>CSZ</title><content type='html'>Well I kept poking away at the XHTML/CSS stuff, for want of something better to do.  I had a couple of wins along the way.&lt;br /&gt;&lt;br /&gt;The cascading and inheritance are working somewhat better now, and I added a few more properties.  Given that I'm not very familiar with all the various rules, I think I have a fairly efficient resolution mechanism by indexing various bits and pieces.  The layout system is still crap, and very very incomplete, but at least I have baselines aligned now.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-zZBuek6RyXg/TyerRpNLMWI/AAAAAAAAApk/WBg7TPEhCMg/s1600/csz-0.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 309px;" src="http://3.bp.blogspot.com/-zZBuek6RyXg/TyerRpNLMWI/AAAAAAAAApk/WBg7TPEhCMg/s400/csz-0.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5703715772695261538" /&gt;&lt;/a&gt;&lt;em&gt;This is a totally contrived example: everything apart from the text layout and typefaces is hacked in one way or another.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;I tried it on the kobo ... it's still fairly slow, but it's better (I think: TBH I can't remember what I tried the other stuff on).  
Still hampered by the text layout though.&lt;br /&gt;&lt;br /&gt;For a 200k file (which is mostly just &amp;lt;p&amp;gt; elements), scanning the file, resolving the properties and generating the box list takes only a fraction of a second.  I'm not trying to resolve or use very many properties though.  About all i'm using `in anger' are some of the font-* properties.&lt;br /&gt;&lt;br /&gt;But then performing the layout (as a single page) takes about 7s (once the jvm is warmed up), which is mostly due to TextLayout.  I will have to try it with a simpler font than the one the JVM comes with.&lt;br /&gt;&lt;br /&gt;Rendering is fairly ok (relative to the e-ink anyway) and all i'm doing is painting every textlayout in the whole tree ...&lt;br /&gt;&lt;br /&gt;I uploaded it to MediaZ anyway, in the new CSZ module.&lt;br /&gt;&lt;br /&gt;Hmmm, I should really take a break from hacking for a bit.  But i'll believe that when I see it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4205541813714603170?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4205541813714603170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4205541813714603170' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4205541813714603170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4205541813714603170'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/csz.html' title='CSZ'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-zZBuek6RyXg/TyerRpNLMWI/AAAAAAAAApk/WBg7TPEhCMg/s72-c/csz-0.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1232163779881413343</id><published>2012-01-30T08:57:00.003+10:30</published><updated>2012-01-30T13:32:04.757+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>xhtml, css, boxes n shit</title><content type='html'>For some stupid reason I delved into CSS and XHTML and the rendering thereof.&lt;br /&gt;&lt;br /&gt;CSS is so deceptively simple: a few boxes, layouts in lines, and a few properties to set.  Saying the devil is in the details here isn't doing the term justice.  It's all in the details.  And they're ugly.&lt;br /&gt;&lt;br /&gt;From the fairly complex cascading rules, to the number of properties.  The layout merging.  The badly written documentation: filled with "x inline y box" and "a block b box c" to such specificity, and with no definitions, that it is quite difficult to decipher what it's even talking about.  It's also quite hard to debug, since it needs a fair bit of data structure to represent it.&lt;br /&gt;&lt;h3&gt;CSZ&lt;/h3&gt;Anyway, after some mucking about, I have a relatively complete CSS lexer and parser, a fairly incomplete cascade resolver, a fairly incomplete layout engine, and a very incomplete style system.  It's just enough to show paragraphs of text with some basic formatting.  
For a book reader I don't want the document to control the text too much anyway.&lt;br /&gt;&lt;br /&gt;I'm attempting to do it while streaming the input, and (obviously thus) in a single pass using the XMLStreamReader pull parser.  Therefore initial parsing is quite quick, but it's still taking a relatively long time to lay out the boxes ...&lt;br /&gt;&lt;h3&gt;TextLayout&lt;/h3&gt;And the problem here is TextLayout.  It's just quite slow.  I tried my own version of layout using FontMetrics.charsWidth(), but internally that just creates a TextLayout anyway, so it's even slower (or maybe not, now it's a bit faster?).&lt;br /&gt;&lt;br /&gt;I know why it's so complicated: it has to lay out complex scripts and handle all the special cases.  Anyway, that is the primary constraint on performance at the moment, although as the implementation is so far from finished, i'm sure it won't be the last one.&lt;br /&gt;&lt;br /&gt;(I played some more, and the font used plays a big part in the speed taken, so there's hope yet).&lt;br /&gt;&lt;br /&gt;I suppose I should try it on the kobo to see how it goes there.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Dead end?&lt;/h3&gt;It's taken a lot of effort to get this far, and i'm not really happy with the result.  So i'm not sure if i'll keep plugging away at it or throw it away (and if i need such functionality, use cssbox).  
There's a lot left to get it to be useful for anything.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1232163779881413343?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1232163779881413343/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1232163779881413343' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1232163779881413343'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1232163779881413343'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/xhtml-css-boxes-n-shit.html' title='xhtml, css, boxes n shit'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-8448798298166396541</id><published>2012-01-27T17:24:00.004+10:30</published><updated>2012-01-27T17:54:19.504+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='readerz'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>ReaderZ</title><content type='html'>I finally got around to checking in ReaderZ to &lt;a href="http://code.google.com/p/mediaz/"&gt;mediaz&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I also tweaked a few things before I 
did:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Added timer and timeout functionality to the GadgetZ event thread.  Using a ScheduledExecutorService made this an absolute doddle, and in fact the 'event thread' is now just the service.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Changed the update mechanisms to try to speed up panning.  It now pans using black and white updates and then does a full 'invert' refresh once you stop - after a 700ms timeout of inactivity.  It is still a bit slow updating, but that might be because i'm rendering the content on the fly.  I can't seem to get the e-ink to go any faster, although I did speed that up a bit over what it was.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;I took out the HTML backend for now.  I'm using a very slightly modified build of CSSBox (to remove the logging dependencies) and just can't be bothered to work that all out.  It's a bit slow anyway.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;For fun I added a picture viewer backend to the DocumentReader.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-y9AEDYc8LkE/TyJLY5eDMgI/AAAAAAAAApY/Vvw-sXvEp1o/s1600/kobo-7.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-y9AEDYc8LkE/TyJLY5eDMgI/AAAAAAAAApY/Vvw-sXvEp1o/s400/kobo-7.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5702202969320272386" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/ul&gt;The &lt;a href="http://code.google.com/p/mediaz/source/browse/trunk/ReaderZ/README"&gt;README&lt;/a&gt; has all the other gory details.&lt;br /&gt;&lt;br /&gt;I also had a good look at the text layout mechanism in Java ... boy what a nightmare of code that is.  No wonder it's so slow.  I tried to work out how it was doing it when it came down to it, but I couldn't fathom it before losing interest.  
The kobo html reader is about 10x faster at pagination; which just makes it all the more puzzling as to why the text reader is so slow.&lt;br /&gt;&lt;br /&gt;Which has me thinking about C again, and using mupdf's stuff to format and render text instead ... well it would work for latin scripts anyway.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-8448798298166396541?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/8448798298166396541/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=8448798298166396541' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8448798298166396541'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8448798298166396541'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/readerz.html' title='ReaderZ'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-y9AEDYc8LkE/TyJLY5eDMgI/AAAAAAAAApY/Vvw-sXvEp1o/s72-c/kobo-7.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7882168992192534748</id><published>2012-01-25T19:34:00.004+10:30</published><updated>2012-01-25T20:10:05.794+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category 
scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>A browser ... ?</title><content type='html'>Ahh, so I totally didn't think I was going to even try to do this ...&lt;br /&gt;&lt;br /&gt;But epub needs HTML, and I found &lt;a href="http://cssbox.sourceforge.net/"&gt;CSSBox&lt;/a&gt;, and well one thing led to another ...&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-RddmQCU8paA/Tx_Isy3GD7I/AAAAAAAAApM/SbPwVTMTbZw/s1600/kobo-6.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-RddmQCU8paA/Tx_Isy3GD7I/AAAAAAAAApM/SbPwVTMTbZw/s400/kobo-6.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5701496325166993330" /&gt;&lt;/a&gt;&lt;br /&gt;So I've basically ported the SimpleBrowser example from CSSBox to ReaderZ - all it can do is render the page, and it lets you pan and zoom as with PDF files.  No links or anything.&lt;br /&gt;&lt;br /&gt;I tried rendering on the fly, and into an image - the former is a little slow to scroll (but not far off the e-ink speed), but I don't think it's clipping the drawing regions properly and so is doing a lot more work than necessary anyway.  Using an image scrolls fast but can't be zoomed very well, and takes more memory (I blew it with boycottnovell) ... 
so trade-offs.&lt;br /&gt;&lt;br /&gt;It's pretty slow and clunky, but what can one expect when XML is involved?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7882168992192534748?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7882168992192534748/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7882168992192534748' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7882168992192534748'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7882168992192534748'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/browser.html' title='A browser ... 
?'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-RddmQCU8paA/Tx_Isy3GD7I/AAAAAAAAApM/SbPwVTMTbZw/s72-c/kobo-6.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2597960898629388977</id><published>2012-01-25T14:25:00.002+10:30</published><updated>2012-01-25T15:19:33.510+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Reader Shell</title><content type='html'>So I kept poking away at the browser code and my gadget toolkit.&lt;br /&gt;&lt;br /&gt;It's getting fairly sophisticated now: I improved the StackLayout gadget to add filling and relative sizing glue.  I added a list gadget - which works by pages, and is similar to JList, including a cell renderer, list model and selection model.  I played around with a whole lot of other stuff as well.&lt;br /&gt;&lt;br /&gt;So I have enough to finally create a reader shell: it presents a list of the files on the device, and lets you open them up with a pdf reader or a text reader, depending on the file type.  Once inside it has a full-screen viewer with no visible buttons - but even without them you can zoom, pan, change pages one at a time or flip through many at a time.  
A popup menu (single short press in the middle of the screen) lets you quit back to the shell.&lt;br /&gt;&lt;br /&gt;I cache the pagination for text files, so they open very quickly after the first visit, assuming the font settings haven't changed.  The pagination descriptor is small, about 7k for a 500 page document.  On a re-visit of the same file (i.e. once the jvm is warmed up), it's loading a 500k text file in about 1/2 a second, but even from a cold-start it's only about a second.  PDF files also open fast, about the same speed.  i.e. not much slower than the e-ink can refresh a single greyscale page.  Closing a file and returning to the shell is similarly speedy.&lt;br /&gt;&lt;br /&gt;And whilst the interface isn't very complicated, everything is still quite responsive, with no missed finger presses or long unexplained delays (although the first time you open a 500k text file, it still takes about 25s to re-paginate).&lt;br /&gt;&lt;br /&gt;My panning is still a bit slow, although I am rendering the bitmap on the fly as well.&lt;br /&gt;&lt;br /&gt;But ... 
i've pretty much done what I want for now: prove that the kobo touch e-reader is a zippy little unit, capable of much more performance than the included software lets it deliver.&lt;br /&gt;&lt;br /&gt;I'll drop the code in MediaZ sometime in the next few days, and continue working on it for at least a while.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2597960898629388977?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2597960898629388977/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2597960898629388977' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2597960898629388977'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2597960898629388977'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/reader-shell.html' title='Reader Shell'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2034643702900995280</id><published>2012-01-24T10:19:00.002+10:30</published><updated>2012-01-24T11:31:46.297+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='pdfz'/><category scheme='http://www.blogger.com/atom/ns#' 
term='hacking'/><title type='text'>PDFZ</title><content type='html'>Checked in the first cut of PDFZ to the MediaZ project.&lt;br /&gt;&lt;br /&gt;It's a simple binding to muPDF - i.e. a PDF renderer for Java that builds on arm and x64/x86 cpus.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2034643702900995280?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2034643702900995280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2034643702900995280' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2034643702900995280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2034643702900995280'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/pdfz.html' title='PDFZ'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5291475520060266077</id><published>2012-01-23T15:24:00.005+10:30</published><updated>2012-01-23T15:43:26.489+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Touch Interfaces 2</title><content type='html'>I was wondering how 
to get a decent user interface with only a finger ...&lt;br /&gt;&lt;br /&gt;I came up with the idea of sensors (no doubt this is what everyone else already does, but I haven't used a touch device before let alone coded for one) that I can attach to a gadget, and the gadget can then decide what to do.&lt;br /&gt;&lt;br /&gt;I have two types of sensors: drag sensors and press sensors.  Drag sensors override any press sensors, and come into action once a drag has started.  A drag only starts once a certain threshold of movement is exceeded.  Press sensors only activate if a press is detected without a drag.  Additionally they can handle long (&gt;300ms) or short presses separately.&lt;br /&gt;&lt;br /&gt;So for a PDF viewer, I came up with the following 'soccer pitch' of sensors ...&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-vmlBrCw67mM/Txzo3fQdpkI/AAAAAAAAApA/_giHIlInUjs/s1600/pdfz-3.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 309px;" src="http://1.bp.blogspot.com/-vmlBrCw67mM/Txzo3fQdpkI/AAAAAAAAApA/_giHIlInUjs/s400/pdfz-3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5700687268325205570" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The semi-circle sensors are press sensors, and are used to pan around the page in screen-sized chunks in the obvious directions.  I haven't implemented these yet.&lt;br /&gt;&lt;li&gt;The left and right thirds of the entire screen are for page turning.&lt;br /&gt;&lt;li&gt;The thin rectangle at the top of the screen is a drag sensor used for changing the zoom.  There is also a coincident press sensor which is used to reset the zoom.&lt;br /&gt;&lt;li&gt;The thin rectangle at the bottom of the screen is a drag sensor used for flipping through pages.&lt;br /&gt;&lt;li&gt;The whole screen itself is a drag sensor, and used for finger panning.  
The whole screen also has a press sensor which will cause an e-ink 'cleanup' on a long press.&lt;/ul&gt;&lt;br /&gt;As far as this goes and even without implementing the panning buttons it makes a fairly comfortable PDF reader out of such a small screen.  I might also need a way to go back to the top of the page but one-screen over, and back to the left of the page but one-screen down (and backwards of those?); although there is only so much space for such hidden buttons to be easy to use.  I could either use the corners, or perhaps just do it based on exceeding the limit of the existing paging buttons.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5291475520060266077?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5291475520060266077/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5291475520060266077' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5291475520060266077'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5291475520060266077'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/touch-interfaces-2.html' title='Touch Interfaces 2'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-vmlBrCw67mM/Txzo3fQdpkI/AAAAAAAAApA/_giHIlInUjs/s72-c/pdfz-3.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4132806331027043589</id><published>2012-01-23T12:15:00.005+10:30</published><updated>2012-01-23T13:06:52.486+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>PDFZ on Kobo</title><content type='html'>Well I got the PDF reader working on the kobo.  The binding library 'just worked' thankfully, so I just cleaned the code up and worked on some performance issues.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-xKNwIbmjAzk/TxzDfSUvy-I/AAAAAAAAAo0/mMU87xgdnsM/s1600/pdfz-1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-xKNwIbmjAzk/TxzDfSUvy-I/AAAAAAAAAo0/mMU87xgdnsM/s400/pdfz-1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5700646170606423010" /&gt;&lt;/a&gt;&lt;br /&gt;Using a custom BufferedImage which talks directly to the ByteBuffer turned out to be very slow on the kobo jvm - about 10x slower than just copying the ByteBuffer to an RGB565 BufferedImage in a manual Java loop.  So I just do that ... the images are only the size of the display so memory isn't a big issue.&lt;br /&gt;&lt;br /&gt;It loads and renders a page in about 0.3s, so that is fast enough.  I couldn't work out how to get it to 'fully' refresh automatically very cleanly so I just added a button to do it.  
Panning is a bit slow, even with monochrome update mode: OTOH finger events aren't lost when you press-slide-quickly-release.&lt;br /&gt;&lt;br /&gt;For comparison here's the same text with the built-in viewer.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-6aRfruS-Ru0/TxzDewGpoaI/AAAAAAAAAoo/5LUGAMDTb_w/s1600/pdfz-2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-6aRfruS-Ru0/TxzDewGpoaI/AAAAAAAAAoo/5LUGAMDTb_w/s400/pdfz-2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5700646161420493218" /&gt;&lt;/a&gt;&lt;br /&gt;I think my last mention of it was a bit unkind - it isn't that bad, it turns out I had been reading a scanned paper, which tends to render poorly in everything.  I found out that you can change pages whilst zoomed, although you either have to scroll off the side of the image first, or bring up the menu - and while the menu is up you get an icon of the 'zoom area' taking up a good chunk of the screen, so they're both a little bit clumsier than they might be.  
And some contrast enhancement wouldn't go astray when viewing text (although IMHO mupdf just renders text very well).&lt;br /&gt;&lt;br /&gt;Incidentally, the dark square you can see in the lower left is really there: it's left over from an e-ink refresh after the zoom-preview-box is removed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4132806331027043589?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4132806331027043589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4132806331027043589' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4132806331027043589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4132806331027043589'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/pdfz-on-kobo.html' title='PDFZ on Kobo'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-xKNwIbmjAzk/TxzDfSUvy-I/AAAAAAAAAo0/mMU87xgdnsM/s72-c/pdfz-1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-8834508064675415869</id><published>2012-01-22T15:00:00.004+10:30</published><updated>2012-01-22T17:58:08.997+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category 
scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>PDFZ, mupdf</title><content type='html'>Cut a long story short: i've come up with a simple binding to mupdf for Java.&lt;br /&gt;&lt;br /&gt;I was going to just use jmupdf, but on closer inspection it had some issues I wasn't comfortable with.  By writing my own it let me use the development version of mupdf: which has support for multiple threads and so on, and so removes the need for some of the work that had been done in jmupdf to 'libraryise' it.  All I needed to do to mupdf was add -fPIC to CFLAGS so I could build a shared library out of the jni plugin.&lt;br /&gt;&lt;br /&gt;Rather than bind the whole api (it's huge), i've bound the minimum required to load pdfs for display.  A helper class written in C hides some of the mucking about when loading and rendering pages.&lt;br /&gt;&lt;br /&gt;Obviously one thing that got me started on this was for a pdf viewer on the kobo touch ... although i'm not really convinced the display has enough resolution for it to work that well.  I tried the built-in pdf viewer but it's pretty tedious to use - you can't turn pages unless you're fully zoomed out, and with the file I was testing it had the bounding box wrong so you had to do a lot of zoom in/out just to be able to read anything.  Panning on the e-ink is a bit nasty too.&lt;br /&gt;&lt;br /&gt;I haven't tried running it yet on the kobo: but i've already built the binaries using a cross-compilation environment so hopefully it should 'just work'.  I've also made some progress on a GadgetZ based implementation, but it's super rough, here's a screenshot of it running on my workstation in my 'emulator'.  
I've upped the gamma somewhat in preparation for the e-ink and to test that functionality (which is why it is so dark).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-VHlngldk4K4/Txu3IV4n5UI/AAAAAAAAAoI/3WveDf0iSlY/s1600/pdfz-0.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 309px;" src="http://2.bp.blogspot.com/-VHlngldk4K4/Txu3IV4n5UI/AAAAAAAAAoI/3WveDf0iSlY/s400/pdfz-0.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5700351107308971330" /&gt;&lt;/a&gt;&lt;br /&gt;I'm going to try to see if i can render the display list in real-time, and only keep a frame-buffer sized backing for it.  It's fast enough on the desktop, but i'm not sure about the ARM cpu on the kobo when it comes to panning (at the zoom setting shown in the picture it's taking about 0.002-0.004s to render on my workstation ... so it should be fast enough!).&lt;br /&gt;&lt;br /&gt;The user interface with a single 'finger' input will be a challenge too ...&lt;br /&gt;&lt;br /&gt;I crashed my kobo a couple of days ago whilst trying to work out suspend/resume, and spent a few wasted hours trying to activate it again after a factory reset since the windows pc I have access to suddenly decided to have its hdd die ... 
so i'm a little reluctant to overuse it at the moment (although I ended up setting up the kobo application in virtualbox, should i need it again).&lt;br /&gt;&lt;br /&gt;I've also had enough for today, so I'll look at the kobo another time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-8834508064675415869?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/8834508064675415869/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=8834508064675415869' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8834508064675415869'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8834508064675415869'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/pdfz-mupdf.html' title='PDFZ, mupdf'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-VHlngldk4K4/Txu3IV4n5UI/AAAAAAAAAoI/3WveDf0iSlY/s72-c/pdfz-0.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5217295065940240305</id><published>2012-01-20T11:40:00.003+10:30</published><updated>2012-01-20T12:23:30.878+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category 
scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Touch Interfaces</title><content type='html'>Although I should really be working on the shed or just getting some fresh air, I kept poking away at the kobo code.&lt;br /&gt;&lt;br /&gt;I thought i'd try to implement a copy of the 'date setting' thing: the one on the kobo is really slow and hard to use for some reason and I wanted to verify that it wasn't a fault of the device ...&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-3yXTXSe3UpE/TxjCgtzC4gI/AAAAAAAAAnw/IukV76qe2CQ/s1600/kobo-1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-3yXTXSe3UpE/TxjCgtzC4gI/AAAAAAAAAnw/IukV76qe2CQ/s400/kobo-1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5699519195742659074" /&gt;&lt;/a&gt;&lt;br /&gt;(this is similar-but-not the same as the kobo date setting page)&lt;br /&gt;&lt;br /&gt;Rather than copy the same interface (which is based on selecting what to change and up/down buttons), I came up with a different one based on more 'standard' widgets: dropdown menus and sliders.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-PNjbsr0ko8E/TxjCgIJGyQI/AAAAAAAAAno/s1mwhXTjIgA/s1600/kobo-2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://1.bp.blogspot.com/-PNjbsr0ko8E/TxjCgIJGyQI/AAAAAAAAAno/s1mwhXTjIgA/s400/kobo-2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5699519185634642178" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I'm also using a new 'requester' class, which is a modal window which opens centered on the screen.&lt;br /&gt;&lt;br /&gt;Of course, the problem with writing your own toolkit is you need to write all those things too ... 
but I managed to get those working eventually.  The popup menu will try to align with the selected item, otherwise it fits to the screen.&lt;br /&gt;&lt;br /&gt;I hit my first snag when trying to create requesters: the event loop iterates through a list of gadgets, but that ended up changing during the update since I was adding a new window ... so my first solution was to hack the popups in separately to the Screen class: it needs to track them for events anyway.  I knew that although I wasn't hitting it at the moment, there was also a potential for multi-thread issues as well, so I attempted to create a work-around for that too.  I removed access to the children of a Group gadget, and forced access through a synchronous interface which makes a copy of the list.  Although it can potentially access stale data, it should eventually catch up.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-cAPr6SBGM3E/TxjCfpK7mvI/AAAAAAAAAnc/tjrxIZi3SOQ/s1600/kobo-3.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://4.bp.blogspot.com/-cAPr6SBGM3E/TxjCfpK7mvI/AAAAAAAAAnc/tjrxIZi3SOQ/s400/kobo-3.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5699519177320798962" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I'm using a layout class for the layout of the menu content, so although it's not shown, the 'day' selector uses a grid layout and lists the day numbers in quasi-calendar format.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-bckR9J19_wo/TxjCfby-vVI/AAAAAAAAAnM/3HFVA6kPjRU/s1600/kobo-4.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://2.bp.blogspot.com/-bckR9J19_wo/TxjCfby-vVI/AAAAAAAAAnM/3HFVA6kPjRU/s400/kobo-4.jpg" border="0" 
alt=""id="BLOGGER_PHOTO_ID_5699519173730680146" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;For the year I thought i'd try something different - a slider bubble, which just goes away once you lift your finger.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-smgLyOnAqYM/TxjCfP1IjpI/AAAAAAAAAnE/0z0AEoCMJkM/s1600/kobo-5.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-smgLyOnAqYM/TxjCfP1IjpI/AAAAAAAAAnE/0z0AEoCMJkM/s400/kobo-5.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5699519170518486674" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In action ... showing the faster update mode.  I can also set it to update synchronously - in this case you get no artefacts, but it only updates at about 3fps.&lt;br /&gt;&lt;br /&gt;I also poked around with the update manager a fair bit.  I had been just shipping repaint events to another thread and letting it merge them if it happened to have a few in the queue.  But this meant the display would update quickly as soon as something changed and then usually the rest of the updates for handling that input would happen later: i.e. at least 2 updates (which are slow if you're waiting for them).  So I now have it merge the updates as they come in, and only fire them off after processing input events.  There are still some issues with this for the e-ink, e.g. if you do anything on an input 'pressed' event such as highlighting a button, and then perform the action on the 'released' event you still end up having to wait for at least one redundant update - which again is slow if you're synchronising with the e-ink.&lt;br /&gt;&lt;br /&gt;I also experimented with an 'invert' display every now and then: e.g. when a requester is closed.  This is needed to clean up the e-ink fully.  
Here I hit some snags with synchronisation with the device driver although I have it sort of working.  Sometimes it inverts the old data first, and then inverts the final data, even though i'm ensuring the updates are being sent in order.  I am just using the same update marker though, so it probably has something to do with that.&lt;br /&gt;&lt;br /&gt;Finally, I tried to work out how to resume from suspend but got nowhere - I can suspend the box but it won't resume on any button presses and I have to do a hardware reset.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5217295065940240305?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5217295065940240305/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5217295065940240305' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5217295065940240305'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5217295065940240305'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/touch-interfaces.html' title='Touch Interfaces'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-3yXTXSe3UpE/TxjCgtzC4gI/AAAAAAAAAnw/IukV76qe2CQ/s72-c/kobo-1.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6923237674594866678</id><published>2012-01-17T19:26:00.006+10:30</published><updated>2012-01-17T21:27:18.146+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>GadgetZ ... on kobo.</title><content type='html'>So the projects of the ...Z's continue.&lt;br /&gt;&lt;br /&gt;I played a bit with the thinlet toolkit - I managed to get some stuff to display, but I had trouble with the refreshing and mapping the input events properly.  It also lacked some features and had a weird-arsed reflection based event system.&lt;br /&gt;&lt;br /&gt;So, although I really didn't want to, I started from scratch with a 'simpler' toolkit.  Which wasted all of today and half of yesterday ...&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-5eElSiX0QQQ/TxU30hXntII/AAAAAAAAAm4/5eJpPMsI48k/s1600/kobo-0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://1.bp.blogspot.com/-5eElSiX0QQQ/TxU30hXntII/AAAAAAAAAm4/5eJpPMsI48k/s400/kobo-0.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5698522278957397122" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;But after quite a bit of jiggery pokery I have some basic widgets working (as usual, it's the layout stuff that's a real shit to get working at all, let alone correctly).  All events are handled in one thread, and all rendering is handled in another - so although you sometimes get artefacts, it ends up catching up: and the interface remains quite responsive in the meantime (so it should, there are at least 5 threads working away so nothing needs to block).  
The touch-screen does seem to have some issues with some areas of the screen but I don't think I can do anything about that.&lt;br /&gt;&lt;br /&gt;I also worked out the input events for the two buttons (home and power) - for some reason when I tried it previously I got nothing.  So when I get to that I can hook those up too.&lt;br /&gt;&lt;br /&gt;The paint manager isn't terribly efficient - but it seems to be 'good enough' at this point even with piles of printf output.  i.e. the cpu on the thing really is rather gutsy particularly compared to the slowness of the e-ink display.&lt;br /&gt;&lt;br /&gt;Next time I play with it i might look at a text reader.  Unfortunately paginating isn't quite as simple as i'd like - I wanted to read it off disk on the fly: but character set and word-wrap stuff makes this really messy.  So to start with it might be based in memory.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt; Whilst I was writing this I had a bit more of a play.  If I turn off anti-aliasing and use black and white for everything, the display updates much faster and with fewer artefacts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6923237674594866678?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6923237674594866678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6923237674594866678' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6923237674594866678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6923237674594866678'/><link rel='alternate' type='text/html' 
href='http://a-hackers-craic.blogspot.com/2012/01/gadgetz-on-kobo.html' title='GadgetZ ... on kobo.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-5eElSiX0QQQ/TxU30hXntII/AAAAAAAAAm4/5eJpPMsI48k/s72-c/kobo-0.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-150128764242041048</id><published>2012-01-14T13:04:00.005+10:30</published><updated>2012-01-14T13:26:06.930+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Kobo Hacking</title><content type='html'>Hmm, so curiosity got the better of me and I played a bit more with my recently acquired kobo touch.&lt;br /&gt;&lt;br /&gt;I downloaded the correct version of codesourcery gcc, and trivially compiled some binaries for it - being able to simply ftp code via wifi and run it in a telnet session provides for simple and rapid experimentation.  I found also that if i just run the web browser and leave it on the google page the wifi stays active for longer; although I could always kill the ebook reader as well.&lt;br /&gt;&lt;br /&gt;I ported a simple linux framebuffer example to run on it, and using the definitions and ioctls in mxcfb worked out how to tell the e-ink to refresh.  It can be quite slow if you fully wait for the refresh, but that is only required sometimes by the looks of it.  
It seems to work a bit like an etch-a-sketch, and bits get left behind sometimes, requiring a full 'shake' once in a while to clean up the display (while reading, this makes the e-ink look even more like paper, since you seem to get the previous page 'showing through' as you do with most paperbacks).&lt;br /&gt;&lt;br /&gt;I started playing with a simple 'hello world' using freetype to render some text ... but hell, that's just too much hassle.  So I'm going to play a bit with Java first; Java on ARM is pretty gutless, so it may not be fast enough, but there's no harm in trying.  The machine has 256MB/ram, so at least memory shouldn't be too much of a problem.  It's a pity Java2D cannot write to a direct ByteBuffer so I will have to have a separate BufferedImage to render into, and some messy update code; but I think such overheads will be immeasurable compared to the e-ink update.&lt;br /&gt;&lt;br /&gt;I tried the Sun JRE (actually it's about the only binary JRE i could find), which seems to work ok (at least with a simple 'hello world'), and now i'm working on a tiny bit of JNI to talk to the framebuffer display.  Rather than have to do everything from scratch, I'm looking at porting a very simple toolkit &lt;a href="http://thinletweb.appspot.com/"&gt;'thinlet'&lt;/a&gt; to work with this output device, and I'll also write some glue to the touchscreen input.&lt;br /&gt;&lt;br /&gt;I've been really flat lately and so may not put much time into it, but the more I use the device the more the shitty software it comes with is pissing me off.  It seems to be single-threaded and often blocked by inconsequential i/o operations - i.e. when it works the web browser is somewhat faster than the ebook reader is, so it's not so much the hardware or even the basic software, but some silly interactions going on in the background.  
Turning on aeroplane mode for example seems to eliminate many of the pauses.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-150128764242041048?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/150128764242041048/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=150128764242041048' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/150128764242041048'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/150128764242041048'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/kobo-hacking.html' title='Kobo Hacking'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-3817017882174250223</id><published>2012-01-12T11:48:00.004+10:30</published><updated>2012-01-12T12:26:59.732+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='kobo'/><title type='text'>e-reader</title><content type='html'>So ... I ended up cancelling my order with JB HI FI for the e-reader I ordered.  For a retailer apparently struggling with on-line competition they really need to get their shit together: after 7 days, an 'in stock' item had still not shipped, let alone arrived.  For online shops who have got their shit together, I typically have much larger items (e.g. a rotary clothes line!) 
arriving in only a few days.&lt;br /&gt;&lt;br /&gt;So anyway, I ended up just going to officeworks; another evil corporation, but it's close.  Originally when I looked last week they had it at RRP, then I was there Tuesday for something else and I saw it at the same price as JB Hi-fi, and by the time I decided to buy one anyway (I was nearly going to drop the whole idea ...) they'd dropped another $15 off that price.  Apart from having to stand around like an idiot for 10 minutes before anyone would serve me (unfortunately, they aren't just stacked on a shelf like everything else) it was an easy buy ...&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Kobo Touch&lt;/h2&gt;&lt;br /&gt;So the device is a Kobo Touch.&lt;br /&gt;&lt;br /&gt;The hardware is quite nice - solid feel.  It's a bit smaller than I'd imagined - it's like a small paperback - but the screen is readable enough.  A bit heavier than I expected too, but it's ok. I've never seen e-ink screens before, and the screen is better in brighter light but it's still ok in more subdued lighting.  Screen updating is pretty slow - but for its purpose of reading text it is ok; it feels a bit like very old LCD displays with a low battery. Using the 'sketch' tool or an event reader from a login shell, the touch input isn't perfect but does seem fine enough and quite responsive.&lt;br /&gt;&lt;br /&gt;But the software ... is pretty crap.  Usually it's ok, but sometimes it's super-slow.  Dunno why - the hardware is beefy enough.  Text files (book sized) are really slow, as it seems to re-paginate the whole thing every time you open a book, and for such text files it always forgets where you were up to.  Changing display preferences on such a text file also takes about 30 seconds per change ... The touch input (apart from the sketch tool) is a bit hit and miss some of the time, and it is often unclear if the tool is just too busy to process your input, or it just didn't pick it up in the first place.  
Which leads to double-actions or nothing happening.&lt;br /&gt;&lt;br /&gt;I managed to get telnet working and installed strace (from the opensuse arm7l rpm) to see what the GUI was doing.  During one of the long pauses where top showed the system wasn't very busy it spent an inordinate amount of time trying to read a config file, with the following sequence being executed thousands of times:&lt;br /&gt;&lt;pre&gt;stat64("/mnt/onboard/.kobo/Kobo/Kobo eReader.conf", {st_mode=S_IFREG|0755, st_size=3495, ...}) = 0&lt;br /&gt;stat64("/mnt/onboard/.kobo/Kobo/Kobo eReader.conf", {st_mode=S_IFREG|0755, st_size=3495, ...}) = 0&lt;br /&gt;stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2202, ...}) = 0&lt;br /&gt;stat64("/mnt/onboard/.kobo/Kobo.conf", 0x7eacbfc8) = -1 ENOENT (No such file or directory)&lt;br /&gt;stat64("/mnt/onboard/.kobo/Kobo.conf", 0x88f080) = -1 ENOENT (No such file or directory)&lt;br /&gt;stat64("/mnt/onboard/Kobo/Kobo eReader.conf", 0x7eacbfc8) = -1 ENOENT (No such file or directory)&lt;br /&gt;stat64("/mnt/onboard/Kobo/Kobo eReader.conf", 0x88f228) = -1 ENOENT (No such file or directory)&lt;br /&gt;stat64("/mnt/onboard/Kobo.conf", 0x7eacbfc8) = -1 ENOENT (No such file or directory)&lt;br /&gt;stat64("/mnt/onboard/Kobo.conf", 0x88f228) = -1 ENOENT (No such file or directory)&lt;/pre&gt;&lt;br /&gt;Which looks like pretty sloppy coding/mis-using a tool-kit.&lt;br /&gt;&lt;br /&gt;The reader application turns off the wifi when you restart it, so I gave up for now rather than trying to fight with it to get stuff done.  Might read some books first ...&lt;br /&gt;&lt;br /&gt;When I get an arm compiler going again (my beagleboard stuff is all backed up ... somewhere), I might have a closer look.  
Installing new binaries is pretty easy, and there's some info on the forums about the touch screen and frame-buffer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-3817017882174250223?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/3817017882174250223/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=3817017882174250223' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3817017882174250223'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3817017882174250223'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/e-reader.html' title='e-reader'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-711649850091452445</id><published>2012-01-06T14:56:00.002+10:30</published><updated>2012-01-06T21:02:16.657+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='biographical'/><title type='text'>Another year.</title><content type='html'>Well i've been pretty lazy since my leave started - well not entirely lazy just not wedded to the computer keyboard as I usually am.  Not really missing it either, yet.&lt;br /&gt;&lt;br /&gt;I did however install an automatic watering system and with looking after the garden that keeps me busy enough.  
The main tomato plant is nearly spent (after about 13kg of tomatoes, not bad for a 2 year-old plant), but i've had a pile of purple beans and the fresh sweet-corn is very nice too and all the citrus is booming now the wine barrels aren't drying out every warmish day.&lt;br /&gt;&lt;br /&gt;I went to the country for xmas with my sister &amp; niece, my twin brother and mum; but after a week of that and them it was a relief to come home.  I was so over drinking by then I went to bed early on NYE and have been dry since - it didn't help that I had hardly any sleep for weeks beforehand and during so my mood was all over the place.  I couldn't keep up with her nightly drinking and she's too much of a know-it-all to put up with for too long!&lt;br /&gt;&lt;br /&gt;After returning we had a bit of a heat-wave, so I just sat inside with the AC on, sleeping a lot and watching a lot of shit on tv.  I'm a bit full of that too so I've started trying to do some recreational reading again - it's been a long long time - a few years - since I read anything.  Starting with reading EE Doc Smith's Lensman series: the 40s style romance and kill-everyone action is a bit funny but it's better than repeats of some shitty american sitcom ...  On an impulse buy I even ordered an 'ebook reader', although to my chagrin an 'in stock' item still hasn't shipped 4 days later ... reading on the laptop is ok but I'm interested to see how a purpose-built device will compare.&lt;br /&gt;&lt;br /&gt;Given I was off the booze for a few days I dropped the coffee too (until this morning) to see if that was the cause of my sleep apnoea going off the scale lately: unfortunately it had no noticeable effect.  I was still tired and able to nap for hours every afternoon (thankfully the work next-door has ceased for the moment which let me `sleep').  
So it's most probably just being too lazy and sedentary, too overweight, and perhaps some hay fever.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-711649850091452445?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/711649850091452445/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=711649850091452445' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/711649850091452445'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/711649850091452445'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2012/01/another-year.html' title='Another year.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1248936085620768909</id><published>2011-12-09T14:55:00.006+10:30</published><updated>2011-12-09T15:20:27.676+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='biographical'/><title type='text'>Wheee</title><content type='html'>In the nick of time, i've got a big block of leave coming up starting COB today (actually i've basically knocked off - too many hours this week - but have to hang around for something).&lt;br /&gt;&lt;br /&gt;I've been put off travelling for a few years - all those trips to Boston and around the world from Perth were a bit much, so I probably won't be 
going anywhere.  The garden needs plenty of up-keep anyway so I will probably just bum around the house a lot.  Who knows, maybe i'll get somewhere with the deck and shed floor ...&lt;br /&gt;&lt;br /&gt;Although I should probably take a break from the keyboard I'll no doubt end up doing some hacking - no real plans so I'll see where my interests take me.  Apart from the projects I already have on the go, eventually I want to resurrect my own effort at an internet publishing platform too.&lt;br /&gt;&lt;br /&gt;Another long outstanding task is to build a woodworking bench from some recovered timber, and perhaps a set of steps from the back porch.  Maybe get a welder?  Really should do that shed floor first ...&lt;br /&gt;&lt;br /&gt;I'm still really dark on Sony for cutting my ps3 in half, so i'm not sure i'll play too many games - I would like to look at uncharted 3 though.&lt;br /&gt;&lt;br /&gt;The main disappointment of the moment is that I ran out of beer last night and haven't brewed anything for months either.  
This sounds like something that needs to be rectified with prejudice.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1248936085620768909?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1248936085620768909/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1248936085620768909' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1248936085620768909'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1248936085620768909'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/12/wheee.html' title='Wheee'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4826232113037803740</id><published>2011-12-04T22:01:00.005+10:30</published><updated>2011-12-05T08:53:47.800+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='imagez'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>DTCWT denoising</title><content type='html'>So I poked at the wavelet code enough to get it to work, and included it in ImageZ.  
It still fails with certain sized images, but oh well ...&lt;br /&gt;&lt;br /&gt;It's based on the dual-tree &lt;a href="http://dsp.rice.edu/software/qwt"&gt;quaternion wavelet&lt;/a&gt;, and does a quaternion-to-complex translation during the thresholding which produces a cleaner de-noised result (i.e. it is still using a dual-tree-complex-wavelet-transform ...).  I haven't particularly tried to optimise it much - but it's already about 3-4x faster than the &lt;a href="http://a-hackers-craic.blogspot.com/2011/10/dct-denoising.html"&gt;DCT Denoising&lt;/a&gt; algorithm I implemented earlier (for which there is little opportunity for further improvement).  It also implements the &lt;a href="http://a-hackers-craic.blogspot.com/2011/10/wavelet-denoise-sharpen.html"&gt;sharpen&lt;/a&gt; algorithm I developed, which I wasn't able to get to work with the DCT code.&lt;br /&gt;&lt;br /&gt;I've started working on the OpenCL version for socles as well, but it's got a way to go yet.&lt;br /&gt;&lt;br /&gt;The de-noising isn't my primary interest in this stuff, it's just a freebie to test some of the algorithms and is interesting nonetheless.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4826232113037803740?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4826232113037803740/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4826232113037803740' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4826232113037803740'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4826232113037803740'/><link rel='alternate' type='text/html' 
href='http://a-hackers-craic.blogspot.com/2011/12/dtcwt-denoising.html' title='DTCWT denoising'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-8003894279509482070</id><published>2011-11-30T19:16:00.004+10:30</published><updated>2011-11-30T19:33:11.615+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>rmi</title><content type='html'>So i'm stuck a bit with a problem at work, we're using some crappy obsolete API that only works in 32-bit mode, but the rest of the application needs to run in 64-bit to fully utilise the hardware.  Bit of a shit really - it's unclear if there is a 64-bit API available - I think there is - but I don't have access to the device and he who does is currently travelling ...&lt;br /&gt;&lt;br /&gt;As a fall-back in-case there isn't another way I hacked up a quick system based on RMI - and I was surprised how easy it was considering I started from scratch.  I think it helped that I already had a relatively simple API that abstracted different devices, so I just wrote a proxy object in a few dozen lines of code.  And of course it helped that I've played with CORBA before and know how it all works.&lt;br /&gt;&lt;br /&gt;The nice thing about RMI is it lets you send classes across the wire (with local processing behaviour as well), which meant I could implement some backend dependent code like getControls() in the GUI front-end process with only a few minor changes to use a more generic property interface.&lt;br /&gt;&lt;br /&gt;No doubt it could be (much) more efficient, but I think it should suffice for this project.  
It's something i've been meaning to try for a while and if this experience is any guide it won't be the last either.&lt;br /&gt;&lt;br /&gt;Yeah, i'm happy (as happy as i get anyway) to be hacking again after being stuck in maths for so long, although we had a phone meeting today and I've some more ideas to try so it's not over yet.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-8003894279509482070?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/8003894279509482070/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=8003894279509482070' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8003894279509482070'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8003894279509482070'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/11/rmi.html' title='rmi'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1862782831641264004</id><published>2011-11-30T09:46:00.005+10:30</published><updated>2011-11-30T11:36:19.030+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>jjmpeg video creation</title><content type='html'>Been a bit quiet around here lately - actually it hasn't at 
all.  Building going on next door means I haven't had a decent sleep in a couple of weeks and it's really wearing me down.  Add to that I was stuck for a few weeks trying to grok some hairy maths for work and I just haven't had the inclination nor energy to pursue much other than eating, drinking, and some TV.&lt;br /&gt;&lt;br /&gt;But back OT, I've started looking at moving my client's application to using jjmpeg - as I need 64-bit microsoft to work, and I'm also having some troubles with gc load with lots of transient objects.&lt;br /&gt;&lt;br /&gt;Getting video frame reading going was trivial but I had to code up a fair bit of extra stuff to be able to create videos in proper containers which is the other requirement I have.  I've checked in a first cut at that - although I need to do more testing particularly wrt GC performance.&lt;br /&gt;&lt;br /&gt;I tried to come up with a helper class with a nice API to use it, and the following demonstrates its use:&lt;br /&gt;&lt;pre&gt;        AVFormatContext.registerAll();&lt;br /&gt;&lt;br /&gt;        JJMediaWriter writer = new JJMediaWriter(filename);&lt;br /&gt;        JJVideoStream vstream = writer.addVideoStream(width, height, fps, 400000);&lt;br /&gt;        BufferedImage image = vstream.createImage();&lt;br /&gt;&lt;br /&gt;        writer.open();&lt;br /&gt;&lt;br /&gt;        Graphics2D gg = image.createGraphics();&lt;br /&gt;        gg.setBackground(Color.black);&lt;br /&gt;        gg.setColor(Color.white);&lt;br /&gt;        for (int i = 0; i &amp;lt; fps * 5; i++) {&lt;br /&gt;                gg.clearRect(0, 0, width, height);&lt;br /&gt;                gg.drawString("Moving Text!", i, i);&lt;br /&gt;                vstream.addFrame(image);&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        writer.close();&lt;/pre&gt;&lt;br /&gt;Well I can't imagine it being much simpler than that.  
This also (for the most part) avoids transient object creation, so should be (relatively) efficient.&lt;br /&gt;&lt;br /&gt;Unfortunately things aren't so clean under the bonnet, but I guess what you don't see won't hurt you, will it?&lt;br /&gt;&lt;br /&gt;The full example is in &lt;a href="http://code.google.com/p/jjmpeg/source/browse/trunk/jjmpegdemos/src/au/notzed/jjmpeg/streamwriter/VideoWriter.java"&gt;VideoWriter.java&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1862782831641264004?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1862782831641264004/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1862782831641264004' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1862782831641264004'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1862782831641264004'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/11/jjmpeg-video-creation.html' title='jjmpeg video creation'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-8668896939457955365</id><published>2011-11-15T11:50:00.005+10:30</published><updated>2011-11-15T12:07:56.385+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Bad typesetting, and evince 
sucks.</title><content type='html'>I've been reading a lot of papers of late, and any that aren't typeset using TeX are blatantly obvious and hard to read.  Even very old papers are much easier to read than some of the modern ones - I wonder how they are being formatted.  The microsoft word ones are really obvious - and utterly awful - but there are others which are pretty crap too.  The mathematics is the obvious really low-point, but even the typefaces and layouts are pretty rank.&lt;br /&gt;&lt;br /&gt;It is however quite a pleasure when I do come across a properly typeset manuscript in the familiar CMR typeface.&lt;br /&gt;&lt;br /&gt;I am also utterly fed up with evince.  It refuses to remember anything I ask it - from the location I save files to, to the size of the window, and the lack of global close is just annoying.  I had a very wide two-page book I just played with at full two-monitor wide size ONE FUCKING TIME a few days ago, and now every time I open EVERY document, evince opens at the same ginormous size.  Fucking ticking me off big-time.&lt;br /&gt;&lt;br /&gt;I even tried mupdf - which is just a bit bare-bones for my liking (although it renders much better), and discovered jmupdf along the way whose tiny demo test application is almost as 'featureful' as evince (not that that means much ...), so I might feel another project coming on ... 
apart from reading them, I'm starting to get a sizeable collection of papers and it's hard to keep track of them based-on-the-title-in-the-filename.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-8668896939457955365?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/8668896939457955365/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=8668896939457955365' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8668896939457955365'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8668896939457955365'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/11/bad-typesetting-and-evince-sucks.html' title='Bad typesetting, and evince sucks.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4599290230451574179</id><published>2011-10-28T10:34:00.005+10:30</published><updated>2011-10-28T16:21:43.490+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Fucked up Fridays</title><content type='html'>What is it about Fridays lately ...&lt;br /&gt;&lt;br /&gt;Well the latest little thing to ruin my day has been the inability of Firefox 7 to function correctly with the primary selection.  
&lt;strike&gt;It seems to want to ignore middlemouse.contentLoadURL for some reason&lt;/strike&gt;.  Given that it's a relatively new setting and fully documented I presume it's just a bug, but what a pain.&lt;br /&gt;&lt;br /&gt;It's not something I use constantly but discovering it doesn't work is pretty annoying.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; So now it decides it's going to work.  Well what can I say ... except maybe that I need to get AFK more often.&lt;br /&gt;&lt;br /&gt;I'm totally sick of the upgrade treadmill and feel somewhat annoyed by being forced to install a newer version of Fedora just to get my graphics card working.  I had everything working just nicely and was familiar enough with any of the warts left to not notice them.  And now I have to go through all that crap again.  The thought that firefox will become 'versionless' horrifies me, as does the love-fest that is HTML5+JavaScript where I will no longer be able to ignore CO2 belching crap like I can now by just disabling flash.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4599290230451574179?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4599290230451574179/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4599290230451574179' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4599290230451574179'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4599290230451574179'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/fucked-up-fridays.html' title='Fucked up 
Fridays'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2129035122266364970</id><published>2011-10-27T20:20:00.007+10:30</published><updated>2011-10-28T14:10:02.285+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>socles demos</title><content type='html'>I finally got off my fat arse - or is that sat on it further enlargening[sic] it - and tied up some of the test driver code I have for socles into a set of demos.&lt;br /&gt;&lt;br /&gt;I also implemented the colour mode for the DCT denoising algorithm.  Over-all it's a little slow still - i.e. not fast enough for real-time video.  One of these days I'll get around to the complex wavelet version, that should be a lot faster and can also sharpen.  I haven't been able to suss out DCT sharpening and so far my attempts add too many artefacts to be useful (i.e. pixel-level chess pattern).&lt;br /&gt;&lt;br /&gt;The demos so far are:&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;AdaptiveBlur&lt;br /&gt;&lt;dd&gt;An interactive window that shows an experimental algorithm I came up with some time ago for de-noising.  It uses a Sobel filter to detect edges, then uses that to progressively blend between a blurred and non-blurred image.  
Works ok sometimes.&lt;br /&gt;&lt;dt&gt;ConvolveNonSeparable&lt;br /&gt;&lt;dd&gt;Simple non-separable convolution that blurs an image.&lt;br /&gt;&lt;dt&gt;ConvolveSeparable&lt;br /&gt;&lt;dd&gt;Separable convolution to do the same thing (&lt;strike&gt;and demonstrates the code is broken atm&lt;/strike&gt; - demo was broken, fixed)&lt;br /&gt;&lt;dt&gt;DCT8x8Mono, DCT8x8Colour&lt;br /&gt;&lt;dd&gt;Interactive DCT based denoise demo for mono/colour images.&lt;br /&gt;&lt;dt&gt;WebcamFX&lt;br /&gt;&lt;dd&gt;Another old interactive demo I wrote which uses Video4Linux to access a webcam and apply a bunch of effects including KLT motion detection and viola-jones face detect.  It also shows the first half of a low-overhead video display path: the GPU does the colour conversion from raw frames.  Well as low as possible with v4l4j anyway.&lt;br /&gt;&lt;/dl&gt;They're in the &lt;a href="http://code.google.com/p/socles/source/browse/trunk/#trunk%2Fsoclesdemo"&gt;soclesdemo sub-module&lt;/a&gt; in socles' cvs.&lt;br /&gt;&lt;br /&gt;Hmm, another week nearly down.  I've been reading lots of papers and trying to suss out some fiddly crap for work, so this stuff has been a nice distraction.  
That's finally going somewhere so might keep me busy for a bit.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2129035122266364970?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2129035122266364970/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2129035122266364970' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2129035122266364970'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2129035122266364970'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/socles-demos.html' title='socles demos'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6149648117777244051</id><published>2011-10-26T12:17:00.004+10:30</published><updated>2011-10-26T23:12:52.803+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>GC, finalisers</title><content type='html'>So I was doing some memory profiling the other day (using netbeans excellent excellent profiler - boy I could've used this 10 years ago) to try to track down some resource leakages and I noticed that xuggle was really exercising the system heavily.&lt;br 
/&gt;&lt;br /&gt;So it seems I might look at moving to use jjmpeg in my client's application fairly soon.  There are some other reasons as well: i.e. not being able to run in a 64-bit JVM on microsoft windows is starting to become a problem, and the bundled ffmpeg is just a bit out of date.&lt;br /&gt;&lt;br /&gt;Since I haven't implemented memory handling completely in jjmpeg I went about looking how to do it 'properly'.  I was just going to try to use finalisers, but then I came across this article on &lt;a href="http://java.sun.com/developer/technicalArticles/javase/finalization/"&gt;java finalisers&lt;/a&gt; which said it probably wasn't a good idea.&lt;br /&gt;&lt;br /&gt;I was going to have a short look this morning but suddenly it was 4 hours later and although I had something which works i'm not sure yet that I like it.  It seems the cleanest way to implement the suggestions of using weak references, and mixing the auto-generated and hand-crafted code I want, so I will probably end up running with it.  
The public API didn't need to change.&lt;br /&gt;&lt;br /&gt;Previously, the binding worked with an object class hierarchy something like this&lt;br /&gt;&lt;pre&gt; AVNative [&lt;br /&gt;   ByteBuffer p (points to allocated/mapped native memory)&lt;br /&gt; ]&lt;br /&gt;   +- AVFormatContextAbstract [&lt;br /&gt;    Generated field accessors and native methods&lt;br /&gt;    Most methods are object methods&lt;br /&gt;   ]&lt;br /&gt;    +- AVFormatContext [&lt;br /&gt;      Public factory methods/constructors&lt;br /&gt;      Hand-coded specific methods&lt;br /&gt;      Hand-coded helper native methods&lt;br /&gt;      Hand-coded finalise/dispose methods&lt;br /&gt;    ]&lt;/pre&gt;&lt;br /&gt;The new structure:&lt;br /&gt;&lt;pre&gt;WeakReference&amp;lt;AVObject&amp;gt;&lt;br /&gt;+- AVNative [&lt;br /&gt;   ByteBuffer p pointing to native memory&lt;br /&gt;   internal dispose() method&lt;br /&gt;   weak reference queue/cleanup as from article above&lt;br /&gt;   Weak reference is AVObject&lt;br /&gt; ]&lt;br /&gt; +- AVFormatContextNativeAbstract [&lt;br /&gt;   Generated field accessors and native methods&lt;br /&gt;   All methods and field accessors are static&lt;br /&gt;   ]&lt;br /&gt;   +- AVFormatContextNative [&lt;br /&gt;     Hand-coded helper native methods&lt;br /&gt;     Implements native resource dispose&lt;br /&gt;   ]&lt;/pre&gt;&lt;br /&gt;Together with&lt;br /&gt;&lt;pre&gt;AVObject [&lt;br /&gt;  AVNative n (the pointer to the native wrapper object)&lt;br /&gt;  public dispose method&lt;br /&gt;  ]&lt;br /&gt;  +- AVFormatContextAbstract [&lt;br /&gt;      Generated public access methods which use AVFormatContextNative(Abstract) methods.&lt;br /&gt;    ]&lt;br /&gt;    +- AVFormatContext [&lt;br /&gt;      Public factory methods/constructors&lt;br /&gt;      Hand-coded specific methods&lt;br /&gt;      ]&lt;/pre&gt;&lt;br /&gt;So yeah - a bit more complicated, and it requires 2 objects for each instance (and often 3 including the C side instance 
it's wrapping), as well as the overhead of the weakreference instance data and the list entry for tracking the references.  The extra layer of indirection also adds another method invocation/stack frame to every method call.&lt;br /&gt;&lt;br /&gt;On the other hand, it lets the client code use dispose() when it wants to, or if it forgets then dispose will automatically be called eventually.  And makes it obvious in the code where dispose needs to sit.&lt;br /&gt;&lt;br /&gt;As usual it's a question of trade-offs.  If the article is correct then presumably these trade-offs are worth it.&lt;br /&gt;&lt;br /&gt;In this case the whole point of using jjmpeg is to avoid numerous allocations every frame anyway: I can allocate working and output buffers once and just use them directly.  In this case the actual number of objects is quite small and doesn't happen very often, so I suspect that either mechanism would work about as well as the other.&lt;br /&gt;&lt;br /&gt;Well this distraction has blown my morning away; I'd better leave it for now so I can clock up some work hours after lunch.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt; I figured i'd gone too far down this route to do anything other than keep it.  
I've checked this in now as well as a bunch of other stuff described on the project page.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6149648117777244051?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6149648117777244051/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6149648117777244051' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6149648117777244051'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6149648117777244051'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/gc-finalisers.html' title='GC, finalisers'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2092545888038919689</id><published>2011-10-24T22:11:00.005+10:30</published><updated>2011-10-27T20:46:06.704+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>OpenCL DCT Denoise</title><content type='html'>I've just checked in an OpenCL implementation of the DCT de-noising algorithm &lt;a href="http://a-hackers-craic.blogspot.com/2011/10/dct-denoising.html"&gt;I mentioned previously&lt;/a&gt;.  
I've only done the mono version so far.&lt;br /&gt;&lt;br /&gt;It's not terribly fast - 10ms wall-clock for a 512x512 mono image, and given that it requires 64 DCT's per 8x8 block and needs to accumulate the results, it probably never will be.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://code.google.com/p/socles/source/browse/trunk/socles/src/au/notzed/socle/dct/dct_8x8.cl"&gt;The kernel source&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; Colour version implemented now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2092545888038919689?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2092545888038919689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2092545888038919689' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2092545888038919689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2092545888038919689'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/opencl-dct-denoise.html' title='OpenCL DCT Denoise'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-628567944345859534</id><published>2011-10-22T19:19:00.003+10:30</published><updated>2011-10-22T19:53:48.936+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category 
scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Its beaten me.  For now.</title><content type='html'>I should've stayed outside in the sun today gardening - but curiosity got the better of me.  I hope the (absolutely stunning) weather continues tomorrow, otherwise I've blown it on nothing ...&lt;br /&gt;&lt;br /&gt;I tried working on the AMD performance of the Viola &amp; Jones detector in socles: I tried a whole bunch of stuff, from copying the image tiles pre-scaled (as summed area table) to local memory, to completely re-arranging the data structures so they are workgroup aligned, to even trying the cpu single-thread-per-location version.&lt;br /&gt;&lt;br /&gt;I got some minor improvement, the biggest coming from copying the tile to local store and removing some of the calculations (since it doesn't need to scale the rects): but that only took a simple test case from about 25ms to 20ms.  Barely noticeable in my webcam test harness.&lt;br /&gt;&lt;br /&gt;I think the problem is with the fact it has to read so much data for each single test.  It requires 3-4 uint4's just to describe the test, and 8-12 uint texture lookups for the summed area table lookups.  The cascade I have has ~6&amp;nbsp;400 regions to test grouped in ~3&amp;nbsp;000 features, and although most aren't tested it's just a lot of data.  It's too much for constant memory for example.&lt;br /&gt;&lt;br /&gt;With a fix to use the atomic counters AMD hardware provides at least it's now in the same order of magnitude as the nvidia hardware, but still 2-4x slower.&lt;br /&gt;&lt;br /&gt;Maybe ... 
if the stages were broken up into smaller parts it could work more efficiently, but it does seem a pretty long shot to me as the problem remains with the sheer amount of stuff that needs to be loaded for each test.&lt;br /&gt;&lt;br /&gt;Time probably better spent on something else.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-628567944345859534?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/628567944345859534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=628567944345859534' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/628567944345859534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/628567944345859534'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/its-beaten-me-for-now.html' title='Its beaten me.  
For now.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7654707966263757843</id><published>2011-10-20T15:47:00.008+10:30</published><updated>2011-10-21T15:22:54.940+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Ho hum.</title><content type='html'>Have a new AMD card - HD 6950 - for my workstation, need the catalyst driver for the OpenCL stuff.  I use XFCE so the gnome3 incompatibilities are of no interest to me.&lt;br /&gt;&lt;br /&gt;Couldn't get the driver built for FC13 (all sorts of bugs/problems with the rpm and I really just couldn't be fagged with it all late at night), so `upgraded' to FC15 ...&lt;br /&gt;&lt;br /&gt;It kind of works, but is really slow in really weird ways - when changing virtual desktops one window refreshes at 'cpu speed'.  glxgears @ 6000fps which is really way too slow: I'm getting 10KFPS on my rather older 5770 card on my other older/slower machine.  Although fgl_glxgears is twice as fast on this new card.  Using the AMD CPU backend for OpenCL causes more interference with graphics update than using the GPU backend(!)  The other machine is using catalyst 10.12 on fedora 14, new one is 11.9 on fedora 15 ...&lt;br /&gt;&lt;br /&gt;I've blacklisted the kernel radeon module and whatnot.  I'm using xinerama - I tried without it and it was even slower.&lt;br /&gt;&lt;br /&gt;I think there's just something wrong with the whole system as everything feels rather sluggish - or is that just the price of 'progress'?  
I'm trying a yum update (all 1G's worth) and if that doesn't work I might have to try something more drastic.  Obviously the upgrade was a risky choice, but one would hope having the right kernel and X driver would be enough for the video driver ...&lt;br /&gt;&lt;br /&gt;Only 1000 packages to go now ...&lt;br /&gt;&lt;h2&gt;Later ...&lt;/h2&gt;Well it's still really slow.  I tried an older driver release (on windows - hard to find them for fedora) but it wouldn't support the card.  On windows the wall-clock of part of my application runs about 2x faster than on linux: which is pretty significant since much of the time is just waiting around for the video frame to arrive so the speed-up is presumably more than that.  Needless to say the desktop is smoother too.&lt;br /&gt;&lt;br /&gt;I also tried the viola-jones detector from socles.  Ouch, this really really struggles - about 100x slower than running on nvidia hardware.  I tried a few things that didn't make any noticeable difference apart from removing the single rarely-used &lt;code&gt;atomic_inc&lt;/code&gt; which made it jump to about 30x faster - but even with that huge increase it was still well behind the GTX 480.&lt;br /&gt;&lt;br /&gt;I think probably I will have to try some other possible ideas to deal with this:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Scale the images so that each sliding scan reads adjacent locations (i.e. coalesced reads), and go back to 1-thread-per-test/cascade.&lt;br /&gt;&lt;li&gt;Pre-calculate the scaled weights/regions on the cpu so they can be stored in constant memory.&lt;br /&gt;&lt;li&gt;Cache the region/weight information in LS.&lt;br /&gt;&lt;li&gt;Unpack the region/weight info into a flat structure so it is read sequentially rather than walking a tree stored in an array.&lt;br /&gt;&lt;li&gt;? separate the sum calculations from the weight calculations.  By doing less work there might be more locality of reference/chance for any cache to function.  
This is just another way to try the first point I guess.&lt;br /&gt;&lt;li&gt;Use atomic counters if available since global atomics are obviously a huge no-no on cayman.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;I had also better check it on my HD 5770 which runs the fc14 desktop very snappy and runs OpenCL ok to verify it isn't just all down to a shoddy driver (Hmm, now I think about it, I haven't tried OpenCL on it since 'upgrading' to fc14 from a hacked up ancient gnewsense).&lt;br /&gt;&lt;br /&gt;glxgears does start to slow down on the 5770 vs the 6950 as you make the window bigger - so the hardware itself is somewhat faster.  The problems must be in the overhead of the os/drivers.  No question that ATI aren't doing a great job here but on the other hand, the xorg, fdo, and linux guys seem to change their minds about driver/graphics architecture every 6 months too ...&lt;br /&gt;&lt;br /&gt;I was looking forward to playing with some new hardware, but apart from the sluggish GUI and having to `upgrade' the system, most of the application I work on no longer functions as critical routines are returning broken results.  Not fun.  Some of these are going to turn out to be bugs but I've already found problems with the compiler (e.g. commenting out all of the &lt;code&gt;#pragma&amp;nbsp;unroll&lt;/code&gt; directives fixed a bunch of stuff).&lt;br /&gt;&lt;br /&gt;Well as the boss said, these things are so cheap it probably isn't worth my time (or his money!) for me trying to fix these issues ...&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Later Still ...&lt;/h2&gt;Well I seem to have most of the code working again.  Apart from the &lt;code&gt;#pragma unroll&lt;/code&gt; error, they seem to be my own fault.&lt;br /&gt;&lt;br /&gt;First, a bunch of queue synchronisation problems: data being over-written before it was fully processed for example.  NVIDIA's libraries are more aggressive about starting work without an explicit clFlush(). 
And apart from that I just made some mistakes along the way which weren't exposed until now.&lt;br /&gt;&lt;br /&gt;And one odd one which took a while to track down: passing the same image as both a read_only image, and a write_only one.  I knew this was suss when I did it, but 'it worked' so I left it there: I had it in the back of my mind that this was the sort of thing I should check, but I couldn't remember where I'd done it.&lt;br /&gt;&lt;br /&gt;I still have newly added stability issues - the dreaded and meaningless 'error 134': but in the past these have usually been bugs too.  Although not always.&lt;br /&gt;&lt;br /&gt;So perhaps the drivers aren't so bad after all; although they are still too slow from linux.&lt;br /&gt;&lt;br /&gt;I guess I should've stuck to one of my rules of thumb of late: if you think you're getting the wrong result from the compiler, you just haven't checked your code closely enough yet.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7654707966263757843?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7654707966263757843/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7654707966263757843' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7654707966263757843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7654707966263757843'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/ho-hum.html' title='Ho hum.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1646759617706969165</id><published>2011-10-18T18:50:00.011+10:30</published><updated>2011-11-02T21:32:19.799+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='imagez'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>DCT denoising</title><content type='html'>Ok now the weekend's over, time to calm down and stop ranting ... ;-)  Bummer about Australia losing though, apart from some real shockers right from the kick-off they did calm down and start playing fairly well.  When they did have a good run - and they had a few - they were let down badly by not enough support at the breakdown.  Still, NZ deserved winners ... 
And channel 9's race-caller sucked the whole way through.&lt;br /&gt;&lt;br /&gt;I just found this very well put together site about using the discrete cosine transform (DCT) to do threshold de-noising in a manner similar to the wavelet threshold &lt;a href="http://a-hackers-craic.blogspot.com/2011/10/wavelet-denoise.html"&gt;denoising&lt;/a&gt; and &lt;a href="http://a-hackers-craic.blogspot.com/2011/10/wavelet-denoise-sharpen.html"&gt;sharpening&lt;/a&gt; I mentioned before.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.ipol.im/pub/algo/ys_dct_denoising/"&gt;DCT Denoising&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Very slick, complete with well formatted mathematics that puts most microsoft-word based papers to shame, GPL3 source and on-line demo!&lt;br /&gt;&lt;br /&gt;I downloaded the code and modified it not to add the noise and tried it myself on Lenna:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-4ydu_9KAwtI/Tp04x-g14uI/AAAAAAAAAYE/yFJQOHhWp9I/s1600/lenna-dct-5.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://4.bp.blogspot.com/-4ydu_9KAwtI/Tp04x-g14uI/AAAAAAAAAYE/yFJQOHhWp9I/s400/lenna-dct-5.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5664746337547444962" /&gt;&lt;/a&gt;&lt;br /&gt;The results are effectively the same as with the complex DTCWT version for moderate settings - visually even the artefacts it introduces are the same.&lt;br /&gt;&lt;br /&gt;In the form provided however it is somewhat more computationally intensive - its sliding window is offset by single pixels, and the way the C++ is written isn't the most efficient.  I wonder how well it would work with a Hann window and 4 pixel offsets.  
I wonder if it can also sharpen - from a quick search it looks like it can.&lt;br /&gt;&lt;br /&gt;Very interesting, and it also works with colour images in smarter ways than just processing each channel separately.&lt;br /&gt;&lt;br /&gt;When I get the time I'll look at coding this up for ImageZ and socles, &lt;strike&gt;although I just noticed blogger mucked up something else - looking at images - so the threshold of having to do something about that is ever approaching&lt;/strike&gt; (I found the option to disable 'lightbox' mode).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; Just another advert for Java.  It looked simple enough so I coded up a version in Java using an 8x8 DCT and it runs single-threaded over 3x faster than the C++ version, including the JVM startup or over 4x once it's going.  Rather than generate all 255&amp;nbsp;025(!) patches, transform, threshold, inverse, and merge, it fully processes a single patch each time: requiring that much less DCT memory (i.e. rather a lot - over 62MB less).  So that's 0.9s vs 3.9s for this 512x512 mono image.  Although I can't fathom why my version needs 1/2 the threshold to give a similar result ...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; See &lt;a href="http://a-hackers-craic.blogspot.com/2011/10/opencl-dct-denoise.html"&gt;follow-on post&lt;/a&gt; where i mention implementing it in OpenCL for socles.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; I've now added it to ImageZ.  &lt;a href="https://code.google.com/p/mediaz/source/browse/trunk/ImageZ/src/imagez/fx/dct/DCT8Denoise.java"&gt;DCT8Denoise&lt;/a&gt; is the main entry point.  I changed it to work with separate colour planes rather than planes stored in a single array, just to make it easier to invoke from ImageZ.  
It's only single-threaded atm.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1646759617706969165?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1646759617706969165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1646759617706969165' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1646759617706969165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1646759617706969165'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/dct-denoising.html' title='DCT denoising'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-4ydu_9KAwtI/Tp04x-g14uI/AAAAAAAAAYE/yFJQOHhWp9I/s72-c/lenna-dct-5.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-3502390019512175592</id><published>2011-10-16T19:17:00.003+10:30</published><updated>2011-10-16T19:25:45.663+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Well ...</title><content type='html'>Just when you thought it couldn't get any worse, channel 9 - who hardly showed any of the world cup to start with - have what sounds like a horse-race caller doing the commentary on the AU/NZ semi-final.  
He does know the players at least, but doesn't seem to know the rules or that we too can see the same pictures as he is.  So much for a bit of atmosphere, i had to turn the sound right down to be able to focus on the game and not this dickhead.&lt;br /&gt;&lt;br /&gt;You don't realise how much the commentators make the game until you get a complete fuck-wit like this.&lt;br /&gt;&lt;br /&gt;The one bright spot of the channel 9 coverage of the whole world cup - that they didn't provide their own wanker commentators - eclipsed in a moment.&lt;br /&gt;&lt;br /&gt;Australia aren't looking like winners here after the first half, but there isn't much surprise there.  Given a bit of bad luck and some very poor execution they're lucky they're still in it.  NZ have made too many mistakes too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-3502390019512175592?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/3502390019512175592/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=3502390019512175592' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3502390019512175592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3502390019512175592'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/well.html' title='Well ...'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5334977704135665126</id><published>2011-10-16T07:43:00.017+10:30</published><updated>2011-10-16T09:58:00.612+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Goodbye google news</title><content type='html'>Well, it's been a weekend for disappointment.  Damn Wales were unlucky ... I'm actually not sure who I want to win out of New Zealand and Australia today - the kiwis just demand so much respect it's hard to barrack against them; i'll have a few drinks and go for whoever is playing the best I think.  If they're both on their game it could be a real cracker of a match.  But I digress ...&lt;br /&gt;&lt;br /&gt;So, again google has decided to muck about with something which pretty much didn't need fixing.  Last time they messed with news.google.com.au I wasn't particularly happy but continued to use it fairly regularly as the changes were just cosmetic usability issues, but I think these latest changes are going to be too much on top of a few other reasons i'll detail later.&lt;br /&gt;&lt;br /&gt;TBH I can't believe i'm devoting so much time to such a post - it really doesn't mean that much to me on its own - but in the over-all scheme of things these small (and not so small) issues do mount up.  It turned into a bit of a mega-rant at the end and the language deteriorates as it goes ...&lt;br /&gt;&lt;br /&gt;First, the existing google news as I see on this laptop ... 
starting with the top of the page:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-T-RrZMz9IM0/Tpn4gYJ41DI/AAAAAAAAAWw/0qq4QTvVKow/s1600/news-0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/-T-RrZMz9IM0/Tpn4gYJ41DI/AAAAAAAAAWw/0qq4QTvVKow/s400/news-0.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5663831241518535730" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;And then the middle of the page:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-7uM3VQvN6Yc/Tpn4g1TU5-I/AAAAAAAAAW8/GYfC9-UR39s/s1600/news-1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/-7uM3VQvN6Yc/Tpn4g1TU5-I/AAAAAAAAAW8/GYfC9-UR39s/s400/news-1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5663831249342752738" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;First thing: Yes I (very much) like to use Bitstream Vera Sans as my font for everything: coding, and reading documents.  And even then, only 1 specific size works the best (not being able to do this is the single specific reason I wont even bother to try Chrome).  So all you designers painstakingly choosing your typefaces and font sizes: you're wasting your time, if one can't read the information it is worthless.  Most sites actually work fine with this, although a few have some minor formatting issues (mostly text overrunning the bottom of iframes).&lt;br /&gt;&lt;br /&gt;And secondly I do have a crappy 1024x768 IBM laptop screen.  Although few laptops have resolutions to match anymore, plenty of phones, netbooks and iLandfill slabs don't even get this far.&lt;br /&gt;&lt;br /&gt;Ok, now on to the layout.  
There is still a big wasted load of space on the left that they added in the last major layout update, but basically most of the page is used for information content.  Each story has a few alternative links from common (and sometimes not so common) news sources, an email link, and at most a single picture.  Mouse-overs (at least today) are restricted to highlighting the link which is about the most i'd like any browser to do with them.&lt;br /&gt;&lt;br /&gt;Now, to my surprise, I was greeted with the following page when I opened google news on my other laptop this morning:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-ylxr2g3OEjo/Tpn4hiG49XI/AAAAAAAAAXY/04d8NZw4rzg/s1600/news-3.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://2.bp.blogspot.com/-ylxr2g3OEjo/Tpn4hiG49XI/AAAAAAAAAXY/04d8NZw4rzg/s400/news-3.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5663831261370185074" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Hmm, something doesn't look right.  First, everything is in one column.  A huge chunk of wasted space on both the left and the right now.  And what's more, the real killer feature of google news - at a glance being able to see the 'feel' of the media reporting of the news story - is conspicuously absent.  
There is only a single link to a single news source.&lt;br /&gt;&lt;br /&gt;Actually I couldn't work out how to find anything more than that: I normally browse with Javascript disabled on that machine - because I don't like my lap burning, nor f&amp;#230;cebook to know where my mouse is whilst i'm reading a news article on an unrelated site - and all you end up with is a single link.&lt;br /&gt;&lt;br /&gt;Enabling javascript and reloading, and I discovered a huge pile of annoying mouse-over shit (AMOS).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-HdWdaolAjwg/Tpn4iG8Q44I/AAAAAAAAAXg/dDTEH-AasAU/s1600/news-4.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/-HdWdaolAjwg/Tpn4iG8Q44I/AAAAAAAAAXg/dDTEH-AasAU/s400/news-4.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5663831271257727874" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;So, now you actually have to click an ugly button to bring this stuff up.  Hoorah, now we have popup-pox infesting web pages too, just loverly[sic].&lt;br /&gt;&lt;br /&gt;And on top of that it's now somewhat more difficult to decipher - it is trying to add extra information to the other links beyond their titles.  Do I really care that it is an opinion link?  Or why the special notoriety of articles "From the United States [of America]"?  Is their opinion somehow more important?&lt;br /&gt;&lt;br /&gt;And apart from that, there's rubbish like a f&amp;#230;cebook, twatter, and plus-one button in addition to the email link, and 3 video links in addition to the picture.  Clutter.&lt;br /&gt;&lt;br /&gt;So ... I did a search and apparently the cog button is the settings icon these days.  Who knew ... 
(actually I thought it was some logo, not a cog for that matter: it looks more to me like a high-contrast themed variation of the xfce main menu button)  Of course, none of the buttons function if you have Javascript turned off ...&lt;br /&gt;&lt;br /&gt;So to the rather bare settings.  1 or 2 columns, and auto-refresh.  F&amp;#230;ry-nuff, let's try ...&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-CXaR53jqx6A/Tpn8gWD35NI/AAAAAAAAAX0/7IZNBZOzJ34/s1600/news-5.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/-CXaR53jqx6A/Tpn8gWD35NI/AAAAAAAAAX0/7IZNBZOzJ34/s400/news-5.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5663835639002948818" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Oh hang on.  That looks broken.  Why would anyone possibly want to read the site that way?  Not to mention more AMOS to 'enhance the experience', and the same big blank section on the right.&lt;br /&gt;&lt;br /&gt;At least the killer-feature reporting-at-a-glance is back, but there's just no way anyone would labour through such a horrible interface for that.&lt;br /&gt;&lt;br /&gt;Oddly enough ... 
if you disable javascript ...&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-8PDQ_OiCuIc/Tpn8gPSJCJI/AAAAAAAAAXs/KIdKC2AMp1g/s1600/news-6.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/-8PDQ_OiCuIc/Tpn8gPSJCJI/AAAAAAAAAXs/KIdKC2AMp1g/s400/news-6.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5663835637183744146" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;You get the right-hand side-bar back, and thankfully the AMOS disappears as well.&lt;br /&gt;&lt;br /&gt;Well almost ...&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-vdDeXEEDURM/Tpn4hVCGrLI/AAAAAAAAAXI/BylNMPQ6nIA/s1600/news-2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/-vdDeXEEDURM/Tpn4hVCGrLI/AAAAAAAAAXI/BylNMPQ6nIA/s400/news-2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5663831257860451506" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;For some reason the top of the page has this non-functional news selection slider thing stuck to it.&lt;br /&gt;&lt;h2&gt;Thoughts&lt;/h2&gt;I can only think that google has a particular idea in mind here: if you're not using a 24" widescreen monitor, then you must be using a phone or some iLandfill toy.  Although that doesn't completely make sense since the new site would be even more useless on phones so they must have 2 separate stylesheets/designs for each one anyway.  So why fuck it up so royally?&lt;br /&gt;&lt;br /&gt;More and more of the web requires javascript - whilst usually using it for pointless crap like implementing buttons in a non-recognisable os-agnostic way (those damn designers again, thinking they can redefine 30 years of progress in human-computer interaction on every page).  
I find this whole idea of javascript everywhere very questionable security wise - a web page can load 3rd-party applications which can then send information (e.g. where your mouse is) to any other 4th party without your knowledge.  And hence more and more web pages are being turned into 'crapplications'.  They're slower, uglier (and certainly not 'theme aware'), and more clumsy than local applications, but they're much heavier cpu and data wise compared to remote ones.  It also closes off the avenues for using alternate browsers: having to have a very high performance rendering engine and javascript vm is a massive barrier to entry (e.g. even firefox 3.6 is ruled out of many sites now).&lt;br /&gt;&lt;br /&gt;Welcome to the 3rd age of thick-client computing.  All the local computing power required to run local applications, combined with the speed, grace, availability and security of remote ones.  Oh boy!  Hold me back!&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;No news is good news?&lt;/h2&gt;On a personal note i've been trying to avoid reading the news too much anyway, google news included.  It's always the same old shit.  It's mostly depressing, or at best it's just click-bait to rile you up.&lt;br /&gt;&lt;br /&gt;And google news's aggregation algorithms are pretty much like watching TV based on the ratings: not the sort of experience I'm really after.  For example, apparently `funniest home videos' is the most popular show in Australia?  Do I really want the bogans who watch channel 9 deciding what news makes the front page (depressingly the truth is of course that yes, they already do).  With such an ignorant population, no wonder 'no more boats' is a (an almost) winning election slogan around these parts, or that the global warming denialists get so much airtime.  
&lt;a href="http://www.youtube.com/watch?v=TXpBfRlLnv8"&gt;A timely reminder&lt;/a&gt; - and exactly what I thought the first time I saw the advert with the sound not muted (which is how I watch advertising if i'm watching 'live' tv, although my tv mute button wore out ...).&lt;br /&gt;&lt;br /&gt;Still, I do like to check at least once every couple of days - lest I become one of the ignorant masses if nothing else.  Or to fill a spot to give my brain a rest or whilst waiting for a routine to run ... Unfortunately now I use Java there's no more waiting around for compilation - the 50KLOC bit of code I work on compiles and launches the application from scratch in about 1/2 a second (ant doesn't include resources properly in the jar without a clean rebuild - and building jars is the single terribly weak fucking reason to justify its utterly shit and astronomically painful fucked up existence - so I have to do it every time when working on opencl code.  Fucking adjective!).&lt;br /&gt;&lt;br /&gt;I guess I can use fairfax for the little Australian news i'm after (democratic politics died for me when Howard went to war, and without that what is the point of listening to those arseholes - and without the politics there's fuck-all left), The Guardian for Europe and summaries or links in a few blogs I visit will do me from now on.  I gave up on The ABC months ago - which should really now just be called `The Opposition Says Sydney-Siders Gazette'.  Even SBS TV news has been shit for ages, since they cut their budget it's little more than a patchwork of cheap stories from other services (many barely trying to hide themselves from the happy-story pro-war/pro-usa propaganda they are, like some of the BBC stuff from iraq/afghanistan).&lt;br /&gt;&lt;br /&gt;Barely any of the services do any local news at all.  Most of it is broadcast/published straight out of Sydney or Melbourne.  
Not that much of import happens around here, but sometimes you do need to know about local stuff.&lt;br /&gt;&lt;br /&gt;One thing google news showed me (until now) is just how much of the news is just the exact same story repeated ad nauseam, so at least I know I wont be 'missing out' on anything by not using it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5334977704135665126?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5334977704135665126/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5334977704135665126' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5334977704135665126'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5334977704135665126'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/goodbye-google-news.html' title='Goodbye google news'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-T-RrZMz9IM0/Tpn4gYJ41DI/AAAAAAAAAWw/0qq4QTvVKow/s72-c/news-0.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1117401966642058604</id><published>2011-10-14T18:10:00.008+10:30</published><updated>2011-10-14T18:53:03.515+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='rants'/><title type='text'>Goodbye Mythtv</title><content type='html'>I knew there was a reason I hadn't updated my system in a while, it wanted install rubbish I don't want.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;Dependencies Resolved&lt;br /&gt;&lt;br /&gt;========================================================&lt;br /&gt; Package                                          Arch&lt;br /&gt;========================================================&lt;br /&gt;Removing:&lt;br /&gt; PackageKit                                       i686&lt;br /&gt;Removing for dependencies:&lt;br /&gt; PackageKit-glib                                  i686&lt;br /&gt; PackageKit-gstreamer-plugin                      i686&lt;br /&gt; PackageKit-yum                                   i686&lt;br /&gt; k3b                                              i686&lt;br /&gt; k3b-common                                       noarch&lt;br /&gt; k3b-libs                                         i686&lt;br /&gt; kdebase-runtime                                  i686&lt;br /&gt; kdebase-runtime-flags                            noarch&lt;br /&gt; kdebase-runtime-libs                             i686&lt;br /&gt; kdelibs                                          i686&lt;br /&gt; kdemultimedia-libs                               i686&lt;br /&gt; kdepimlibs                                       i686&lt;br /&gt; mythtv-common                                    i686&lt;br /&gt; mythtv-frontend                                  i686&lt;br /&gt; mythtv-libs                                      i686&lt;br /&gt; phonon                                           i686&lt;br /&gt; phonon-backend-gstreamer                         i686&lt;br /&gt; qt-webkit                                        i686&lt;br /&gt;&lt;br /&gt;Transaction Summary&lt;br /&gt;========================================================&lt;br /&gt;Remove       19 Package(s)&lt;br /&gt;&lt;br /&gt;Installed size: 161 M&lt;br /&gt;Is this ok [y/N]: 
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;All I can say is "What the Deuce?"&lt;br /&gt;&lt;br /&gt;I'm pretty sick of fighting with this type of bullshit.  Why the fuck is anything depending on that PackageKit crap?&lt;br /&gt;&lt;br /&gt;So &lt;b&gt;yes&lt;/b&gt;, it is ok to remove that snot - it's only a console that saves me walking into the next room to set what i'm going to record anyway.  And it's only tv.  &lt;code&gt;wodim&lt;/code&gt; is easier to use than k3b for burning isos for that matter.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How poetic ...&lt;/b&gt; (just arrived in email):&lt;br /&gt;&lt;ul&gt;&lt;pre&gt;Date: Fri, 14 Oct 2011 00:13:22 -0400&lt;br /&gt;From: "Wordsmith" &amp;lt;wsmith@wordsmith.org&amp;gt;&lt;br /&gt;Subject: A.Word.A.Day--vituperation&lt;br /&gt;&lt;br /&gt;This week's theme: Negative words&lt;br /&gt;&lt;br /&gt;vituperation (vy-too-puh-RAY-shuhn, -tyoo-, vi-) noun&lt;br /&gt;&lt;br /&gt;   Bitter and abusive language; condemnation.&lt;br /&gt;&lt;br /&gt;[From Latin vituperare (to blame), from vitium (fault) + parare (to make or&lt;br /&gt;prepare). Earliest documented use: 1481.]&lt;/pre&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h2&gt;Later ...&lt;/h2&gt;So this episode got me searching for a blacklist option, and I found the &lt;code&gt;exclude&lt;/code&gt; option for yum.&lt;br /&gt;&lt;br /&gt;Yay!&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;    exclude=PackageKit&lt;br /&gt;    exclude=pulseaudio&lt;/pre&gt;&lt;br /&gt;It seems it had something to do with the &lt;code&gt;phonon-backend-gstreamer&lt;/code&gt; and there are alternatives which don't need such rubbish.&lt;br /&gt;&lt;br /&gt;Never did like gstreamer ...&lt;br /&gt;&lt;br /&gt;Chances are the mythtv guys have changed the database format again, so i might hold off on trying to install it anyway: i've had enough excitement for one day.  
The secret is N-tier architecture guys ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1117401966642058604?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1117401966642058604/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1117401966642058604' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1117401966642058604'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1117401966642058604'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/goodbye-mythtv.html' title='Goodbye Mythtv'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-706866598269008729</id><published>2011-10-14T12:57:00.001+10:30</published><updated>2011-10-14T12:59:29.201+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Special-Case Code and Multi-Pass Algorithms</title><content type='html'>Ok, so without going into too much detail I have a function which needs to resample 3 float2 planes of data to another resolution, and then perform very simple arithmetic on it (a few mult, add).  
The scale factors are powers of two up and down.  One complication is that the numbers have to be pre-sampled first at pixel corners before being interpolated.&lt;br /&gt;&lt;br /&gt;I implemented it initially using bilinear interpolation for simplicity, and yesterday looked at implementing bicubic filtering.&lt;br /&gt;&lt;br /&gt;It wasn't really that bad - the given routine was about 1.5x the original speed which is ok, and overall this was only a 3% impact.&lt;br /&gt;&lt;br /&gt;But I thought I would try a few ideas to speed it up ...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A)&lt;/b&gt; I separated the routine into separate implementations, one for each scale.  I still used the same sampling routine, but just passed it a fixed-value for the scale.  In previous micro-benchmarks on the bilinear code I noticed this led to a pretty decent improvement.&lt;br /&gt;&lt;br /&gt;But in this case it didn't.  It slowed down some scales by a factor of 1-2, and moreover, made other routines in the same source file execute slower(!).  I can only assume the growth in code-size was a significant factor here.  I also noticed the register usage hit 63 again - which probably means all i've done is hit a bug in the compiler again (I should really upgrade the driver: we're moving to AMD hardware RSN anyway).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;B)&lt;/b&gt; Using two passes.  A separate scale pass followed by a calculation pass.  Intuitively this should be somewhat slower: the calculation after the scaling is simple and can be done in registers.&lt;br /&gt;&lt;br /&gt;But of course it turned out faster.  Not a huge amount, about 20% for the routine in question.&lt;br /&gt;&lt;br /&gt;I did have to do some work to make it happen though: using local memory and 2d workgroup sizes, and separate code for the scaling down functions (e.g. it just sums a 2x2 block to go down by 2).  In this case using separate functions for each size worked quite well (more evidence of compiler bugs).  
I was also able to batch the 3 planes separately to get added parallelism - the problem size is quite small so this should help.&lt;br /&gt;&lt;br /&gt;... and after writing (C) below I re-arranged the upscaler to use hard-coded sizes as well, and re-did the bicubic interpolator to accept integer and offset values separately: the compiler can remove some of the calculations here since i'm always using the same pixel offsets.&lt;br /&gt;&lt;br /&gt;... and i also experimented with changing the output type to float8 rather than float2 and writing 4 pixels at once for the 4x upscale.  This was 2x faster again for this routine (and uses fewer registers?), although I can't trust this number as the results are now broken (and i really have had about enough of it and don't want to debug it).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;C)&lt;/b&gt; Doing more at once.  e.g. doing 1/2, 1, and 2x at the same time.  Actually because the 2x scale uses hard-coded interpolation numbers the bicubic interpolation can be simplified greatly (that just gave me an idea to improve B) above).&lt;br /&gt;&lt;br /&gt;I didn't get this incorporated because it required a bit of re-arrangement of the host code, but this could shave off a bit more.  I usually need a few scales of the same data in each pass so this would be useful.&lt;br /&gt;&lt;h2&gt;Conclusions&lt;/h2&gt;Although all these could also be applied to the bilinear code, I now (with the changes in B above) have bicubic interpolation for this routine running at much the same speed as the original bilinear did.&lt;br /&gt;&lt;br /&gt;But it shows that you sometimes don't want to do too much in a given routine - compiler bugs, register spillage, or just more registers being used can all adversely affect parallelism and performance.  
Although a trip to memory is quite costly, these other factors can greatly outweigh it.&lt;br /&gt;&lt;br /&gt;After all this, and a few more changes in this particular routine i'm working on, I only managed about a 9% improvement.  TBH i'm not sure it's really worth it ... and I probably only went so far as I had a bit of time between getting this to a working state and heading back to reading papers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-706866598269008729?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/706866598269008729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=706866598269008729' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/706866598269008729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/706866598269008729'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/special-case-code-and-multi-pass.html' title='Special-Case Code and Multi-Pass Algorithms'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-265021464569721025</id><published>2011-10-12T18:38:00.010+10:30</published><updated>2011-10-12T21:50:53.602+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='cooking'/><title type='text'>Awesome-ease Chicken</title><content 
type='html'>Been a while since i shared a recipe, and i've been making some variation of this fairly regularly of late ... This is a sort of kitchen-friendly variation on Portuguese Chicken done in an oven.  And it's super-shit-easy to make.  I used to make it on a BBQ but this is probably nicer to eat and easier to cook properly.&lt;br /&gt;&lt;br /&gt;PS I admit i've had a couple of very lovely glasses of Church Block '07 and came up with the utterly-naff name which i've never used before.  It's just a super-tasty roast chicken.&lt;br /&gt;&lt;h2&gt;1. Cut chicken&lt;/h2&gt;Start by cutting a chicken up the breast-bone.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-NvXqB9L44tY/TpVjqawCUFI/AAAAAAAAAWk/ldqxsJqvMPU/s1600/chicken-0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-NvXqB9L44tY/TpVjqawCUFI/AAAAAAAAAWk/ldqxsJqvMPU/s400/chicken-0.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5662541686874722386" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h2&gt;2. Prepare pan&lt;/h2&gt;Place a handful of (freshly picked of course) thyme in the middle of a suitably sized dish/oven-proof frying pan.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-ND2aDyStGDs/TpVh5ugwnaI/AAAAAAAAAWM/IsqUqrODVio/s1600/chicken-1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://2.bp.blogspot.com/-ND2aDyStGDs/TpVh5ugwnaI/AAAAAAAAAWM/IsqUqrODVio/s400/chicken-1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5662539750854139298" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h2&gt;3. 
Mount the fowl&lt;/h2&gt;Push down on the back of the chicken to flatten it out - you should hear bones/joints breaking - if you're picky you can also break out the rib-bones at this point to make it easier to eat - and then place it over the thyme.  I also poked it over with a fork to help the seasoning in and the fat out.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-9dsTixIormk/TpVh5b8JGcI/AAAAAAAAAWA/m_rpTzBREGc/s1600/chicken-2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://2.bp.blogspot.com/-9dsTixIormk/TpVh5b8JGcI/AAAAAAAAAWA/m_rpTzBREGc/s400/chicken-2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5662539745868716482" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h2&gt;4. Seasoning, Lemon &amp; Salt&lt;/h2&gt;Cover with the juice of one (small) lemon, and if you have it, about 2 teaspoons of Asian 'chicken seasoning' - this is about 1/2 salt, with some flour, MSG, onion and stock powder mixed in.  A good teaspoon of Vegeta powdered stock, or simply salt and some pepper would suffice.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-ci2XphD7iQk/TpVh5PzfnII/AAAAAAAAAV4/d8j-xNj9bvg/s1600/chicken-3.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-ci2XphD7iQk/TpVh5PzfnII/AAAAAAAAAV4/d8j-xNj9bvg/s400/chicken-3.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5662539742611217538" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h2&gt;5. Seasoning, Herbs&lt;/h2&gt;Cover with broken fresh herbs (e.g. sage) and sliced ripe chillies.  
I also sometimes add a few thin slices of ripe tomato at this point, but my tomato plants are still growing this early in the season ...&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-oa5qyezNvYg/TpVh4xFPKbI/AAAAAAAAAVo/rsPnOn2_AF8/s1600/chicken-4.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://2.bp.blogspot.com/-oa5qyezNvYg/TpVh4xFPKbI/AAAAAAAAAVo/rsPnOn2_AF8/s400/chicken-4.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5662539734364137906" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;6. Cook It&lt;/h2&gt;Being flat, it cooks a bit faster even at the normal 180C.  I usually baste it a couple of times as well to bring out some colour, and when it looks cooked it usually is.  This small fowl was an hour in a pre-heated oven - about 45-50 minutes/kilo rather than 60.  I also upped the temperature for the last 15 minutes, but one has to be careful not to burn the herbs too much.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-2BXgXwXcP7g/TpVh4mz8jMI/AAAAAAAAAVc/NDrslXeI794/s1600/chicken-5.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://2.bp.blogspot.com/-2BXgXwXcP7g/TpVh4mz8jMI/AAAAAAAAAVc/NDrslXeI794/s400/chicken-5.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5662539731607260354" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h2&gt;7. Eat It&lt;/h2&gt;Because the chicken is laid down flat it traps the steam inside and cooks from both the inside and outside at the same time (i'm sure the black pan helps).  This cooks it faster and keeps it very moist.  
And with the skin upwards it crisps up nicely and builds up a strong flavour.&lt;br /&gt;&lt;br /&gt;It scales in the obvious way to larger fowl - I've cooked up to size 20 chickens this way.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-265021464569721025?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/265021464569721025/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=265021464569721025' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/265021464569721025'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/265021464569721025'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/awesome-ease-chicken.html' title='Awesome-ease Chicken'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-NvXqB9L44tY/TpVjqawCUFI/AAAAAAAAAWk/ldqxsJqvMPU/s72-c/chicken-0.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-3371661841138127754</id><published>2011-10-12T15:59:00.009+10:30</published><updated>2011-12-04T22:10:38.396+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='graphics'/><category scheme='http://www.blogger.com/atom/ns#' 
term='hacking'/><title type='text'>Wavelet Denoise &amp; Sharpen</title><content type='html'>So I had some luck with a bit of fiddling with the scaling function for wavelet sharpening.  And managed to get both sharpening and smoothing working at the same time.  I'm fairly happy with the results.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; see also a further post on using the &lt;a href="http://a-hackers-craic.blogspot.com/2011/10/dct-denoising.html"&gt;DCT in a similar way&lt;/a&gt;.&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; I've now implemented a version of this in ImageZ, &lt;a href="http://a-hackers-craic.blogspot.com/2011/12/dtcwt-denoising.html"&gt;see the follow-on post&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Ok, first the raw Lenna input image I used - converted to greyscale by Java2D.  Just to make comparison easier and to add another pretty face to the page.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-xfucWH4A7_4/TpUm28woV5I/AAAAAAAAAUg/WjjObjHZccw/s1600/lenna-bw.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://3.bp.blogspot.com/-xfucWH4A7_4/TpUm28woV5I/AAAAAAAAAUg/WjjObjHZccw/s400/lenna-bw.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662474831953155986" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Now, with the sharpening ramped right up.  As you can see it's pretty much the same as using unsharp-mask with a well-selected radius and a medium weight.  
And like unsharp mask it tends to emphasise any noise.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-cm7xhcPAKVU/TpUm3JvlwaI/AAAAAAAAAUw/ofaHBCWD4gA/s1600/lenna-real-sharp.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://1.bp.blogspot.com/-cm7xhcPAKVU/TpUm3JvlwaI/AAAAAAAAAUw/ofaHBCWD4gA/s400/lenna-real-sharp.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662474835438453154" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Unsharp mask/Wiener Deconvolution can still work better if the image is simply de-focussed as they have a PSF function to estimate the amount of defocusing.&lt;br /&gt;&lt;br /&gt;Now, with the same settings, and also de-noised very heavily.  Despite the obvious and unnatural looking heavy processing the edge sharpness and most of the detail is still retained rather well.  Most added artefacts are relatively smooth and natural looking too.  
If you've ever tried using a median filter or a selective Gaussian blur, you'd know they pretty much suck at retaining any texture detail or clean edges.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-tHjfNpjbUMc/TpUm3wOBl0I/AAAAAAAAAU4/F23LLoNiR38/s1600/lenna-sharp-denoise.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://4.bp.blogspot.com/-tHjfNpjbUMc/TpUm3wOBl0I/AAAAAAAAAU4/F23LLoNiR38/s400/lenna-sharp-denoise.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662474845766653762" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;And finally, a more natural level of sharpening and de-noising.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-lNiLh0zk09Q/TpUm39I1YMI/AAAAAAAAAVE/byeWo9t58YA/s1600/lenna-medium.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 400px;" src="http://4.bp.blogspot.com/-lNiLh0zk09Q/TpUm39I1YMI/AAAAAAAAAVE/byeWo9t58YA/s400/lenna-medium.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662474849234542786" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Pretty happy with it given how simple the maths is.  I've over-emphasised some of the results by using high values, but a smooth variation in results between the original and any of the extreme values is possible.&lt;br /&gt;&lt;br /&gt;Two steps are applied to each complex coefficient in turn in a way that can be done whilst the coefficients are in registers.  So if you have other processing going on it's essentially free.&lt;br /&gt;&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;Threshold De-noise&lt;br /&gt;&lt;dd&gt;C = C * { abs(C) &gt; T ? 
( abs(C) - T ) / abs(C) : 0 }&lt;br /&gt;&lt;br /&gt;Where:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;C&lt;/b&gt; the complex transform coefficient;&lt;br /&gt;&lt;b&gt;abs(x)&lt;/b&gt; returns the magnitude of the complex number x;&lt;br /&gt;&lt;b&gt;T&lt;/b&gt; input threshold from about 0.01 to 0.001.&lt;br /&gt;&lt;br /&gt;(see the previous post for a dead link to the source of this)&lt;br /&gt;&lt;br /&gt;This zeros out small coefficients - which are apparently likely to be noise - and scales the rest to their original range.&lt;br /&gt;&lt;br /&gt;&lt;dt&gt;Scale Bands&lt;br /&gt;&lt;dd&gt;C = C * { ( exp( (bandcount - nband) * scale) - 1 ) * weight + 1 }&lt;br /&gt;&lt;br /&gt;Where:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;bandcount&lt;/b&gt; is depth of wavelet transform;&lt;br /&gt;&lt;b&gt;nband&lt;/b&gt; is number of the band (0 is the highest frequency);&lt;br /&gt;&lt;b&gt;scale&lt;/b&gt; input sharpness 'gradient' from 0-1; and&lt;br /&gt;&lt;b&gt;weight&lt;/b&gt; input sharpness weight from 0-1.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;scale&lt;/b&gt; is a general 'sharpening factor' setting, and &lt;b&gt;weight&lt;/b&gt; specifies how heavily it is applied.&lt;br /&gt;&lt;/dl&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-GD2WL-RWI_I/TpU6hsiSEHI/AAAAAAAAAVQ/B5zRt70sYaQ/s1600/sharpen-band-multipliers.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 264px;" src="http://4.bp.blogspot.com/-GD2WL-RWI_I/TpU6hsiSEHI/AAAAAAAAAVQ/B5zRt70sYaQ/s400/sharpen-band-multipliers.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662496457053311090" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-3371661841138127754?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://a-hackers-craic.blogspot.com/feeds/3371661841138127754/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=3371661841138127754' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3371661841138127754'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3371661841138127754'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/wavelet-denoise-sharpen.html' title='Wavelet Denoise &amp; Sharpen'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-xfucWH4A7_4/TpUm28woV5I/AAAAAAAAAUg/WjjObjHZccw/s72-c/lenna-bw.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5439750597128446297</id><published>2011-10-10T13:43:00.009+10:30</published><updated>2011-10-12T16:58:34.936+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='graphics'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Wavelet Denoise</title><content type='html'>As a test routine for some low-level code I threw together a little test harness of a complex wavelet de-noise algorithm.&lt;br /&gt;&lt;br /&gt;It was based on some papers and demo code from &lt;a href="http://taco.poly.edu/WaveletSoftware/"&gt;this link&lt;/a&gt; (which appears to be dead now ... and has been for some time at that).  
It's just using a very simple threshold-and-scale of the wavelet coefficients, so apart from the relatively expensive Dual-Tree Complex Wavelet Transform it is simple and cheap to implement.  The 1.7ms reported is the time to forward transform, apply the thresholding, the inverse, and download the (float) image to Java and convert it to a greyscale byte image.  (I know, the screenshot should have been a png, so it's not entirely clear here ...)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-YrWml7X0L8o/TpKGZ3mfCDI/AAAAAAAAAUQ/m-uAD-KqVHE/s1600/wavelet-denoise.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 214px;" src="http://3.bp.blogspot.com/-YrWml7X0L8o/TpKGZ3mfCDI/AAAAAAAAAUQ/m-uAD-KqVHE/s400/wavelet-denoise.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5661735460538353714" /&gt;&lt;/a&gt;&lt;br /&gt;This has nothing to do with what i'm working on but I thought it looked quite interesting.  It preserves edge detail much better than techniques like a median filter or a Gaussian blur, and introduces fewer artefacts compared to the adaptive blurs i've seen.  According to that now-broken-link, using the complex wavelet transform produces subjectively better results compared to the DWT.&lt;br /&gt;&lt;br /&gt;Perhaps i could use it as a processing step: if you already have the DTCWT coefficients it's a cheap additional process.  Somewhat like doing a convolution in the frequency domain, it's basically free if you're already there.&lt;br /&gt;&lt;br /&gt;I also played a bit with working out a sharpening algorithm on the weekend - I couldn't really find any simple papers: they all relied on adaptive processes, and the results reported didn't seem worth all the effort.  In the end all I did was linearly scale the coefficients by some made up numbers.  
Scale up for the highest frequency components and scale each subsequent wavelet band by 1/2 of the one above.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-hX6g_JhPr-0/TpKMappAVgI/AAAAAAAAAUY/tLUaSS-oi50/s1600/wavelet-sharpen.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 198px;" src="http://3.bp.blogspot.com/-hX6g_JhPr-0/TpKMappAVgI/AAAAAAAAAUY/tLUaSS-oi50/s400/wavelet-sharpen.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5661742071040464386" /&gt;&lt;/a&gt;&lt;p style="text-align: center; font-style: italic; font-size: smaller;"&gt;Unsharp Mask vs Wavelet Sharpen by scaling coefficients with an approximately similar (but not a very good approximation) adjustment.  Unsharp Mask is on the left.&lt;/p&gt;The result is pretty much the same as unsharp-mask, but it only takes 1 tuning parameter instead of 2, and subjectively it appears to me to be a smidgen less noisy.  
But I need to experiment a bit more, one would expect to be able to reduce the noise compared to unsharp mask and I think my low frequency scaling factors are out and it's affecting the tonal quality too much.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5439750597128446297?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5439750597128446297/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5439750597128446297' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5439750597128446297'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5439750597128446297'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/wavelet-denoise.html' title='Wavelet Denoise'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-YrWml7X0L8o/TpKGZ3mfCDI/AAAAAAAAAUQ/m-uAD-KqVHE/s72-c/wavelet-denoise.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-8698772842710169245</id><published>2011-10-08T14:30:00.005+10:30</published><updated>2011-10-08T14:48:04.734+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='images'/><category 
scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Sharpening ImageZ</title><content type='html'>I thought it about time to fix a few little bits and pieces with ImageZ that I actually use ... so I tackled some of that.  I fixed some of the Wiener deconvolution code - so that odd-sized images work for instance.  I also tried thoroughly thread-ising it, although I only got a modest performance boost: jtransforms is already using multiple threads for the FFT which is the expensive bit.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-cINNwvZ-DGU/To_M2clo__I/AAAAAAAAAUI/5EZs5Ap4jm0/s1600/imagez-n.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 288px;" src="http://3.bp.blogspot.com/-cINNwvZ-DGU/To_M2clo__I/AAAAAAAAAUI/5EZs5Ap4jm0/s400/imagez-n.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5660968492387074034" /&gt;&lt;/a&gt;&lt;p align="center" style="font-size: smaller; font-style:italic"&gt;Unsharp mask in a feathered mask.  I dialed it up to make it obvious.&lt;/p&gt;&lt;br /&gt;Unsharp mask is something I always find really handy, so I finally coded that up too.  Rather than start with the mess of the Gaussian filter code I already have, I coded another one from scratch.  It's a bit simpler, so I will merge and share the code at some point, or at least put it in a common place.  It also mirrors the edges rather than clamping, which seems to produce a more natural response on the edges.&lt;br /&gt;&lt;br /&gt;There are still a couple of things I use the gimp for that i'd rather not have to, but I guess that can wait for another day.&lt;br /&gt;&lt;br /&gt;I really need to get out of the house this weekend, but i've pretty much pulled up all the weeds, it's been raining enough to water the garden, and the neighbours were using a chainsaw this morning.  
So I just found myself stuck at the computer again ... and I might watch the rugby on soon too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-8698772842710169245?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/8698772842710169245/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=8698772842710169245' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8698772842710169245'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8698772842710169245'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/sharpening-imagez.html' title='Sharpening ImageZ'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-cINNwvZ-DGU/To_M2clo__I/AAAAAAAAAUI/5EZs5Ap4jm0/s72-c/imagez-n.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-563539760978877791</id><published>2011-10-08T02:04:00.005+10:30</published><updated>2011-10-08T03:41:15.823+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Java v OpenCL/CPU</title><content type='html'>I've been using the AMD CPU driver a bit for debugging and testing: i never really considered 
it for performance but for various reasons late tonight I ended up poking around with a simple routine and wondered how it compared.&lt;br /&gt;&lt;br /&gt;At first I thought i'd discovered a disaster, but that's because I wasn't initialising the data: too many non-normal floating point operations slowing it down significantly.  Oops, glad I checked that before posting.  Although it's getting late so who knows what else I may have stuffed up.&lt;br /&gt;&lt;br /&gt;I was testing using a simple matrix multiply, a 4096x4096 matrix stored in row-major order, multiplied by a 4096 row column-vector.  It isn't something i'm in any need of, but after poking around &lt;a href="http://www.bealto.com/"&gt;this site which i've read a few times&lt;/a&gt;, and with nothing on TV I decided to play around a bit.  Then after exhausting my interest on the GPU I tried the CPU version - I was originally going to see if just doing it locally with the CPU driver would be quicker than a device copy and back, but it isn't, the GPU is still 5-10x faster.&lt;br /&gt;&lt;br /&gt;I tested 4 implementations:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;OpenCL written for a CPU target using float types, one work-group and one work-item per row, 4096 work groups&lt;br /&gt;&lt;li&gt;OpenCL using float4 types, same&lt;br /&gt;&lt;li&gt;Java, single threaded&lt;br /&gt;&lt;li&gt;Java, using a ThreadPoolExecutor w/ 12 threads, 32 jobs.&lt;br /&gt;&lt;/ol&gt;&lt;pre&gt; Code             Time (s)&lt;br /&gt;  Java single      1.5&lt;br /&gt;  Java pool        0.39&lt;br /&gt;  OpenCL float     0.43&lt;br /&gt;  OpenCL float4    0.37&lt;/pre&gt;&lt;br /&gt;So I had to resort to float4 types to beat the thread pool code, and then only just.  It's kind of debatable as to which is easier to write: the Java code must explicitly deal with the range allocation and job launching.  But then it's all built-in, and doesn't require a different language, runtime, interface and foreign memory management ... 
and one that's prone to crashing with zero information, and otherwise also excruciatingly difficult to debug at that.  Ok scratch that: the Java clearly wins here.&lt;br /&gt;&lt;br /&gt;One can either conclude that the AMD compiler is a bit below-par to start with (most likely true), and that it was only by using vectorised code that it was able to beat the Java.  Or perhaps that the HotSpot compiler is rather good at this particular problem (again, most likely true), and is possibly using SSE opcodes to implement the loop too.  Not that SSEn really seems to add much of a boost in general apart from a few extra registers - it's not like on an SPU where vectorised code can be 10x faster than scalar.&lt;br /&gt;&lt;br /&gt;I had until this point thought of the CPU drivers for OpenCL as providing a sort of 'portable assembly language' for higher level languages, but if you have a decent compiler already it doesn't seem worth it - at least for some problems.&lt;br /&gt;&lt;br /&gt;I suppose another implementation might do better; but you're still stuck with a pretty hostile debugging environment and if you're after performance you'll be using a GPU anyway.  So about all it seems useful for is debugging/verifying code.  Given that, perhaps it would be useful to add more checking in the compiled code to help with debugging rather than worrying about performance ... Unlike C, OpenCL has a much simpler memory model for which accurate and full run-time address-range-checking can be ?easily? 
added.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-563539760978877791?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/563539760978877791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=563539760978877791' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/563539760978877791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/563539760978877791'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/java-v-openclcpu.html' title='Java v OpenCL/CPU'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5434528984203454545</id><published>2011-10-06T15:01:00.005+10:30</published><updated>2011-10-07T13:37:26.389+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Images vs Arrays 4</title><content type='html'>&lt;b&gt;Update 7/10/11: I uploaded the array convolution generator to &lt;a href="http://code.google.com/p/socles/"&gt;socles&lt;/a&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;And so it goes ...&lt;br /&gt;&lt;br /&gt;I've got a fairly convoluted convolution algorithm for performing a complex wavelet transform and I was looking to re-do it.  
Part of that re-doing is to move to using arrays rather than image types.&lt;br /&gt;&lt;br /&gt;I got a bit side-tracked whilst revisiting convolutions again ... I started with the generator from socles for separable convolution and modified it to work with arrays too.  Then I tried a couple of ideas and timed a whole bunch of runs.&lt;br /&gt;&lt;br /&gt;One idea I wanted to try was using a rolling buffer to reduce the memory load for the Y convolution.  I also wanted to see if using more work-items in a local workgroup to simplify the local memory load would help or hinder.  Otherwise it was pretty much just getting an array implementation working.  As is often the case I haven't fully tested that these actually work, but I'm reasonably confident they should as I fixed a few bugs along the way.&lt;br /&gt;&lt;h2&gt;The candidates&lt;/h2&gt;&lt;dl&gt;&lt;dt&gt;convolvex_a&lt;dd&gt;This is a simple implementation which uses local memory and a work-group size of 64x4.  128x4 words of data are loaded into the local memory, and then 64x4 results are generated in parallel purely from the local memory.&lt;br /&gt;&lt;br /&gt;&lt;dt&gt;convolvey_a&lt;dd&gt;This uses no local memory, and just steps through the addresses vertically, producing 64x4 results concurrently.  As all memory loads are coalesced it runs quite well.&lt;br /&gt;&lt;br /&gt;&lt;dt&gt;convolvex_b&lt;dd&gt;This version tries to use extra work-items just to load the memory, afterwards only using 64x4 threads.  In some testing I did for small jobs this seemed to be a win, but for larger jobs it is a big hit to concurrency.&lt;br /&gt;&lt;br /&gt;&lt;dt&gt;convolvey_b&lt;dd&gt;This version uses a 64x4 `rolling buffer' to cache image values for all items in the work-group.  
For each row of the convolution, the data is loaded once rather than 4x.&lt;br /&gt;&lt;br /&gt;&lt;dt&gt;imagex, imagey&lt;dd&gt;Is from the socles implementation in ConvolveXYGenerator which uses local memory to cache input data.&lt;br /&gt;&lt;br /&gt;&lt;dt&gt;simplex, simpley&lt;dd&gt;Is from the socles implementation in ConvolveXYGenerator which relies on the texture cache only.&lt;br /&gt;&lt;br /&gt;&lt;dt&gt;convolvex_a(limit)&lt;dd&gt;Is a version of convolvex_a which attempts to only load the amount of memory it needs, rather than doing a full work-group width each time.&lt;br /&gt;&lt;br /&gt;&lt;dt&gt;convolvex_a(vec)&lt;dd&gt;Is a version of convolvex_a which uses simple vector types for the local cache, rather than flattening all access to 32-bits to avoid bank conflicts.  It is particularly poor with 4-channel input.&lt;br /&gt;&lt;/dl&gt;&lt;br /&gt;The array code implements CLAMP_TO_EDGE for source reads.  The image code uses a 16x16 worksize, the array code 64x4.  The image data is FLOAT format, and 1, 2, or 4 channels wide.  The array data is float, float2, or float4.  Images and arrays represent a 512x512 image.  GPU is Nvidia GTX 480.&lt;br /&gt;&lt;h2&gt;Results&lt;/h2&gt;The timing results - all timings are in micro-seconds as taken from &lt;code&gt;computeprof&lt;/code&gt;.  Most were invoked for 1, 2, or 4 channels and a batch size of 1 or 4.  
Image batches are implemented by multiple invocations.&lt;br /&gt;&lt;pre&gt;                        batch=1                 batch=4&lt;br /&gt;channels                1       2       4       1       2       4&lt;br /&gt;&lt;br /&gt;convolvex_a             42      58      103     151     219     398&lt;br /&gt;convolvey_a             59      70      110     227     270     429&lt;br /&gt;&lt;br /&gt;convolvex_b             48      70      121     182     271     475&lt;br /&gt;convolvey_b             85      118     188     327     460     738&lt;br /&gt;&lt;br /&gt;imagex                  61      77      110     239     303     433&lt;br /&gt;imagey                  60      75      102     240     301     407&lt;br /&gt;&lt;br /&gt;simplex                 87      88      169&lt;br /&gt;simpley                 87      87      169&lt;br /&gt;&lt;br /&gt;convolvex_a (limit)     44      60      95      160     220     366&lt;br /&gt;convolvex_a (vec)               58      141&lt;/pre&gt;&lt;br /&gt;&lt;h2&gt;Thoughts&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;The rolling cache for the y convolution is a big loss.  The address arithmetic and need for synchronisation seem to kill performance.  So much for that idea.  I guess there just isn't enough work to do each loop to make it worth it (it only requires a single mad per thread).&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Using more threads for loading, then dropping back when doing arithmetic is also a loss for larger problems since it limits how many work-groups can execute on an SM.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Trying to reduce the memory accesses to only those required slows things down until you hit 4 element vectors.  
I guess for float and float2 the cached reads are effectively free, whereas the divergent branch is not.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Even with the texture cache, images benefit significantly from using a local cache.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Even with the local cache, images trail the array implementation - until one processes 4-element vectors, in which case they are even stevens for single images.  &lt;br /&gt;&lt;br /&gt;&lt;li&gt;Arrays can also be batched - processing 'n' separate images concurrently.  This adds a slight extra benefit as it can more fully utilise the SM cores, and reduces the need for extra host interaction.  For smaller problems this could be important although this problem size is already giving the GPU a good sized workout so the differences are minimal.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Using single-channel data is under-utilising the GPU by quite a bit.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;When I get time and work out how i want to do it i'll drop the array code into socles.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5434528984203454545?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5434528984203454545/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5434528984203454545' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5434528984203454545'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5434528984203454545'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/images-vs-arrays-4.html' title='Images vs Arrays 
4'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1992369120093870826</id><published>2011-10-01T08:56:00.003+09:30</published><updated>2011-10-01T10:43:00.986+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Images vs Arrays 3</title><content type='html'>So i've been working on some code that works with lots of 2d array data: which gives me the option of using arrays or images.&lt;br /&gt;&lt;br /&gt;And ... arrays won out: for simple memory access and writing they are somewhat faster than using images.  And that's before you add the ability to batch-process: with images you're pretty much stuck with having to pass each one at a time and only pack up to 4 values in each element (3D image writes are not supported on my platform atm).  With arrays you can process multiple 2D levels at once, or even flatten them if they are element-by-element - which can allow you to better fit the problem to the available CUs.&lt;br /&gt;&lt;br /&gt;In some cases the improvements were dramatic where a lot of writes to different arrays were required (but the writes were otherwise independent).&lt;br /&gt;&lt;br /&gt;Anyway, one particular area I thought images would still be a noticeable win is with some interpolation code I had to implement.  I need to do fixed power of 2 scaling up and down.  
Apart from the bi-linear interpolation 'for free', there is also an interesting note in GPU Gems 2 about using the bi-linear interpolation of the texture unit to perform bi-cubic interpolation using only 4 texture fetches rather than 16.&lt;br /&gt;&lt;br /&gt;So I ran some tests with both an image and array implementation of the following algorithms:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Bi-linear interpolation.&lt;br /&gt;&lt;li&gt;Fast Bi-cubic using the GPU Gems algorithm with a 64-element lookup table (I found the lookup-table version significantly faster than the calculated one).&lt;br /&gt;&lt;li&gt;Bi-cubic using 64-element lookup tables generated from the convolution algorithm in Wikipedia.&lt;br /&gt;&lt;/ol&gt;In both cases I was using float data, a 512x512 image, and 4x scaling in X and Y, and the numbers are in uS from the Nvidia profiler.  The array implementation is doing CLAMP_TO_EDGE.&lt;br /&gt;&lt;br /&gt;The results were quite interesting.&lt;br /&gt;&lt;pre&gt;                        Image           Array&lt;br /&gt;bi-linear               40              &lt;b&gt;36&lt;/b&gt;&lt;br /&gt;fast bi-cubic           &lt;b&gt;56&lt;/b&gt;              79&lt;br /&gt;table bi-cubic          106             &lt;b&gt;63&lt;/b&gt;&lt;/pre&gt;&lt;br /&gt;With this sort of regular access, the array version of the bi-linear interpolation is actually slightly faster than the image version, although they approach each other as the scale approaches 1.  This is a bit surprising.&lt;br /&gt;&lt;br /&gt;Images do win out for bi-cubic interpolation, but the array version isn't too far off.&lt;br /&gt;&lt;br /&gt;And in either case, the bi-cubic interpolation is really fairly cheap: only about 1.5x the cost of bi-linear which is 'pretty cool' considering how much more work is being done.&lt;br /&gt;&lt;br /&gt;I also started to investigate a bi-cubic interpolator that uses local memory to cache the region being processed by the local work-group.  
Since the actual memory lookups are very regular and the block will always access at most worksize+3 elements of data (for scaling=1) it seemed like a good fit.  I just tried a single 64x1 workgroup and managed around 60uS with some slightly-broken code: so perhaps the gap could be closed further.&lt;br /&gt;&lt;br /&gt;Actually one problem I have is a little more complicated than this anyway: the samples I need to work on are not the base samples, but offset by half a pixel first to produce N+1 of them.  With arrays I can use local memory to cache this calculation without having to either run a separate step or do many more lookups: so in this case it will almost certainly end up faster than the image version and I will have to get that local array version working.&lt;br /&gt;&lt;br /&gt;For float4 data the images are only about 1.5x faster for this interpolation stuff: which for my problems is not enough to make up for the slower direct access.  And the bi-cubic resampling is also 2-3x slower than the bi-linear: the amount of extra arithmetic is catching up.&lt;br /&gt;&lt;h2&gt;Conclusions&lt;/h2&gt;Well, about all I conclude is that Nvidia's OpenCL implementation sucks at texture access.  I've looked at some of the generated code and each image lookup generates a large chunk of code that appears to be a switch statement.  For very big problems most of this can be hidden with overlapped processing but for smaller problems it can be fairly significant.  I'm surprised that neither they nor OpenCL have some way of telling the compiler that a given &lt;code&gt;image2d_t&lt;/code&gt; is always a specific type: the access could be optimised then. FWIW I'm using a driver from a few months ago.&lt;br /&gt;&lt;br /&gt;Also I guess: the global memory cache isn't too bad if you have a good regular memory access pattern.  
Even optimised code that resulted in 4 simple coalesced global memory accesses per thread vs 16 was only a slight improvement.&lt;br /&gt;&lt;br /&gt;Of course the other conclusion is that it's a fairly simple problem and no amount of 'cache' optimisation will hide the fact that at some point you still need to go to main memory, for the same amount of data.&lt;br /&gt;&lt;br /&gt;I should really do some timings on AMD HW for comparison ... but the computer is in the next room which is also cold and dark.&lt;br /&gt;&lt;h2&gt;Final Thought&lt;/h2&gt;If you really are doing image stuff with affine transformations and so on, then images are going to win because the access pattern will be regular but it won't be rectangular.  The data-types available also match images.&lt;br /&gt;&lt;br /&gt;But for scientific computing where you are accessing arrays, images are not going to give you any magical boost on current hardware and can sometimes be more difficult to use.  Arrays also add more flexible memory management (e.g. 
i can use the same memory buffer for smaller or multiple images) and the ability to batch in the 3rd dimension.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1992369120093870826?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1992369120093870826/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1992369120093870826' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1992369120093870826'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1992369120093870826'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/10/blog-post.html' title='Images vs Arrays 3'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2866064825655077092</id><published>2011-09-26T13:19:00.003+09:30</published><updated>2011-09-26T13:59:17.766+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>LU Decomposition 2</title><content type='html'>So I finally needed to get that matrix solver working for work ... 
and revisited the LU decomposition code I had played with a few weeks ago.&lt;br /&gt;&lt;br /&gt;It turns out the code was broken: first a simple typo, and then a deeper problem: I hadn't noticed that the loop which multiplies out the rows depends on earlier results.  Confusingly of course this worked just fine using a CPU driver since the work items within a work-group are actually executed in serial.&lt;br /&gt;&lt;br /&gt;So I had to come up with another solution.&lt;br /&gt;&lt;br /&gt;First I wrote a bit of code that just printed out the calculations actually being performed.&lt;br /&gt;&lt;br /&gt;For column 0 this amounted to a no-op.&lt;br /&gt;&lt;br /&gt;For column 1, it was some edge cases, then something like:&lt;br /&gt;&lt;pre&gt;  for i : 0 to n&lt;br /&gt;    col 1 [ i ] -= col 0 [ i ] * &lt;b&gt;col 1 [ 0 ]&lt;/b&gt;&lt;/pre&gt;&lt;br /&gt;For column 2, edge cases plus:&lt;br /&gt;&lt;pre&gt;  for i : 0 to n&lt;br /&gt;    col 2 [ i ] -= col 0 [ i ] * &lt;b&gt;col 2 [ 0 ]&lt;/b&gt; + col 1 [ i ] * &lt;b&gt;col 2 [ 1 ]&lt;/b&gt;&lt;/pre&gt;&lt;br /&gt;And so on.&lt;br /&gt;&lt;br /&gt;As the emboldened calculations depend on a previous iteration of the same loop (for n=1 in case 1, and n=2 in case 2), each column result cannot be calculated independently.&lt;br /&gt;&lt;br /&gt;Since the amount of calculation is small, using the shared memory to propagate the partial results didn't seem viable, so instead I simply calculate the required previous values manually for each column for all threads.  I fiddled with the code I had which printed out the calculations and turned it into a code generator to expand all loops: it's only a 6x6 matrix so it isn't very much code.  
For the edge cases I use some logic that zeros out the multiplicands so the entire algorithm is completely branchless.&lt;br /&gt;&lt;br /&gt;For example, column 2 is calculated using:&lt;br /&gt;&lt;pre&gt; tmp0 = getlu(lu, 2);&lt;br /&gt; tmp1 = getlu(lu, 8);&lt;br /&gt; tmp1 -= getlu(lu, 6) * tmp0;&lt;br /&gt; tmp0 = cid &gt;= 1 ? tmp0 : 0;&lt;br /&gt; tmp1 = cid &gt;= 2 ? tmp1 : 0;&lt;br /&gt; v = getlu(lu, cid * 6 + 2);&lt;br /&gt; v -= getlu(lu, cid * 6 + 0) * tmp0;&lt;br /&gt; v -= getlu(lu, cid * 6 + 1) * tmp1;&lt;br /&gt; barrier(CLK_LOCAL_MEM_FENCE);&lt;br /&gt; putlu(lu, cid * 6 + 2, v);&lt;br /&gt; barrier(CLK_LOCAL_MEM_FENCE);&lt;br /&gt; pivotexchange(lu, piv, 2, cid);&lt;br /&gt; barrier(CLK_LOCAL_MEM_FENCE);&lt;/pre&gt;&lt;br /&gt;Where &lt;code&gt;cid&lt;/code&gt; is the column id (get_local_id(0) % 6).  I use macros to access the array since I am also flattening it out to avoid (or at least reduce) bank conflicts.  The barriers are needed since some of the rows span wave-fronts: but in any event only add 3% or so to the processing time.&lt;br /&gt;&lt;br /&gt;I've had to write code more than once to understand the data pattern of an algorithm: multi-dimensional array access is bad enough to visualise without adding multi-dimensional processing elements as well.  In some cases I've managed to remove huge amounts of redundant calculations this way - matlab seems to encourage code that has very poor data flow for example.&lt;br /&gt;&lt;br /&gt;I otherwise stuck to the same thread topology: 6 threads work on each 6x6 matrix together, and 192 threads in the local work group for a total of 32 matrices.&lt;br /&gt;&lt;br /&gt;For 4096 solutions of the same 6x6 matrix (i.e. best-case branch divergence), it takes 26uS on the GTX 480.  That seems reasonable enough to me.  
Although I still have a small number of bank conflicts I think this result is good enough for an early minute for today.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2866064825655077092?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2866064825655077092/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2866064825655077092' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2866064825655077092'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2866064825655077092'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/09/lu-decomposition-2.html' title='LU Decomposition 2'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1448455457761279065</id><published>2011-09-15T08:28:00.005+09:30</published><updated>2011-09-15T09:32:41.281+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Control the Roll</title><content type='html'>I think I hit a bug in the NVidia OpenCL compiler today, and lost a bit of faith in compilers in general ..&lt;br /&gt;&lt;br /&gt;What was a fairly small routine with a simple loop - wanted to use 63 registers/work item.  
No matter how I tried: unrolling the loop using #pragma unroll, re-arranging the work to use vectors or not, and so on.&lt;br /&gt;&lt;pre&gt;// local group size = 16, 16, 1&lt;br /&gt;kernel void&lt;br /&gt;somefunc(..., constant float *f0b, ...) {&lt;br /&gt;    local float *localdata[];&lt;br /&gt;&lt;br /&gt;    ... load local data ...&lt;br /&gt;&lt;br /&gt;    for (int i=0;i&amp;lt;9;i++) {&lt;br /&gt;        float a0 = localdata[i*2];&lt;br /&gt;        float a1 = localdata[i*2+1];&lt;br /&gt;        ...&lt;br /&gt;        v0 += f0a[i*2] * a0 + f1a[i*2] * a1;&lt;br /&gt;        v1 += f0b[i*2] * b0 + f1b[i*2] * b1;&lt;br /&gt;        v2 += f0a[i*2+1] * a0 + f1a[i*2] * a1;&lt;br /&gt;        v3 += f0b[i*2+1] * a0 + f1b[i*2] * b1;&lt;br /&gt;    }&lt;br /&gt;}&lt;/pre&gt;Looking at the intermediate code it had about a thousand(!) redundant register moves to/from other registers.  For the small problem I had it was taking about 100uS which probably wouldn't have bothered me apart from the weird compiler output.&lt;br /&gt;&lt;br /&gt;So I removed the loop entirely by hand, using C macros to implement each step.&lt;br /&gt;&lt;br /&gt;Result: 73uS &amp;amp; 21 registers.&lt;br /&gt;&lt;br /&gt;And the intermediate code was much smaller and more compact.&lt;br /&gt;&lt;br /&gt;NVidia's compiler seems to do a pretty crappy job with vectors in any event: the vector version was even worse - half the speed of a scalar version - around 200uS.  It's not normally this extreme but it seems it's almost always faster not to use vector code.  It would also (only sometimes!) 
hang for 20 seconds or more whilst compiling this file, and these changes fixed that too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1448455457761279065?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1448455457761279065/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1448455457761279065' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1448455457761279065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1448455457761279065'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/09/control-roll.html' title='Control the Roll'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6476703954655272348</id><published>2011-09-15T04:56:00.003+09:30</published><updated>2011-09-15T06:16:10.141+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Aparapi freed</title><content type='html'>AMD finally decided to &lt;a href="http://blogs.amd.com/developer/2011/09/14/i-dont-always-write-gpu-code-in-java-but-when-i-do-i-like-to-use-aparapi/%20"&gt;release Aparapi as free software&lt;/a&gt; on &lt;a href="http://code.google.com/p/aparapi/"&gt;google code&lt;/a&gt;.&lt;br 
/&gt;&lt;br /&gt;This is an interesting project that basically targets a GPU device for executing Java bytecode.  With some fairly obvious heavy limitations ...&lt;br /&gt;&lt;br /&gt;I wont be rushing to convert any of my code over, but when I get time I will have to have a look at it.  Maybe it would be useful for some prototyping - although I find JOGL a real cinch to use and I quite like C, so it might not be for me.&lt;br /&gt;&lt;br /&gt;For doing mathematical calculations, C is simply a nicer language than Java too.  With Java you're forced to use simple arrays for performance, and the lack of a primitive complex type is a real pita.&lt;br /&gt;&lt;br /&gt;(i.e. I think I've just talked myself out of it: I also love using local memory and so on).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6476703954655272348?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6476703954655272348/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6476703954655272348' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6476703954655272348'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6476703954655272348'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/09/aparapi-freed.html' title='Aparapi freed'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-842865551414032441</id><published>2011-09-14T06:50:00.005+09:30</published><updated>2011-09-14T09:46:45.672+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Shared Bank Conflicts</title><content type='html'>So yesterday I was working on some code for a wavelet transform - essentially a bunch of convolutions performed in a specific way.&lt;br /&gt;&lt;br /&gt;One routine I have does 4 interleaved convolutions simultaneously, and had a large number of shared bank conflicts - 200% or so.  So I spent quite a while trying to remove them.  I got rid of most of them - but for some weird reason I still get about 14% and I ran out of day (and patience) to work out what was going on there.&lt;br /&gt;&lt;br /&gt;I find this kind of problem quite tricky - trying to juggle dozens of sets of numbers in your head and still come up with something that works.  I've developed a few rules of thumb but I still haven't done it often enough to make it easy.&lt;br /&gt;&lt;br /&gt;But for all that effort I got a very minor performance improvement: barely 2%.  From 70uS, to 67uS kernel time.  
Hardly seemed worth it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-842865551414032441?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/842865551414032441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=842865551414032441' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/842865551414032441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/842865551414032441'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/09/shared-bank-conflicts.html' title='Shared Bank Conflicts'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2589780459672060182</id><published>2011-09-10T12:03:00.006+09:30</published><updated>2011-09-26T14:00:12.012+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Masked Loops &amp; LU Decomposition</title><content type='html'>&lt;b&gt;Update 26/9/11: So it turns out the timing is completely bogus because the code not only had a typo in it - which probably allowed the loop to be optimised away - the algorithm was incorrect as well.  So I have a feeling this is all just wrong now ... 
That's what I get for timing without validating first.&lt;br /&gt;&lt;br /&gt;I have since re-coded this algorithm, and I think I have something that works.  I resorted to building a code generator and manually unrolling all the loops using branchless code. For 4096 6x6 matrices it takes 28uS to perform the LU decomposition (they're the same matrix again).&lt;br /&gt;&lt;br /&gt;&lt;a href="http://a-hackers-craic.blogspot.com/2011/09/lu-decomposition-2.html"&gt;Also see a follow-up on the LU decomposition problem&lt;/a&gt;.&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I've been having trouble getting some code going for work so I thought I'd have a 'break' and visit porting some of the bits and pieces to OpenCL.&lt;br /&gt;&lt;br /&gt;The LU decomposition seemed obvious.&lt;br /&gt;&lt;br /&gt;Not that I really intended to use it: I thought I'd just try an almost direct translation to OpenCL to start with. All I did was have each thread work on a full solution independently.  I used local memory for the column cache, but that was about it.&lt;br /&gt;&lt;br /&gt;Fairly impressed - even disregarding all possible optimisations and memory issues, it only took 220uS (4096, 6x6 matrices).  It can no doubt be made to run faster, but I was so surprised I almost considered just leaving it at that - relative to the other operations it is 'fast enough' ... but of course I didn't.  (I was also solving the same matrix copied 4096 times, so it was 'best case' wrt thread divergence).  I guess it isn't really so surprising though - apart from the poor memory access pattern the code path is consistent across threads.&lt;br /&gt;&lt;br /&gt;I did a bit more work on the serial version before tackling the parallel: e.g. by copying the whole matrix to LS first, and telling the compiler to unroll the loops I got it down to 118uS.&lt;br /&gt;&lt;br /&gt;Anyway, along the way to parallelising this I came across an interesting and not entirely obvious optimisation.  
It was to use the sort of result-masking you need when vectorising algorithms, but just for a simple loop.&lt;br /&gt;&lt;br /&gt;Part of one inner loop of the algorithm has this construct:&lt;br /&gt;&lt;pre&gt;#define m 6&lt;br /&gt;#define n 6&lt;br /&gt;&lt;br /&gt;    int kmax = min(i, j);&lt;br /&gt;    float s = 0.0f;&lt;br /&gt;&lt;br /&gt;    for (int k = 0; k &amp;lt; kmax ; k++) {&lt;br /&gt;        s += LU[i*n+k] * LU[k*n+j];&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    LU[i*n+j] -= s;&lt;/pre&gt;&lt;br /&gt;In this case, 6 threads are working together on this problem and each is working on 1 row. The i index is thus the matrix-local work-item, and j is the current column.&lt;br /&gt;&lt;br /&gt;The problem with this code is the loop bound is unknown at compile time, so the address calculations need to be done on the fly and the loop can't be unrolled very efficiently. And although the compiler could do what I do below, it seems a bridge too far at present.&lt;br /&gt;&lt;br /&gt;So I changed the loop to this:&lt;br /&gt;&lt;pre&gt;    int kmax = min(i, j);&lt;br /&gt;    float s = 0.0f;&lt;br /&gt;&lt;br /&gt;    for (int k = 0; k &amp;lt; n ; k++) {&lt;br /&gt;        float v = LU[i*n+k] * LU[k*n+j];&lt;br /&gt;&lt;br /&gt;        s += k &amp;lt; kmax ? v : 0;&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    LU[i*n+j] -= s;&lt;/pre&gt;&lt;br /&gt;And even though it appears to be doing more work - things just don't work like that on a GPU. Each 'thread' is still executing all parts of all loops anyway (or at best, sitting idle waiting for the longest running one to finish).&lt;br /&gt;&lt;br /&gt;This simple change led to a 25% improvement in the overall execution time in my test case. &lt;br /&gt;&lt;br /&gt;My 'final' code executes in 38uS (I haven't verified it works!  So this number might be nonsense!).  
And I still have further to go - the way I access memory isn't coalesced, and I also have a lot of local-store bank conflicts to nut out.&lt;br /&gt;&lt;br /&gt;So, assuming the code works, maybe that original 220uS wasn't so hot after all.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2589780459672060182?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2589780459672060182/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2589780459672060182' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2589780459672060182'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2589780459672060182'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/09/masked-loops-lu-decomposition.html' title='Masked Loops &amp; LU Decomposition'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4973764564794534254</id><published>2011-09-07T14:53:00.001+09:30</published><updated>2011-09-07T15:14:55.932+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Fixing what isn't really broken.</title><content type='html'>Blah, so Google have decided they're going to mess up another of their products for me.&lt;br /&gt;&lt;br /&gt;I've already stopped using the web interface 
to gmail - I just use pop and thunderbird - and now they're playing with blogger.&lt;br /&gt;&lt;br /&gt;Blogger's existing interface is pretty crap - but it's simple and it's fast and easy to use.  But the 'improvements' hardly seem improvements to me.&lt;br /&gt;&lt;h2&gt;Harder&lt;/h2&gt;First, the composer.  About the only 'good' thing is that the editing area is bigger - not much difference to me - and that it is a fixed size - that's the nicest bit.&lt;br /&gt;&lt;br /&gt;But it's now more difficult to attach labels to posts as you need to click on the labels tab first - it seems like a simple thing, but the old interface is very simple if you use a few common labels most of the time.  The post options 'pane' in general is pretty pants, it somehow manages to take up about 4x as much space as the previous one while only showing 1/4 as much information at a time.&lt;br /&gt;&lt;br /&gt;They've broken the 'preview' feature - ok, it was always fairly broken - but now it's a full stylised preview which takes quite a while to load on a separate page/tab.  The old in-line quick preview was much more useful to me than the full preview just to review the content of the text when proof-reading and editing and trying to get the paragraph white-space right.  What used to take no time now takes a second and a tab switch.&lt;br /&gt;&lt;h2&gt;Bigger&lt;/h2&gt;The stats pages now no longer fit in my browser, and they seem to take longer to load.  Too many annoying tooltips and popups as well.&lt;br /&gt;&lt;br /&gt;The settings/dashboard is weird in general - everything is double-spaced, and a huge chunk of the top of the window is dedicated to a fixed area that doesn't really add anything useful by being fixed.  For some reason people seem to love this kind of crap but it just gives me the shits.&lt;br /&gt;&lt;br /&gt;For me blogger is just another tab of dozens - not a stand-alone full-screen application.  
Everyone seems to want to move away from being able to put stuff in a window which doesn't take up the whole screen - you know, to multi-task?&lt;br /&gt;&lt;br /&gt;Apple had a reason to force full-screen applications - first in macos which had a shit multi-tasking system, and then on iphone/itab since the machines aren't that powerful.  Microsoft did the same with windows - the OS was really just a glorified application switcher.  But I thought those days were long gone ...&lt;br /&gt;&lt;h2&gt;Slower &amp;amp; Hotter&lt;/h2&gt;One reason I dropped gmail is that it was starting to make my laptop hot - firefox was busy all the time doing pointless shit you couldn't see or would rather not.  It will be interesting to see if this new interface on blogger is also heavier than the old one.  Whilst typing this post I've already noticed a bunch of pauses and freezes which I can't recall having occurred in the previous incarnation.&lt;br /&gt;&lt;br /&gt;This could be a real deal-breaker for me: although the stats are fun to watch, by far the most time I ever spend in blogger is simply writing blog posts.  If that becomes too painful (and I have to say, a half-second cursor pause every 20 seconds gets tiring VERY VERY fast) then I won't be too pleased.  The pause seems to get longer the more you write too ...&lt;br /&gt;&lt;br /&gt;For now I'll go back to the old blogger, and I provided some feedback, for what little that will be worth.  But I know eventually I'll be forced onto something I don't really want.&lt;br /&gt;&lt;br /&gt;(this sort of forced-upgrade stuff is the sort of thing that scares me about firefox's 'no version' plans.  
I'm still using an 'old' firefox because the new ones are not to my liking, and in any event aren't packaged for my distro - but at least I can use an old version if I so want).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt; And in a further twist, the 'no i'd rather go back to the old interface' feedback form failed to work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4973764564794534254?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4973764564794534254/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4973764564794534254' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4973764564794534254'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4973764564794534254'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/09/fixing-what-isnt-really-broken.html' title='Fixing what isn&apos;t really broken.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7700168681423640831</id><published>2011-09-07T13:32:00.004+09:30</published><updated>2011-09-07T14:06:06.635+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Java 2D arrays</title><content type='html'>I had to find a bit of code to solve a set of simultaneous equations for some 
prototype code I'm working on.&lt;br /&gt;&lt;br /&gt;Having to do this gives me the willies really because linear algebra was never much fun for me ...&lt;br /&gt;&lt;br /&gt;I only have to solve a pretty simple system - 6x6 - and I settled on using &lt;a href="http://math.nist.gov/javanumerics/jama/"&gt;Jama&lt;/a&gt;, mostly because it's a small and simple library.  The code is also fairly clean and I need to eventually port this to OpenCL too.&lt;br /&gt;&lt;br /&gt;The code uses 2-D arrays to store its matrices, but I know 2-D arrays in Java aren't particularly efficient - they are implemented much the way you would do it in C.  That is, as an array of pointers which point to the rows of the array.  Thus every 2D access requires 2 array dereferences.  As I need to port it to OpenCL eventually anyway, I tried converting the Matrix and LUDecomposition classes to use linear arrays.  Then I just use simple arithmetic to map 2-D indices onto this linear array (i.e. i*n + j).&lt;br /&gt;&lt;br /&gt;I got pretty much exactly a 2x performance boost from this simple change.  That is in line with what I expected, but I didn't quite expect it to be so close to 2x.  The memory accesses far out-weigh any arithmetic on a modern CPU, and since 2-D arrays require 2x as many memory accesses (and 2x the range checks I presume), halving the memory accesses required led to a 2x speed up, even though the twin-array access was replaced by a more complex multiply and addition for the index calculation.&lt;br /&gt;&lt;br /&gt;Jama is wildly out of date, and I didn't look at the alternatives which might focus more on performance, but it shows that at least in this case 2-D arrays come at quite a cost.&lt;br /&gt;&lt;br /&gt;Not really looking forward to getting it working in OpenCL either; trying to parallelise it is going to be a bit of a hassle.  
Then again maybe the challenge will be refreshing - I need something to spark me up at the moment.&lt;br /&gt;&lt;br /&gt;This whole chunk of work is getting me down a bit - I have a big pile of hard to decipher ('matlabotomised') code to convert before I even get something to test, and then I have to try to remove some of the matlabisms that don't make sense in a real language, or require unnecessary excess memory.  Then I have to get it working.  And once that's done I have to re-do it all again from Java to OpenCL and get that working ... but i'm just not into it these last few weeks.  Lack of sleep mostly (I keep waking at sun-up, I really must exercise), but also some other distractions - a few days of nice weather, family visiting, and so on.&lt;br /&gt;&lt;br /&gt;This is also why I haven't had time to work on any of the other projects - I just don't have the energy.  Lawn is looking good though.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7700168681423640831?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7700168681423640831/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7700168681423640831' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7700168681423640831'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7700168681423640831'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/09/java-2d-arrays.html' title='Java 2D arrays'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-8641355888186982617</id><published>2011-09-07T10:04:00.005+09:30</published><updated>2011-09-07T16:05:31.925+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>The problem with teaching abstractions</title><content type='html'>For want of distraction, I've been hanging around some of the OpenCL forums of late.  Boy do they get some ninny questions.&lt;br /&gt;&lt;br /&gt;From people trying to drop plain C (with full stdio and all) into the compiler and expecting it to work, to someone asking if you can 'write functions' in the host language ... (err, I mean seriously.  Maybe the guy is a matlab guru but certainly it isn't that hard to find out about C or whatever host language he's using).&lt;br /&gt;&lt;br /&gt;But those are just the most extreme in the last couple of days.  What is more worrying is just how many people don't seem to understand computer architecture at all - much less a system containing a 'remote' processing device like a GPU.&lt;br /&gt;&lt;br /&gt;Really basic things like cache, registers, stack, memory latency, memory banks &amp; contention, I/O bus latency, call invocation overheads, and so on.  Not to mention the less-obvious but not really more complex ideas that GPU's bring to the table such as memory coalescing, thread divergence/masking (i.e. SIMT branches), local memory, and so on.&lt;br /&gt;&lt;br /&gt;Also, a rough idea of just how fucking fast modern hardware is.&lt;br /&gt;&lt;br /&gt;I presume most of the queries are from students but they just seem to have absolutely NFI what is going on 'under the bonnet' on their shiny new bit of hardware.  e.g. 
the reason your code is slower on a GPU should be bleedingly obvious before you went anywhere near a compiler.&lt;br /&gt;&lt;br /&gt;Before trying to work with such a complex system, you really need to know some basics of computer system architecture - otherwise none of the api will make any sense, nor will any of the results.&lt;br /&gt;&lt;br /&gt;The manuals are good: just read them.&lt;br /&gt;&lt;br /&gt;Experiment on your own; it's easy, it's fast, it's cheap (it only takes your time, and a student's time isn't worth anything).  You don't learn any-where near as much if you just copy someone else, or forever ask inane questions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-8641355888186982617?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/8641355888186982617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=8641355888186982617' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8641355888186982617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8641355888186982617'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/09/problem-with-teaching-abstractions.html' title='The problem with teaching abstractions'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2558443195430011606</id><published>2011-08-21T14:13:00.005+09:30</published><updated>2011-08-21T15:00:32.642+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='imagez'/><title type='text'>Visual Selections</title><content type='html'>After hitting the bottle with a couple of mates last night I ended up sleeping on the couch (too lazy to make my bed after washing the sheets), woke up about 6:30 feeling a bit ordinary but I thought i'd get some hacking out of the way because I was awake and couldn't really face too much else.&lt;br /&gt;&lt;br /&gt;I continued to work on removing the Piccolo2D stuff - which I completed.  Now i'm just using plain Java2D for the rendering.  Of course, one of the big reasons I did it was so I could zoom without making the handles scale as well ... and of course I forgot to implement that so it all zooms anyway.  No biggy ... at least it seems to work.  And it should be relatively simple to implement it.&lt;br /&gt;&lt;br /&gt;I still need quite a bit of code to implement interesting interface behaviour ... but at least it is no worse than the Piccolo2D stuff was before it.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-kdPlldMx3QE/TlCOmVTXBdI/AAAAAAAAAUA/j6pcLl88LTU/s1600/imagez-m.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 236px;" src="http://3.bp.blogspot.com/-kdPlldMx3QE/TlCOmVTXBdI/AAAAAAAAAUA/j6pcLl88LTU/s400/imagez-m.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5643167122299749842" /&gt;&lt;/a&gt;&lt;br /&gt;As part of that I visited the selection code.  
Some time ago I had the idea of being able to show the current selection using a shadow mask - including the current 'feathering' and so on.  So I added that - all 4 lines of code required for that.  Well I think it looks pretty cool, and if you have a fast machine it updates in real-time as you edit the selection even though it uses a Gaussian blur for the feathering.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update: &lt;/b&gt; Since I think there's now enough functionality to move it beyond the simply embarrassing stage, I've packaged the first public alpha release too. See the &lt;a href="http://code.google.com/p/mediaz/downloads/list"&gt;downloads page&lt;/a&gt; on the project, but don't expect too much.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2558443195430011606?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2558443195430011606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2558443195430011606' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2558443195430011606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2558443195430011606'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/visual-selections.html' title='Visual Selections'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://3.bp.blogspot.com/-kdPlldMx3QE/TlCOmVTXBdI/AAAAAAAAAUA/j6pcLl88LTU/s72-c/imagez-m.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4788105618088824473</id><published>2011-08-20T11:21:00.004+09:30</published><updated>2011-08-20T13:35:09.055+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='imagez'/><title type='text'>Crop Tool</title><content type='html'>I had a few hours to play with this morning and I had another look at the crop tool for ImageZ.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-7m_dz_NJpxw/Tk8tbKiICUI/AAAAAAAAAT4/IdRGResGbxY/s1600/imagez-l.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 236px;" src="http://2.bp.blogspot.com/-7m_dz_NJpxw/Tk8tbKiICUI/AAAAAAAAAT4/IdRGResGbxY/s400/imagez-l.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5642778802825726274" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It lets you change the top/left/bottom/right edge with the pre-lit 'handles', or drag the whole rectangle around clicking inside the box.  Clicks outside of the box let you drag to select a new bound.  So pretty simple/obvious interface, although I couldn't be bothered implementing the corner handles.&lt;br /&gt;&lt;br /&gt;So I had previously decided that Piccolo2D just doesn't quite fit this application - the need to have user-interface elements which don't scale with the zoom setting was the deal-breaker.  Current code that uses it has some very messy mechanisms to make it work - sometimes.&lt;br /&gt;&lt;br /&gt;The new stuff just uses some custom objects and Java2D to do the rendering and a very flat/simple 'scene graph'.  
So far I haven't even added any sort of optimised rendering although I probably will need to.  Although right now it is fast most of the time (not so much when zoomed - but that was the same for the piccolo2d stuff too).&lt;br /&gt;&lt;br /&gt;I also played with a slightly different event model - currently the current tool gets mouse events, but I decided rather than have the tool manage events of control-handles they can do it themselves.  It wasn't much extra code to implement the event routing in the imageview class.&lt;br /&gt;&lt;br /&gt;I think there's still a bit more work for the tool design before i think i'll be finished with it, but at least it feels i'm moving forwards with it.  I will keep an eye on this as I discover what works and what doesn't and eventually clean it all up to a consistent and hopefully simple interface.&lt;br /&gt;&lt;br /&gt;Probably the next thing to look at will be improving the brush mechanics though, or maybe fixing the super-affine tool - perhaps with that I will have enough to drop a jar.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4788105618088824473?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4788105618088824473/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4788105618088824473' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4788105618088824473'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4788105618088824473'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/crop-tool.html' title='Crop Tool'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-7m_dz_NJpxw/Tk8tbKiICUI/AAAAAAAAAT4/IdRGResGbxY/s72-c/imagez-l.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-9054347527314200559</id><published>2011-08-19T13:47:00.002+09:30</published><updated>2011-08-19T15:40:54.055+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='biographical'/><title type='text'>Another one bites the dust ...</title><content type='html'>Mum just called to let me know another one of my brothers has died.  It wasn't a surprise - he was literally on his death-bed a few times last year and despite numerous health problems continued to smoke and drink.  
Add to that a misspent youth of drug abuse and kleptomania and who knows what else, and it was pretty much a foregone conclusion.  It's surprising perhaps that he even made it to his early 50's in the first place.&lt;br /&gt;&lt;br /&gt;Barely knew him myself - he disappeared for about 10 years in the mid 80s, finally turning up in Perth as a lawyer (which was surprising given his past). I think he always blamed mum for pushing him to get a trade rather than follow educational prospects, but I think he must've fried his brain so much he forgot what a dope-head he was at the time.  Then again, I was a bit too young at the time to really grok what was going on.  Then he more or less circumnavigated the whole country in the years following, having a son with his girlfriend along the way.&lt;br /&gt;&lt;br /&gt;Seems the census collector found him when she returned to pick up the census form (his gf left him a couple of years ago, apparently to futilely chase some young bloke).  Given that the collector (probably) dropped it off a week earlier and it wasn't touched, he may have died some time before that.  Bit sad I suppose.  And it's not like this was in the city either, it was a country town.  Nothing like a bit of community cohesion!  On ya Australia!&lt;br /&gt;&lt;h2&gt;2 down, 7 to go ...&lt;/h2&gt;Still, even if it wasn't unexpected and I barely knew him, death of a sibling is still a strange thing to experience, even the second time around.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-9054347527314200559?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/9054347527314200559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=9054347527314200559' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/9054347527314200559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/9054347527314200559'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/another-one-bites-dust.html' title='Another one bites the dust ...'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6803380627253297932</id><published>2011-08-18T11:37:00.004+09:30</published><updated>2011-08-18T12:54:44.401+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>GEGL/OpenCL</title><content type='html'>So apparently a &lt;a href="http://meudepositodeideias.wordpress.com/2011/08/08/opencl-on-gegl-results-up-to-now/"&gt;lad's been working on getting some OpenCL code into GEGL&lt;/a&gt;.  What surprises me is just how slow the result is - and how slow GEGL is at doing the super-simple operation of brightness/contrast even with a CPU.
&lt;br /&gt;&lt;br /&gt;Of course, I'm not sure exactly what is being timed here, so perhaps it's timing a lot more than just the mathematics.  Well obviously it has to be, my ageing Pentium-M laptop can do a 1024x1024xRGBA/FLOAT brightness/contrast in about 70ms with simple single-threaded Java code.  So 500ms for the same operation using 'optimised sse2' is including a hell of a lot of extra stuff beyond the maths.  Curiously, the screenshot of the profiler shows 840 'tiles' have been processed, if they are 128x64 as suggested then that is 6MP, not 1MP as stated in the post - in that case 500ms isn't so bad (it isn't great either, but at least it's in the same order).&lt;br /&gt;&lt;br /&gt;I tried posting this to the forum linked to this &lt;a href="http://www.phoronix.com/scan.php?page=news_item&amp;px=OTc5OQ"&gt;phoronix post&lt;/a&gt; but for whatever reason it refused to take the post, so i'll post it here instead.&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;This result is really slow.  Like about 100x off if I have the relative performance of that gpu correct.  Even the CPU timings look suspect - is GEGL really that slow?&lt;br /&gt;&lt;br /&gt;A list of potential bottlenecks:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;the locking stuff sounds overly complex, but maybe that's a gegl requirement&lt;br /&gt;&lt;li&gt;are you timing 1-off allocations which skew the results?&lt;br /&gt;&lt;li&gt;moving single tiles back/and forth/processing them separately (this is a big one)&lt;br /&gt;&lt;li&gt;processing only a single tile per kernel call (this is a really big no-no)&lt;br /&gt;&lt;li&gt;might want to specify the local work-size to ensure the best memory access pattern on the opencl side.  16x16 usually works well for image processes per pixel on a gpu.&lt;br /&gt;&lt;li&gt;PCI latency, related to working with small blobs of data at a time.  
This can be completely hidden fairly easily by queueing up more jobs before a synchronisation point (either a clFinish or a blocking EnqueueReadBuffer(, true)).  Also you need to do a clFlush if you want the work to start while the cpu is still doing something (e.g. queuing up more work).&lt;br /&gt;&lt;li&gt;GEGL design.  I know nothing about it, but if you need to go to the CPU to do synchronisation between each composed operation you may never achieve very good performance.  Ideally you upload data once to the gpu, then do all processing without any cpu synchronisation until the final result is ready.  By default an opencl command-queue is in-order (and no implementation supports out of order anyway), so you can leverage that as well.  If GEGL can't already handle threads to do a similar parallelisation it might not be ready for opencl either.&lt;br /&gt;&lt;li&gt;GEGL itself.  Since the GEGL CPU timings are so slow (I mean, really really slow) GEGL must be doing a lot more behind the scenes/adding so much overhead that the actual calculations are completely swamped.  If this is 'fixed', then no matter what you do, such processing will always be relatively slow, although as the complexity of the algorithm increases this fixed overhead will matter less.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;A list of things which can't be bottlenecks:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;PCI bandwidth.  It's just not enough data to matter.&lt;br /&gt;&lt;li&gt;OpenCL kernel - maybe it can be improved with a better work-group-size, but it's so simple it can't really be wrong.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;Suggestions&lt;br /&gt;&lt;ul&gt;&lt;li&gt;My gut feeling is that you ignore tiles completely on the opencl backend.  Even doing manual cpu-side composition of tiles into aggregate will be fairly cheap compared to synchronous transfers/operations.  Composing operations complicates matters though ...&lt;br /&gt;&lt;li&gt;Don't try to hide too much detail with abstractions.  
It usually just makes it harder to know what's really going on (particularly for another coder).&lt;br /&gt;&lt;li&gt;Don't worry too much about comparing such a simple operation with the CPU.  The CPU should already be able to do it at about memory speed, and you're adding PCI copies in-between.  It's the more interesting stuff like convolution or FFT-based algorithms where the GPU will blow it completely out of the water.&lt;br /&gt;&lt;li&gt;Think of the GPU processor as a 'stream' processor.  You want to load it up with a pipeline of operations and keep the pipe stuffed with work.  Waiting for the pipeline to empty before adding more work will kill performance faster than anything else.  This applies at every level - the individual threads, SMs, as well as data blocks.&lt;br /&gt;&lt;li&gt;Might need to do some profiling of the CPU GEGL brightness/contrast implementation.  Something other than the actual calculations is taking most of the time.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;In the nvidia profiler, look at the 'gpu time width plot' to see when the gpu is actually doing work.  You'll probably see the individual jobs (and memory transfers) take almost no time and it's mostly sitting idle waiting for work from the cpu. That idle time is going to be 99% of the elapsed time, and it is where you will find all the gains at this point.&lt;br /&gt;&lt;br /&gt;Don't even bother looking at the graph you posted - memory transfer time will have to be greater than the processing time since the processing is so simple and the gpu memory bandwidth is so much higher than pci speed.  All you're doing is confirming that fact.  The memory transfer time can mostly be hidden using asynchronous programming techniques anyway, so it is basically irrelevant.
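&lt;br /&gt;&lt;br /&gt;A rough back-of-envelope sketch of the latency argument above, in Python for brevity.  The function names and the per-tile latency and work figures are made up purely for illustration and are not measurements of GEGL or of any GPU; only the 840 tiles and the 128x64 tile size come from the post.  It compares paying the host/device round trip once per tile against queueing every tile before a single synchronisation point, and re-checks the tile arithmetic.

```python
# Toy cost model: synchronous per-tile dispatch vs. one queued batch.
# All timing constants are hypothetical, chosen only for illustration.

TILE_W, TILE_H = 128, 64   # tile size suggested in the post
TILES = 840                # tile count read off the profiler screenshot


def megapixels(tiles, width, height):
    """Total pixels covered by the tiles, in units of 1024*1024 pixels."""
    return tiles * width * height / (1024 * 1024)


def synchronous_ms(tiles, latency_ms, work_ms):
    """Every tile pays the full host/device round-trip latency."""
    return tiles * (latency_ms + work_ms)


def queued_ms(tiles, latency_ms, work_ms):
    """All tiles are queued before a single synchronisation point."""
    return latency_ms + tiles * work_ms


if __name__ == "__main__":
    latency, work = 0.5, 0.01  # hypothetical milliseconds per tile
    print("image size: %.2f MP" % megapixels(TILES, TILE_W, TILE_H))
    print("synchronous: %.1f ms" % synchronous_ms(TILES, latency, work))
    print("queued: %.1f ms" % queued_ms(TILES, latency, work))
```

With any latency figure much larger than the per-tile work, the synchronous total is dominated by the round trips, which is the point being made about idle time; the model deliberately ignores overlap of transfers with compute, so it understates what a properly pipelined queue could achieve.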
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6803380627253297932?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6803380627253297932/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6803380627253297932' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6803380627253297932'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6803380627253297932'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/geglopencl.html' title='GEGL/OpenCL'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7757747712745279070</id><published>2011-08-17T10:09:00.003+09:30</published><updated>2011-08-17T11:46:44.131+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='biographical'/><title type='text'>10K</title><content type='html'>So in a bit over 2 years since I turned on the stats, this blog broke the 10K hit barrier in the last few weeks.  I guess that's nothing particularly to speak of but for what is mostly a bunch of private rants and technical musings it's not insignificant either.&lt;br /&gt;&lt;br /&gt;Although one particular page has the lion's share of the hits - and that it continues to do so is interesting in itself.  
This is the long and rather rambling post about trying to find a Java FFT library and some abuse about visual studio.  Although it's clearly the Java FFT that people are searching for to find that page!  It shows that someone is doing some scientific programming in Java, which I find interesting.  The only thing I really wish Java had for this was a native complex type - doing anything with complex numbers quickly gets ugly, and even worse if you want some speed.&lt;br /&gt;&lt;br /&gt;Second on the list is the BeagleBoard GS2010 wrap-up post with about half the number of hits.  For such a small community there is quite a lot of interest out there.  Unfortunately work commitments and other interests have pulled me away from spending time with the Beagleboard, which is a bit of a pity.  For the moment all i'm using mine for is for playing internet radio plugged into my stereo.  It's sitting boxless on a coffee table next to the amp and my 'user interface' consists of telnetting to it from my workstation and running mplayer on the command line :)&lt;br /&gt;&lt;br /&gt;The next few 'high hitters' (if you could call them that) are low-level posts on: SSE optimisations (which basically said they don't make nearly as much difference as vector ops on CELL did), OpenCL Images vs Arrays (which I find rather difficult to understand myself, but i think the gist of it is that you have to write code differently but both perform about the same), and Context Switching on ARM.  I would guess the last one may have helped a few students with their assignments ;-) - it doesn't seem to be a topic of general interest.&lt;br /&gt;&lt;h2&gt;Onward and upward&lt;/h2&gt;Like everyone else I'm pretty useless at predicting the future but I can probably take a rough guess at where my interests will take me in the next few years.  
I don't have a need for any particular software any more (beyond what is a yum invocation away), so whatever I work on is only for entertainment (and perhaps a bit for education, but just solving problems for work educates one a great deal).&lt;br /&gt;&lt;br /&gt;I think OpenCL will continue to grow - socles is already my most 'hit' google-code project and the only one anyone ever mailed me about (actually someone did mail me about puppybits).  It isn't really going anywhere at the moment because I can't really think of anything to use it for myself - I have some vague ideas of a video-something application (mediaz/VideoZ), but there is so much to think about and code before it even gets started.  As applications get bigger and more complex, that starting hump is quite a psychological barrier to get over when there are other sources of entertainment competing for my time.  Back to socles though - OpenCL is still a bit of a niche, and Java + OpenCL even more so, so I'm in no rush to expand it until I can find something to use it for.&lt;br /&gt;&lt;br /&gt;As an aside, I've noticed a worrying trend on the OpenCL forums - which seems to be more afflicted by this than other forums, although I've seen it before elsewhere and it's probably just because i don't tend to hang around forums a lot these days.  And that is this: inexperienced programmers - most likely students, with a very limited command of the English language, posting questions which demonstrate they can't even be bothered to read the manuals (OpenCL has some very good resources available).  And even worse, to paraphrase a comment from the BeagleBoard list, the queries generally amount to "I'm lazy, can you please do my homework for me?".  Extremely rude and disrespectful and really messes up mailing lists and forums.&lt;br /&gt;&lt;br /&gt;Puppybits ... well that will probably continue to stay on hold.  Unless I take another big break between contracts again and have loads of time to work on it. 
 Every now and then I have a look to see if there are any simple USB host stacks to snarf to help progress it, but nothing's popped up so far.  Without USB one is severely constrained.  If I ever get the OpenPandora I ordered that might pique my interest in ARM hacking again though.  I have a big bunch of 'zedos' work I never committed which I probably should if only so it doesn't get lost from my backups (I 'upgraded' my OS a few months ago and lost my development environment for example).&lt;br /&gt;&lt;br /&gt;mediaz/ImageZ ... is probably of little use to anyone else, but I will keep poking away at it when I have the inclination.  There are a few basic things I need to get sorted out before i'm prepared to drop a jar of it, which I will do at some point.  One is the tool overlay mechanism which i'm refining again as I work on a crop-tool.  Probably a couple of days work.&lt;br /&gt;&lt;br /&gt;jjmpeg ... is already quite useful, although to package it up and polish it off would require a lot more work and time.  This is one of those building blocks I needed for the video application I was thinking about, so now it's to some state of usefulness I can at least entertain the idea of moving forward with that.  Also, if I decide to switch to it for some work code I have it would probably get a bit more of a work-out as well - it's something i'm considering since I can't get xuggle to build for windows (without more time than i'm willing to waste) and its ffmpeg libraries are getting a bit out of date.  Not to mention tied to 32 bits.&lt;br /&gt;&lt;br /&gt;And i'll keep ranting about bits and pieces, cooking, gardening and other shit.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7757747712745279070?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7757747712745279070/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7757747712745279070' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7757747712745279070'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7757747712745279070'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/10k.html' title='10K'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6494519702979029032</id><published>2011-08-11T21:35:00.006+09:30</published><updated>2011-08-11T22:56:26.167+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Video/Audio Player</title><content type='html'>I just checked in a reasonably complete audio/video player example using jjmpeg.&lt;br /&gt;&lt;p align=center&gt;&lt;img src="http://jjmpeg.googlecode.com/svn/wiki/images/jjmpeg-player-0.jpg"&gt;&lt;/p&gt;&lt;br /&gt;It synchronises the video to the audio if it's there, allows one to seek and pause and so on.  
The pause function is a bit crap - it keeps running any queued up data from the decoder - but that's only a fraction of a second.  It uses a JLabel for output via a BufferedImage, which works well enough if the machine is quick.  There are some other problems, but it works reasonably well all things considered.  It's using &lt;a href="http://jogamp.org/joal/"&gt;JOAL&lt;/a&gt; for audio output.&lt;br /&gt;&lt;br /&gt;The code is part of the jjmpegdemos sub-project, and is in the &lt;a href="http://code.google.com/p/jjmpeg/source/browse/trunk/jjmpegdemos/src/au/notzed/jjmpeg/mediaplayer/"&gt;au.notzed.jjmpeg.mediaplayer&lt;/a&gt; namespace.&lt;br /&gt;&lt;br /&gt;This is the one I &lt;a href="http://a-hackers-craic.blogspot.com/2011/06/video-player.html"&gt;mentioned I was working on&lt;/a&gt; 2 months ago, and since it was reasonably complete (and I don't think i'll be working on it again for a little while) I thought it was about time I checked it in.  I have a swathe of stuff for socles I should probably upload at some point too.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6494519702979029032?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6494519702979029032/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6494519702979029032' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6494519702979029032'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6494519702979029032'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/videoaudio-player.html' title='Video/Audio Player'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6249469306562229575</id><published>2011-08-11T12:08:00.003+09:30</published><updated>2011-08-11T12:27:31.396+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>matlabotomisation</title><content type='html'>&lt;blockquote&gt;&lt;i&gt;matlabotomisation&lt;/i&gt;&lt;br /&gt;- &lt;b&gt;vb&lt;/b&gt;&lt;br /&gt;To write or modify a matlab or octave script in order to achieve maximum efficiency in processing time.  
&lt;b&gt;Thus rendering the algorithm &lt;i&gt;virtually indecipherable&lt;/i&gt; to both mathematicians and software engineers alike.&lt;/b&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Yes, i'm back to reading matlab scripts again - an unfortunately common task when dealing with research from computer scientists.&lt;br /&gt;&lt;br /&gt;matlab (the language) is a really basic scripting language, with a library of routines that make processing mathematical algorithms possible, but not exactly easy.  It isn't something that mirrors the mathematical language very concisely, nor maps easily to procedural languages.  If that were its only shortcoming it would be bad enough, but it is also really very slow.&lt;br /&gt;&lt;br /&gt;So to get performance out of matlab one has to write code using (multi-dimensional) array types.  Writing a loop which generates results one at a time is far too slow, so instead you generate a table of indices and then write a formula that uses these indices to generate all results at once.  This can be fairly concise, and it sort of sounds like functional programming or representing mathematics cleanly, but unfortunately it falls well short of this goal and often the code is off generating complex sets of indices which can be confused with it actually doing work.  So you end up with something that might run reasonably quickly (for matlab anyway), but is a real brain-ache trying to understand.  It neither matches the mathematics, nor the processing steps the cpu takes to form the result.&lt;br /&gt;&lt;br /&gt;I prefer when the scientist just gives up and writes simple matlab - for one, it makes my life a lot easier, and as a bonus even a trivial Java conversion will run at least an order of magnitude faster.  So it makes me look smarter too!
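&lt;br /&gt;&lt;br /&gt;To show the difference in shape between the two styles, here is a small sketch in Python rather than matlab.  The sliding-window sum and both function names are just an illustration I picked, not code from any real script: the first version is the plain loop, the second builds the table of indices first and then applies one expression over it.&lt;br /&gt;
&lt;pre&gt;
```python
def moving_sum_loop(x, w):
    """Sliding-window sum written the obvious way, one result at a time."""
    out = []
    for i in range(len(x) - w + 1):
        total = 0
        for j in range(w):
            total += x[i + j]
        out.append(total)
    return out


def moving_sum_indexed(x, w):
    """The same sum in the 'index table' style.

    Row i of idx holds the positions i .. i+w-1, and a single expression
    over the whole table then produces every result at once.
    """
    n = len(x) - w + 1
    idx = [[i + j for j in range(w)] for i in range(n)]
    return [sum(x[k] for k in row) for row in idx]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(moving_sum_loop(data, 3))
    print(moving_sum_indexed(data, 3))
```
&lt;/pre&gt;
Both give the same answer, but in the second one half the code is about building indices rather than the sum itself, and it only gets worse as the indexing gets more elaborate.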
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6249469306562229575?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6249469306562229575/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6249469306562229575' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6249469306562229575'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6249469306562229575'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/matlabotomisation.html' title='matlabotomisation'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6070159177704456590</id><published>2011-08-07T13:17:00.002+09:30</published><updated>2011-08-07T13:40:35.131+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='imagez'/><title type='text'>OpenRaster, SPI, etc.</title><content type='html'>After poking around ImageZ a bit late last night I thought i'd tackle multi-layer reading/writing.&lt;br /&gt;&lt;br /&gt;So I wrote a writer and eventually a saver for OpenRaster format.  
I decided on OpenRaster since it is so simple, and it was pretty much how I was going to write it anyway - only I was going to avoid the XML.  Being a zip file makes things simple in Java too.  It seems to interoperate well enough so far (since I only have 'normal' blend mode working anyway), although if you save layers in greyscale or 16 bit formats from ImageZ and then load/save them from MyPaint, everything is converted to RGBA 8 bit.&lt;br /&gt;&lt;br /&gt;I still need a float format though - I started looking into OpenEXR last year - but that was about when I stopped working on ImageZ for a chunk of time too - but I hit some walls with the test images.  I can't recall where the issue was now though.  This isn't really a high priority.&lt;br /&gt;&lt;br /&gt;Today I thought I'd work on writing an ImageReaderSpi for the format as well - for example since currently OpenRaster files do not display in the open requester.  But i got too side-tracked trying to implement meta-data and other features which in hindsight I probably don't need.  I might revisit it again later with reduced requirements and see if I can get it working.&lt;br /&gt;&lt;br /&gt;Along the way I also played with JAXB XML (de)serialisation which looks pretty nice - as nice as things can get with XML I guess.  In general I try to avoid XML as much as possible because I think it's the phlegm, vomit, and anal leakage of devil's spawn, so this was a pleasant surprise.  No surprise that it wasn't originally an apache project though ...&lt;br /&gt;&lt;br /&gt;Also started work on a crop tool.  This is exposing me once again to issues with the tool overlays, so I should probably think about cleaning that up somehow too.  
I'm using piccolo2d at the moment, but the way I have the tools track the current zoom is a right pig's breakfast.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6070159177704456590?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6070159177704456590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6070159177704456590' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6070159177704456590'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6070159177704456590'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/openraster-spi-etc.html' title='OpenRaster, SPI, etc.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-3515536510651154632</id><published>2011-08-06T12:10:00.006+09:30</published><updated>2011-08-06T21:07:57.317+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediaz'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='imagez'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>mediaz &lt;-- ImageZ</title><content type='html'>I finally uploaded ImageZ to google code, under a new project &lt;a 
href="https://code.google.com/p/mediaz/"&gt;mediaz&lt;/a&gt;.  I'm pre-empting myself somewhat here, but i'm leaving room should I develop some other tools - e.g. if the VideoZ stuff ever goes anywhere.&lt;br /&gt;&lt;br /&gt;I didn't get around to cleaning up everything I had intended to, so it's well short of being terribly useful, but that's the way it goes I guess.  I didn't really want to spend my Saturday at the computer again, but there's not much else to do - everyone else is out and it's a crappy cold, windy and eventually wet day we're headed for.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; I had intended to catch up with a couple of mates for some beer and food this evening but I slept in and then it started pissing down with rain so I ended up stuck inside again (watching Port get totally arse-raped by Collingwood).  Then I ended up playing with ImageZ a bit more and realised i'd sold it a bit short - there is quite a lot of functionality there after-all, even if some big and rather important parts are missing.  
I did a bit of hacking on it as well as some house-keeping on the google code page.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-3515536510651154632?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/3515536510651154632/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=3515536510651154632' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3515536510651154632'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3515536510651154632'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/mediaz-imagez.html' title='mediaz &lt;-- ImageZ'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5563419803590155763</id><published>2011-08-05T12:18:00.004+09:30</published><updated>2011-08-05T14:27:14.592+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Nvidia opencl 1.1</title><content type='html'>Yay, so NVidia finally released an opencl 1.1 spec driver.  
I guess now I should read up more on opencl 1.1 and see if there's anything I can take advantage of - so far It wasn't even on the radar because of their complete lack of support; and i'm happy enough with 1.0 anyway.  I'm not sure this is really enough to restore confidence that OpenCL is a first-class citizen on NVidia hardware - their weekly emails haven't mentioned OpenCL for months.  We're headed for AMD hardware anyway, if only to try alternatives.&lt;br /&gt;&lt;br /&gt;Speaking of AMD, I thought I might try to create a Java binding for the AMD FFT library - I wouldn't mind evaluating it to see if it could replace my current FFT implementation (the apple one, as ported in the jocl demos tree).  Unfortunately it uses some types and interfaces which are tricky to wrap in Java, at least in a way which works independent of the architecture's native size.  So for now I might put it on the back-burner.  (I looked at gluegen briefly but it had trouble parsing something - and the error messages it gives aren't much help).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5563419803590155763?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5563419803590155763/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5563419803590155763' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5563419803590155763'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5563419803590155763'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/nvidia-opencl-11.html' title='Nvidia opencl 
1.1'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1956157148623383070</id><published>2011-08-04T11:49:00.003+09:30</published><updated>2011-08-04T12:00:48.846+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><title type='text'>Mailing Lists</title><content type='html'>I just set up some mailing lists for &lt;a href="http://groups.google.com/group/jjmpeg-discuss"&gt;jjmpeg&lt;/a&gt; and &lt;a href="http://groups.google.com/group/socles-discuss"&gt;socles&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I can't tell from google-code if there is much interest in the projects, but it seems a better idea to set up a mailing list than to receive direct emails about them.&lt;br /&gt;&lt;br /&gt;These are still slow long-burn projects i'm working on when I feel inspired, and inspiration varies greatly from week to week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1956157148623383070?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1956157148623383070/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1956157148623383070' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1956157148623383070'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1956157148623383070'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/mailing-lists.html' title='Mailing Lists'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-692700692895686847</id><published>2011-08-02T17:49:00.006+09:30</published><updated>2011-08-02T18:34:36.569+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Bullies, liars, and arseholes.</title><content type='html'>Hmm, so my sister in law just got sacked from one of her cleaning jobs.  After quite a bit of bullying from a fellow employee and what can only be considered racism/discrimination from upper management (e.g. complaining about her diminutive stature in her first week) they finally found enough of an excuse to fire her.  
A sham 'explain yourself' meeting that went on for hours, followed by a letter saying that she simply lied about everything in the meeting (which is simply not true).&lt;br /&gt;&lt;br /&gt;Filthy liars.&lt;br /&gt;&lt;br /&gt;I know hardly anyone reads my blog and even fewer locally, but for those, perhaps complain about the lack of cleanliness next time you're in the central market, or maybe just spit on the floor!&lt;br /&gt;&lt;br /&gt;I probably won't bother ever going back there myself - not that I was a regular customer anyway.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-692700692895686847?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/692700692895686847/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=692700692895686847' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/692700692895686847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/692700692895686847'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/bullies-liars-and-arseholes.html' title='Bullies, liars, and arseholes.'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1446091388650044260</id><published>2011-08-01T14:18:00.002+09:30</published><updated>2011-08-01T14:34:08.478+09:30</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Playing with Web Start</title><content type='html'>After a lot of frobbing around I got a simple java webstart demo working for jjmpeg.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://jjmpeg.googlecode.com/svn/webstart/jjmpegdemos.jnlp"&gt;jjmpegdemos.jnlp&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Assuming you have Java Web Start installed this should launch the application - and after a lot of 'this is untrusted' errors should end up with the application running.  It lets you run a very simple music player demonstration.  It uses JOAL for the audio output and jjmpeg for the decoding.&lt;br /&gt;&lt;br /&gt;You also need to have the ffmpeg shared libraries installed.  A recent version.  Which probably means this won't work with microsoft platforms yet - although I suppose if the ffmpeg libraries that are available here: http://ffmpeg.zeranoe.com/builds/ are in the path it might work.&lt;br /&gt;&lt;br /&gt;On GNU/Linux it will depend on compatible libavcodec/etc versions; I'm using Fedora 13 and 14, and it worked fine on both with ffmpeg-libs from rpmfusion.  
I also tested x86 and amd64 platforms.&lt;br /&gt;&lt;br /&gt;Anyway this is really just an experiment - I doubt it will work in general on every platform.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1446091388650044260?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1446091388650044260/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1446091388650044260' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1446091388650044260'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1446091388650044260'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/08/playing-with-web-start.html' title='Playing with Web Start'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1390980232791687085</id><published>2011-07-28T18:11:00.006+09:30</published><updated>2011-07-29T12:08:12.509+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>jjmpeg - microsoft windows 64</title><content type='html'>Well i've been a bit quiet of late.  
I wrote a few blog entries but never got around to publishing them - the mood had changed by the time I finished or I couldn't get my words arranged in a readable manner ...&lt;br /&gt;&lt;br /&gt;I also had a flu for a week, visitors and other distractions, and writer's block for the last few weeks - I was hitting some problems with work which paralleled some of the problems I was hitting with my hobby code and everything ground to a bit of a halt.  Well such is the way of things.  The flu is mostly gone now and I resolved the deadlock with my work code, so perhaps I will get back to hacking again soon.&lt;br /&gt;&lt;br /&gt;So today I had a couple of spare hours and the motivation to make jjmpeg work on windows - maybe if I have that working I can drop xuggle [for my work stuff] which is getting a bit out of date now.  Actually the main problem is that it's too much hassle to build, and only available in a 32 bit version - and with the opencl code and other issues, the 32 bit jvm limits are starting to cramp the application a bit.&lt;br /&gt;&lt;br /&gt;The biggest problem was working out how to compile it, and after a lot of buggerising around I found it was easiest to just install the mingw 64-bit compiler as a cross compiler, so I get to keep the nice coding tools I always use and keep myself in linux.  Trying to do any work at all - and particularly development - in windows is like trying to ride a bike with one leg cut off and a broken arm.  Unpleasant, and painful.&lt;br /&gt;&lt;br /&gt;Apart from that it was mostly just re-arranging the code to call some simple macros which change depending on the platform - i.e. dlopen/dlsym or LoadLibrary/GetProcAddress.  
And then a bit of a rethink on how the binaries are built to support multiple targets via a cross compiler.&lt;br /&gt;&lt;br /&gt;I have done very little testing, but when setup properly it found the library and decoded an mp3 file, which is good enough for me.&lt;br /&gt;&lt;br /&gt;(and obviously, there will never be windows support for the linux-dvb code, only for the libavformat/libavcodec binding).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1390980232791687085?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1390980232791687085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1390980232791687085' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1390980232791687085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1390980232791687085'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/07/jjmpeg-windows-64.html' title='jjmpeg - microsoft windows 64'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6491015600424480205</id><published>2011-07-01T14:46:00.003+09:30</published><updated>2011-07-01T15:10:37.431+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><title 
type='text'>OpenCL killer application?</title><content type='html'>So I've been trying to think of some killer application that OpenCL could enable.&lt;br /&gt;&lt;br /&gt;Sure you have video rendering or processing, signal analysis and the like - but for desktop use these sorts of things can already be done.  And if it's a little slow you can always throw more cores and/or boxes at it.&lt;br /&gt;&lt;br /&gt;But I guess the big thing is hand-held devices.  This is probably why the ARM guys are starting to make noise of late: being able to put `desktop power' into hand-held devices.  Still, this is more of an evolutionary change than a revolutionary one - with mobile phones now being pocket computers we all expect that one day they'll be able to do everything we can on bigger machines, with Moore's law and all that (which is related to the number of transistors, not the processing performance).&lt;br /&gt;&lt;br /&gt;I was also thinking again about AMD's next-gen designs - one aspect I hadn't fully appreciated is that they can scale up as well as down.  Even just a single SM unit with 4x16 SIMD cores running at a modest and battery-friendly clock rate would add a mammoth amount of processing power to a hand-held device.  It has some similar traits to the goals behind the CELL CPU - the design forces you to partition your work into chunks that fit on a single SPU.  
But once that's done you gain a massive benefit: you can then scale up the software by (almost) transparently executing these discrete units of work on more processors if they're available.&lt;br /&gt;&lt;br /&gt;So, I don't think there will be a 'killer application' - software that only becomes possible and popular because of OpenCL (for one, the platform support is going to be weak until the hardware is common, and even then micro$oft won't support it because they're wanker khunts) - rather it will be the hardware application of placing desktop-power into your hand (and if such performance is only utilised to play flash games at high resolution, I fear the future of humanity is already lost).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6491015600424480205?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6491015600424480205/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6491015600424480205' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6491015600424480205'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6491015600424480205'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/07/opencl-killer-application.html' title='OpenCL killer application?'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-9019219344202782152</id><published>2011-07-01T06:38:00.004+09:30</published><updated>2011-07-01T08:00:37.247+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><title type='text'>It's funny 'cause it's true ...</title><content type='html'>From a little while ago, but I just flipped through a few weeks' worth of xkcd the other day and came across it.&lt;br /&gt;&lt;p align="center"&gt;&lt;a href="http://xkcd.com/844/"&gt;&lt;img src="http://imgs.xkcd.com/comics/good_code.png" border="0"&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;When I was doing engineering at uni we talked about the reams of documentation and being able to pre-define the problem to such a degree that the coding itself would be an afterthought.  A mere bullet-point to be performed by lowly-trained, knuckle-dragging code monkeys somewhere between finalising the design and testing.  Of course, this was proven to be immediately impractical during our final year project - and that was about the last time I ever saw an SDD.  In one job I had we started with lofty goals of fully documenting it using SRSs, SDDs and the like, but in the end we just ended up with piles of junk.  They were complete, and even sometimes up to date, but ultimately useless - they didn't add any value.&lt;br /&gt;&lt;br /&gt;In reality of course there are many impediments to such an approach:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The customer doesn't know what they ultimately want.  
Ever.&lt;br /&gt;&lt;li&gt;New ideas come along which change or add requirements.&lt;br /&gt;&lt;li&gt;You don't know the best way to solve a problem without trying it.&lt;br /&gt;&lt;li&gt;You don't know where to even start solving problems without plenty of experience.&lt;br /&gt;&lt;li&gt;The market or other outside circumstances force a change.&lt;br /&gt;&lt;li&gt;That just isn't how the brain works - you continue to learn every second of every day and that changes how you would solve problems or present them.&lt;br /&gt;&lt;li&gt;It's slow and too expensive for anyone who has to earn money and not just ask for it (i.e. outside of defence, govt).&lt;/ol&gt;Although contracts are still written this way, and documentation is still a phone-book sized deliverable in military software.  And computer engineering academia are still trying to turn what is essentially an art into a science.  I don't think their efforts are completely worthless (at least not all of them), but I think software is too complex for this at this stage and only getting more complex.&lt;br /&gt;&lt;br /&gt;It's not that development documentation isn't useful - I wouldn't mind a good SRS myself - but there needs to be a happy medium.&lt;br /&gt;&lt;br /&gt;Back to the flow-chart - which to me has a deeper meta-meaning even by being a flow-chart. The software engineering lecturers scoffed at flow-charts as being obsolete and out of date - yet they seem to be more useful than anything the lecturers claimed had replaced them.&lt;br /&gt;&lt;br /&gt;Personally I try to do it right but sometimes do it fast - because ultimately you always end up having to refresh a significant chunk of the code-base when the customer reveals what they really wanted from the start.  Fortunately when I'm in the groove (say 30% of the time?) 
I can hack so fast and well (not to put tickets on myself, but I can) that the line is a bit blurred - writing and (re)-re-factoring gobs of code on the fly as the design almost anneals itself into a workable solution.  Pity I can't do that all the time.&lt;br /&gt;&lt;br /&gt;Extra effort is usually worth it, but not always.  And sometimes the knack is just knowing when you can get away with taking short-cuts.  For isolated code at the tail-end of the call-graph it usually makes little difference so long as it works.&lt;br /&gt;&lt;br /&gt;If you throw the front-end away and start from scratch and you have some well designed code underneath, you can usually re-use most of it.  Crappy code is much harder to re-use.  But in the earlier stages of a project doing it right can be more of a hindrance.  Particularly with OO languages - which force you to create good data models to fit the problem - which means even a small change to the problem can be a big change to the data model.  Of course, many coders never achieve good data models, so perhaps for them the cost isn't so high - at the cost of perpetually low quality code.  Yes I say data, not code - the data is always more important.&lt;br /&gt;&lt;br /&gt;Annealing is probably a good way to describe the software design and maturity process - early stages punctuated by large fluid changes due to high-energy experimentation, and then over time the changes becoming smaller as it matures and solidifies.  
If the requirements change you have to put it back into the fire to liquefy the structure and reconsider how it fits in the new solution.&lt;br /&gt;&lt;br /&gt;Simply bolting on new bits will only create an ugly and brittle solution.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-9019219344202782152?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/9019219344202782152/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=9019219344202782152' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/9019219344202782152'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/9019219344202782152'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/07/its-funny-cause-its-true.html' title='It&apos;s funny &apos;cause it&apos;s true ...'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2376530420897868468</id><published>2011-06-30T09:40:00.001+09:30</published><updated>2011-06-30T10:33:33.956+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='rants'/><title type='text'>Why not -1 button?</title><content type='html'>So apparently google have added a +1 button to the search results.  Why not a -1?  
I'm not sure exactly what this is supposed to achieve - yet more ways for spammers to skew the results?  I'm already a bit wary of google giving me my own private view of the internet, I hardly want that increased.  And often what you're trying to find is the stuff that doesn't come up on the front page now - it seems this would only make that worse.&lt;br /&gt;&lt;br /&gt;Just more clutter I don't need or want which simply slows down the page loading.&lt;br /&gt;&lt;br /&gt;They also seem to have fucked up the mail client by making a section of the screen not scroll.  Which makes it unusably slow and even uglier than its already ugly forebear.  So I guess I will have to go to basic HTML mode now.  Which is a pity because I use the chat thing to find out when drinks are on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2376530420897868468?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2376530420897868468/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2376530420897868468' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2376530420897868468'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2376530420897868468'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/why-not-1-button.html' title='Why not -1 button?'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-3439432754784088257</id><published>2011-06-26T10:45:00.004+09:30</published><updated>2011-06-26T12:06:32.574+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='biographical'/><category scheme='http://www.blogger.com/atom/ns#' term='house'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Stuff</title><content type='html'>Been pretty lazy this week - I seemed to spend too much time reading a few sites I frequent from time to time, mostly about the GFC and some of the local political-media clown-show (they are no longer separate entities).  But the picture they paint of the world is pretty bleak so it's really all just a bit of a downer; although i'm not sure if it's just the reading that gets me down or feeling a bit flat in the first place that tends to drive me toward reading it.&lt;br /&gt;&lt;br /&gt;So no spare-time hacking this week.  I did however prune back the golden rain tree in the back yard yesterday - and given we had a couple of days of sunlight I even got a little red in the face.  Always nice to get some sunshine in the middle of winter even when such days are few and far between.  I also made that lime cordial last week.&lt;br /&gt;&lt;br /&gt;For work i'm hitting some big performance problems on the target platform - partly because I think the customer has some unrealistic expectations, and partly because I didn't do enough research at the time on card performance, or they just weren't up to scratch.  Oh well.  I presume it's something to do with the EOFY purchase dash as well but buying new hardware has come up as a possible solution.  
Fortunately things have moved a bit since then so at least buying new hardware should be a big help although it won't solve everything.&lt;br /&gt;&lt;br /&gt;I'm also pushing for AMD hardware this time - although the Nvidia hardware has been ok as far as that goes, they've obviously given up on OpenCL (no released 1.1 driver, and their opencl 'zone' hasn't changed in a year) and it doesn't seem like a company that wants my money or deserves any support (even the forums are pretty quiet so it seems I'm not alone - we all get the hint).  Expanding one's experience and educating oneself about the alternatives is always a good thing too.&lt;br /&gt;&lt;br /&gt;By coincidence AMD just had some marketing event about their heterogeneous computing plans, and &lt;a href="http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute"&gt;Anandtech&lt;/a&gt; has a really interesting article on where AMD are going with their GPU/CPU architecture.  Looks quite promising, although I'd really like to see a bump in local-store size.  
Although there is certainly enough there to be useful it is still a bottleneck, and with even more parallelism possible due to the design, the limited global bandwidth will only become more of a bottleneck.&lt;br /&gt;&lt;br /&gt;Pity it's still a way off, because a change in architecture of that magnitude will require a different approach for performance, although in general it looks like it will be easier and it will also map well to OpenCL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-3439432754784088257?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/3439432754784088257/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=3439432754784088257' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3439432754784088257'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3439432754784088257'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/stuff.html' title='Stuff'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-9059725275332773365</id><published>2011-06-18T15:17:00.004+09:30</published><updated>2011-06-18T15:49:48.872+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='imagez'/><category 
scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Lazerimagez</title><content type='html'>So I needed a bit of a break from work and other stuff I was working on so I've revisited ImageZ in my spare time over the last week or so.  I'm contemplating cleaning it up a little and putting it somewhere other than my own hard drive - it's well and truly alpha-quality but maybe it can go beyond that.&lt;br /&gt;&lt;br /&gt;I decided trying to have a global tool context was just never going to work, so I've moved them to being per-image.  And now I'm in the midst of redoing the drawing tool - recognising that the only thing that changes with the pen types is what it draws at each paint position.  So i've made an interface that creates pens on the fly using shape, dimensions, colour, and fill type.  Still not sure where i'll end up with this but it seems a more promising direction.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-9ib7RtzgDMA/Tfw84VuGKUI/AAAAAAAAATc/0EzmLWn8vcI/s1600/imagez-k.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 145px; height: 400px;" src="http://2.bp.blogspot.com/-9ib7RtzgDMA/Tfw84VuGKUI/AAAAAAAAATc/0EzmLWn8vcI/s400/imagez-k.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5619433373652363586" /&gt;&lt;/a&gt;&lt;br /&gt;And as also can be seen, I thought it needed a bit more colour ...&lt;br /&gt;&lt;br /&gt;I'd forgotten how much I actually had working quite well - layers of different data types (8 bit, 16 bit, float, grey-scale or RGB), infinite undo, compound selections, and a nice save dialogue.  Although there are also some pretty broken parts - all blend modes apart from Normal (I changed the alpha model and haven't updated them), image operations, and being able to load and save multi-layered images (that's a bit of a show-stopper).  
I know one reason I didn't get far on the latter was that there is no native support for loading/saving float format images.&lt;br /&gt;&lt;br /&gt;Unfortunately the rest of the household is all croaking from some illness and I feel like i'm headed toward a bit of man-flu, so this weekend will be a slow one I think.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-9059725275332773365?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/9059725275332773365/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=9059725275332773365' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/9059725275332773365'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/9059725275332773365'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/lazerimagez.html' title='Lazerimagez'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-9ib7RtzgDMA/Tfw84VuGKUI/AAAAAAAAATc/0EzmLWn8vcI/s72-c/imagez-k.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5534927226627987391</id><published>2011-06-13T23:27:00.004+09:30</published><updated>2011-06-13T23:58:13.754+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category 
scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>The Player</title><content type='html'>Not really worth a screen-shot but I kept poking away at the player code and added some seek support.  I've got it working pretty well on 'well-behaved' files, although it isn't nearly robust enough for 'general use'.&lt;br /&gt;&lt;br /&gt;I then got totally side-tracked and started working on a Linux DVB interface.  I just cut &amp; pasted some of the jjmpeg binding code, and created a new shared library and namespace within the jjmpeg project.  The task of binding the ioctl interface doesn't really match that of binding libavcodec but using the same mechanism should suffice.  Not sure jjdvb is a great namespace either, but then again it will do.  Now I just have to drag some coax across the room to the PC to test it, or try it on another machine ...&lt;br /&gt;&lt;br /&gt;As an aside, with all this hacking of late I haven't been doing much else, although at least I got out and mowed the lawn today for the first time in 3 weeks.  And fortunately caught some of the very brief sunlight.  Also swept the leaves up and turned the compost heap over.  
But I didn't get to making the lime cordial I had planned to, although I prepared the bottles to keep it in.&lt;br /&gt;&lt;br /&gt;(Today was a public holiday, and with Friday off too that was one long weekend of hacking - I think I need a day off!).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5534927226627987391?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5534927226627987391/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5534927226627987391' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5534927226627987391'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5534927226627987391'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/player.html' title='The Player'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-160239764593639816</id><published>2011-06-13T11:55:00.002+09:30</published><updated>2011-06-13T12:13:36.580+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>The Wall</title><content type='html'>Just for fun I tried throwing a bunch of videos into one window 
simultaneously.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-TkB46TV8SbY/TfV1IKqvM3I/AAAAAAAAATU/DXr_uSBNafc/s1600/videoz-1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 305px;" src="http://3.bp.blogspot.com/-TkB46TV8SbY/TfV1IKqvM3I/AAAAAAAAATU/DXr_uSBNafc/s400/videoz-1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5617524893377573746" /&gt;&lt;/a&gt;&lt;br /&gt;And yes, the sound plays ... although it is quite disconcerting and disorienting having all 9 play at once ...&lt;br /&gt;&lt;br /&gt;The sources are PAL-format digital recordings transcoded to H.264 with x264.  I'm scaling them using libswscale and then simply using swing labels to display them.  It's clocking up between 100% and 150% CPU usage on the java process (100% == 1 core on Linux).  So even though I'm making no real effort to make it quick, it's hardly taxing the box (albeit a very fast box).&lt;br /&gt;&lt;br /&gt;For some reason the timing code gets messed up if I try to sync to the audio timestamp, but that might just be these videos.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-160239764593639816?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/160239764593639816/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=160239764593639816' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/160239764593639816'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/160239764593639816'/><link rel='alternate' 
type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/wall.html' title='The Wall'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-TkB46TV8SbY/TfV1IKqvM3I/AAAAAAAAATU/DXr_uSBNafc/s72-c/videoz-1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7386019371724312978</id><published>2011-06-12T17:53:00.004+09:30</published><updated>2011-06-13T12:10:58.183+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>Video Player</title><content type='html'>So this weekend I wrote a video player using jjmpeg.  Yesterday I had the basic video working, and today I got sound working - using OpenAL (JOAL) for output.  For the most part even using a fairly simple synchronisation mechanism it works fairly well.  I'm letting the hardware run the sound at its native rate, and synchronising the video to the timestamps in the file (using Thread.sleep() no less!) - not perfect but it's a start.&lt;br /&gt;&lt;br /&gt;I also uploaded a new jjmpegdemos directory to the jjmpeg project which includes a &lt;a href="http://code.google.com/p/jjmpeg/source/browse/trunk/jjmpegdemos/src/jjmpegdemos/AudioPlayer.java"&gt;simple audio player&lt;/a&gt; that I used to work out OpenAL.&lt;br /&gt;&lt;br /&gt;But the video player is a bit more complex: I've currently got 1 thread demuxing the input, 1 thread each for video and audio stream decoding, and another thread to synchronise the audio and video.  
I use some of the nice classes from java.util.concurrent to handle the packet and frame queues which means each bit of code is pretty simple.  I'm recycling the AVPackets, the frames and audio sample buffers so once started it has a pretty low GC load.&lt;br /&gt;&lt;br /&gt;I'm having some strange problems though - certain files seem to throw the demuxer right out - I get massive corruption in video and audio and it's getting completely broken data frames.  These files play just fine in the AudioPlayer above, so I presume I'm doing something incorrect with my threading which is corrupting something along the way.  Other files work just fine though so it doesn't seem to be just a simple problem with invalid code - it also affects different containers and codecs inconsistently.  Just the sort of bugs I like ...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt; It seems I had too many threads.  I've moved the codec calls from their own threads to the demux thread.  Then I have 2 threads for rendering the audio and video separately instead.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7386019371724312978?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7386019371724312978/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7386019371724312978' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7386019371724312978'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7386019371724312978'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/video-player.html' title='Video 
Player'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2773413012973718066</id><published>2011-06-10T21:02:00.005+09:30</published><updated>2011-06-10T21:33:43.496+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='videoz'/><title type='text'>Video List</title><content type='html'>So one thing I've been mucking about with using jjmpeg for is creating a GUI for listing videos ... which seems a pretty basic starting point for doing anything further.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-g3on77ccFC8/TfIBj6IzwYI/AAAAAAAAATM/UmZAK-DDJBI/s1600/videoz-0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 319px;" src="http://3.bp.blogspot.com/-g3on77ccFC8/TfIBj6IzwYI/AAAAAAAAATM/UmZAK-DDJBI/s400/videoz-0.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5616553401697681794" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Yeah it's not much to look at so far but one has to start somewhere.  I might look at using Piccolo2d as the rendering surface, although I have to determine how to handle virtual items as I do here with the JList.  Apart from general fugliness it flashes white whenever you change the view sort, which is quite unpleasant.  
As can be seen, I hooked it up to mplayer after you double-click a row, just for a laff ...&lt;br /&gt;&lt;br /&gt;Under the bonnet it uses jjmpeg of course to scan the files - it's currently generating 128x128 preview images at 1 minute intervals - of which only the first is shown.  I have a separate tool to `import' the videos for the moment but I have code lying about to allow dropping of files, so it won't be hard to add.  I'm using Berkeley DB Java Edition to hold the meta-data and preview images, and I've hooked it up so the DB is scanned in another thread.  I use different secondary indices for each sorted view so they are all just as fast (slow?) as each other - this will also let me query by keyword with a little more code.  I'm also using SoftReferences to implement a cache of database items.  Unfortunately Berkeley DB JE doesn't let you query by record number like the C version, nor read the secondary database keys without also dereferencing to the primary database (i.e. slower than it might otherwise be), but judicious use of threads can help alleviate such issues.&lt;br /&gt;&lt;br /&gt;In short: it should scale quite well.&lt;br /&gt;&lt;br /&gt;Probably ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2773413012973718066?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2773413012973718066/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2773413012973718066' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2773413012973718066'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2773413012973718066'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/video-list.html' title='Video List'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-g3on77ccFC8/TfIBj6IzwYI/AAAAAAAAATM/UmZAK-DDJBI/s72-c/videoz-0.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6080438413588169614</id><published>2011-06-10T01:33:00.003+09:30</published><updated>2011-06-10T02:11:26.935+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>beep</title><content type='html'>Feeling a bit cold and crappy and had a day of frantically little progress at work, so I sat down with jjmpeg for a little while tonight to pick some low hanging fruit to make it feel like I'm getting somewhere.&lt;br /&gt;&lt;br /&gt;To that end I added audio decoding support.  It only requires a couple of functions and field accessors beyond what you need for video, but they needed some extra native functions to make them work.  Mainly with &lt;code&gt;decode_audio3&lt;/code&gt; as it takes an AVPacket but may not fully consume it and doesn't update it to indicate this, so you really need to make a copy of it and update the copy's pointers based on the decoding results.  
At least I did it in a way which requires no allocation activity on the Java side during the decoding loop: I just use the wrapping ByteBuffers to perform a memcpy and have a single native function to update the data pointer and size on the copy.&lt;br /&gt;&lt;br /&gt;I don't have any immediate need for it, but it was easy enough to add and now the day doesn't feel like a total waste.&lt;br /&gt;&lt;br /&gt;I'm slowly working toward a loose idea of 'something' that uses this stuff, although I haven't really pinned down what that 'something' might be exactly.  Which leaves the options pretty wide open for now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6080438413588169614?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6080438413588169614/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6080438413588169614' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6080438413588169614'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6080438413588169614'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/beep.html' title='beep'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4895477419373167565</id><published>2011-06-08T12:19:00.006+09:30</published><updated>2011-06-09T12:46:17.659+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Sharing GLContexts</title><content type='html'>Hit a problem yesterday, how to share GL contexts with OpenCL and several GLCanvas objects.  i.e. I have a window with a few output windows and they all need to share OpenCL processing.&lt;br /&gt;&lt;br /&gt;I eventually found one of the test cases which did the same thing, and I used a similar approach.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Create a 1x1 off-screen GLPbuffer object using GLDrawableFactory.&lt;br /&gt;&lt;li&gt;Add an event listener to that.&lt;br /&gt;&lt;li&gt;Call display() which will invoke the GLEventListener.init() method synchronously - in which I then create my CLGLContext object.&lt;br /&gt;&lt;li&gt;A few other bits and pieces like using makeCurrent() on the GLContext taken from the CLGLContext() when creating textures to be shared in the OpenCL code.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;Bit tricky to debug as most mistakes just lead to clue-free segfaults.  
It didn't help that I forgot to initialise a width/height variable at one point, and until I did everything 'worked' except that creating the output texture always failed with an error which didn't make sense.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4895477419373167565?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4895477419373167565/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4895477419373167565' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4895477419373167565'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4895477419373167565'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/sharing-glcontexts.html' title='Sharing GLContexts'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7870043167970896706</id><published>2011-06-06T12:11:00.003+09:30</published><updated>2011-06-06T13:17:28.765+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>jjmpeg - why java?</title><content type='html'>As a follow-up to the post about using Java and JNI to access ffmpeg, perhaps the more 
fundamental query is - why use Java in the first place?  After all, Java is slow and crappy and nobody uses it anyway and isn't .NET the way to go and all that?&lt;br /&gt;&lt;br /&gt;I used to write Amiga BOOPSI classes in assembler for fuck's sake, so why am I now using Java?&lt;br /&gt;&lt;br /&gt;For starters Java is not slow - although as with any language you can (un)intentionally make it slower than necessary.  Compared to similar systems with the same application support it isn't bulky, either at run-time or on disk.  The JVM is mature and stable and the garbage collection is reliable and fast.&lt;br /&gt;&lt;br /&gt;Machines are also not slow these days - in fact they are so fast that most of the processing power is wasted much of the time.  Likewise for memory.  Wasted processing cycles and wasted memory bytes are actually an inefficiency, not necessarily something to chime about.  I am no longer developing applications for a 1MB system running a multitasking GUI.  Nobody is.&lt;br /&gt;&lt;br /&gt;I still enjoy writing C, and I am still concerned with performance and efficiency, but I have been using Java for a few years now and am very happy with it - and I continue to be further pleasantly surprised from time to time.  I find it puzzling that far more desktop software isn't written using Java - in my experience it compares well in all the important categories and is generally easier to develop for.&lt;br /&gt;&lt;br /&gt;For example, performance is usually within a few % of C for normal scalar, single-threaded C.  Most programmers don't seem capable of going beyond that type of code anyway - and those that are will find JNI a piece of piss.  It will probably require twice as much memory - but this is simply an artefact of the use of a decently fast garbage collector - nothing comes completely free, but with memory expanding so much in recent years this is about the cheapest cost you could imagine for the huge benefit it provides.  
And I don't just mean no longer needing to track which pointers to free - I never found that particularly onerous although many people are unable to grok it - the GC is also a very fast memory allocator.  No need for pool or slice allocators and whatnot.&lt;br /&gt;&lt;br /&gt;By the time you add all of the features of a basic JVM runtime to C (or anything else), you have something like GNOME or KDE which are not very small at all, have large memory footprints themselves, and are still not as easy to work with (speaking of GNOME as of some time ago at least, I haven't tried KDE and in any event loathe C++ so am not about to).&lt;br /&gt;&lt;br /&gt;Of course, Python (or Ruby) seems to be the flavour of the month at the moment, but they have their own issues.  Usually they are just ugly front-ends to some C libraries or commands and they have the same problems that Tcl/Tk scripts had - a specific version dependency, ugly GUIs, and meaningless error messages from their inevitable crashes.  
And for all that they're not particularly robust, nor do they provide a particularly compact memory footprint.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7870043167970896706?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7870043167970896706/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7870043167970896706' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7870043167970896706'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7870043167970896706'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/jjmpeg-why-java.html' title='jjmpeg - why java?'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7055776170832731197</id><published>2011-06-06T11:40:00.003+09:30</published><updated>2011-06-06T11:56:03.043+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>seek to frame</title><content type='html'>Seeking to a frame using ffmpeg ...&lt;br /&gt;&lt;br /&gt;I knew this was a bit of a pain since I'd tried it before, but oh boy - there went my weekend.  And I still don't have a 100% reliable solution.  
Ho hum.&lt;br /&gt;&lt;br /&gt;Some of the issues I found with only a handful of videos I have at hand:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;An mpeg ts which won't seek via timestamp.  Only byte seeking works.&lt;br /&gt;&lt;li&gt;An avi in which byte seeking never works.  Only timestamp seeking does.&lt;br /&gt;&lt;li&gt;An avi in which the DTS increments forever - so although you can seek by timestamp to a keyframe, you cannot use it to identify specific frames thereafter.&lt;br /&gt;&lt;li&gt;A mov file in which byte seeking never works.&lt;br /&gt;&lt;li&gt;A mov file which ends with an EPIPE error rather than end of file.  It must be closed and re-opened to perform any further operations.&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;I have something which mostly works now, but I suspect it will never be reliable enough.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7055776170832731197?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7055776170832731197/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7055776170832731197' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7055776170832731197'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7055776170832731197'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/seek-to-frame.html' title='seek to frame'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2830527400674294616</id><published>2011-06-06T11:23:00.004+09:30</published><updated>2011-06-06T11:39:56.823+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><title type='text'>jjmpeg - why jni?</title><content type='html'>I started writing a reply to Michael's comment on the last entry but because I tend to ramble, it ended up so long I thought I'd promote it to a post.&lt;br /&gt;&lt;blockquote&gt;have you considered using JNA? ...&lt;br /&gt;... otherwise gluegen if you want to stay on the JNI road.&lt;/blockquote&gt;I looked at JNA some time ago, and found some problems with using it.  I can't remember what they were at the moment but I was so displeased with it that I ruled out ever bothering with it again.  It looks really good on paper but as I chose to write JNI directly at the time (for cross platform code too) there must have been a good reason.  FWIW I didn't look into SWIG or any other option either.&lt;br /&gt;&lt;br /&gt;For ffmpeg specifically, you need to access random fields of big structures and it would be impractical (or impossible) to map them using JNA - many of the fields are private and the public ones are spread out through the structure.  
So I'd be forced to write a library to define accessors anyway, and then the JNA objects to call those, so in the end I'd have to write something twice whereas now I don't even need to write it once (just a simple config file entry, assuming I didn't write a generator for JNA - but then there would be no reason to use it).&lt;br /&gt;&lt;br /&gt;I tried gluegen because it looks pretty nice and I've had nothing but positive experiences with JOCL and JOGL, but its preprocessor and parser just weren't up to the task - the ffmpeg headers are mostly internal headers which have become public, they are not a cleaned-up public/standard API.  They contain a ton of cruft that isn't public as well as public stuff that is behind conditional compilation (using expressions) and the like.  I tried pre-processing it using cpp -dD (iirc) which preserves the #defines but then the inline code or other stuff threw it and I couldn't even work out which bit of code was the problem from the terse error messages.  After giving up on it I found some other tools that might generate a simple/clean enough file to process (e.g. cproto can dump cleaned up types as well as clean prototypes) - but by then I'd moved on.&lt;br /&gt;&lt;br /&gt;The Perl script is a bit of a mess but most of the binding is automatic.  At minimum I only have to write a constructor method for the public class.  Accessors and most methods are automatic (once defined in the config file).  I only need 2 classes for every wrapped 'object', one auto-generated.&lt;br /&gt;&lt;br /&gt;There are a few special cases, but I find JNI pretty easy to use for those - given what it does it's about as simple as could be expected.  And having attempted or worked with interfaces for similar purposes in the past I think JNI is actually quite nice.  
For example .NET's native binding looks really nice on paper too (it's more like a 'built-in' JNA) but there are actually more gotchas because it's trying to automate more - it's good most of the time but can be a real pain when things get complicated.&lt;br /&gt;&lt;br /&gt;I'm already spending more time trying to work out how to use the libraries; the binding itself is mostly looking after itself, even if it is still incomplete.&lt;br /&gt;&lt;br /&gt;Having said all that .... I realise that I may have made a mistake and there will be outstanding issues yet to resolve.  But at least I'm fighting with my own mistakes and not finding the hidden limitations of tools I know little about - which simply makes it a lot more fun for a spare-time project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2830527400674294616?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2830527400674294616/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2830527400674294616' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2830527400674294616'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2830527400674294616'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/jjmpeg-why-jni.html' title='jjmpeg - why jni?'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-574285275228047999</id><published>2011-06-03T17:25:00.004+09:30</published><updated>2011-06-03T17:44:20.749+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>jjmpeg rethink</title><content type='html'>So I had a bit of a rethink about how jjmpeg does things and in short rewrote it from scratch over the last couple of days/nights:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;OBAWO (one big-arse-write-once) Perl script generates bindings for field accessors as well as many methods based on a configuration file.&lt;br /&gt;&lt;li&gt;Use dlopen() to bind to libavformat, libavcodec, libswscale at run-time, in order to avoid linking to an impossibly specific version of libavcodec at compile-time.&lt;br /&gt;&lt;li&gt;Write accessors in C.  This will necessarily be a bit slower, but it avoids having to have different Java classes for each case.  
It also means only the C files need to be recompiled for a different platform.&lt;br /&gt;&lt;li&gt;The C member functions do their own 'this' lookups, thus allowing them to be called directly as public interfaces (although right now they're not for various reasons).&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;I've now enough of the libraries bound to allow &lt;a href="http://code.google.com/p/jjmpeg/source/browse/trunk/jjmpeg/src/jjmpeg/VideoWriterExample.java"&gt;creation of video files&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-1HtODDvIaXM/TeiVfl5ggZI/AAAAAAAAATE/lkH1vbFxPMM/s1600/jjmpeg-1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 362px; height: 314px;" src="http://1.bp.blogspot.com/-1HtODDvIaXM/TeiVfl5ggZI/AAAAAAAAATE/lkH1vbFxPMM/s400/jjmpeg-1.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5613901305499124114" /&gt;&lt;/a&gt;&lt;p align="center"&gt;It scrolls! (not shown)&lt;/p&gt;&lt;br /&gt;So with this and with a bit of Java2D it's pretty easy to start compositing and generating simple video sequences, if one should so desire.&lt;br /&gt;&lt;br /&gt;I'm pretty sure I have the lifecycle and memory management sorted, although there may still be bugs there.  
The AVFrame to AVPlane interface is a bit crappy though.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-574285275228047999?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/574285275228047999/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=574285275228047999' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/574285275228047999'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/574285275228047999'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/jjmpeg-rethink.html' title='jjmpeg rethink'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-1HtODDvIaXM/TeiVfl5ggZI/AAAAAAAAATE/lkH1vbFxPMM/s72-c/jjmpeg-1.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2454511459418791236</id><published>2011-06-01T00:14:00.003+09:30</published><updated>2011-06-01T00:43:33.275+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>VideoFileChooser</title><content type='html'>Just a simple little utility class I'm working on at the moment.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-AyY_L88ox70/TeUErADacAI/AAAAAAAAAS4/18RRknFtLN0/s1600/jjmpeg-0.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 298px;" src="http://2.bp.blogspot.com/-AyY_L88ox70/TeUErADacAI/AAAAAAAAAS4/18RRknFtLN0/s400/jjmpeg-0.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5612897647382392834" /&gt;&lt;/a&gt;&lt;br /&gt;A basic file chooser with a greyscale video preview of selected-format video files.  Seems like it could be nice to have; even if it is only black and white, mute and with (possibly) incorrect aspect ratio and frame-rate.  I just hacked up the image-preview file requester I made for ImageZ and in fact it took fewer lines of code.&lt;br /&gt;&lt;br /&gt;I also had a go at supporting 32 bit systems with jjmpeg, although I haven't tested it at all yet.  However I noticed that the .so file links with a huge pile of stuff from ffmpeg depending on the build options it was created with.  So it probably will not be possible to make a generic package for it - fortunately the native library is only a single small C file so it probably doesn't need one.  Assuming I don't go and dlopen the libraries manually at least ... 
which is always an option I suppose.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2454511459418791236?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2454511459418791236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2454511459418791236' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2454511459418791236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2454511459418791236'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/06/videofilechooser.html' title='VideoFileChooser'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-AyY_L88ox70/TeUErADacAI/AAAAAAAAAS4/18RRknFtLN0/s72-c/jjmpeg-0.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-6457048891931425819</id><published>2011-05-30T21:50:00.004+09:30</published><updated>2011-05-30T23:51:15.485+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='jjmpeg'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><title type='text'>jjmpeg</title><content type='html'>I've started another new project on Google Code: &lt;a href="http://code.google.com/p/jjmpeg/"&gt;jjmpeg&lt;/a&gt;.  
It's a simple binding to ffmpeg for Java, where possible mirroring the API directly.  E.g. compare &lt;a href="http://code.google.com/p/jjmpeg/source/browse/trunk/jjmpeg/src/jjmpeg/Main.java"&gt;Main.java&lt;/a&gt; with &lt;a href="http://web.me.com/dhoerl/Home/Tech_Blog/Entries/2009/1/22_Revised_avcodec_sample.c_files/avcodec_sample.0.5.0.c"&gt;avcodec_sample.0.5.0.c&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It uses NIO direct ByteBuffers to allow some of the binding to be done in managed code, and the rest is done with custom callbacks.&lt;br /&gt;&lt;br /&gt;I only need it for reading video, so that's all I've implemented; and even then only a tiny subset thereof.  I don't know if it will get any further than that.&lt;br /&gt;&lt;br /&gt;I did look into using gluegen to do the binding but I couldn't get it to work - for starters the ffmpeg headers are too complex for its limited parser.  Even stripping out the conditionals and the inline code didn't help (using cpp -dD and some editing).  So I just manually select which fields to bind and I have some C which works out the offsets (and auto-generates some classes) from some embedded tables (why write a config parser when the C compiler has one?), and hand-code the function calls.&lt;br /&gt;&lt;br /&gt;I use 3 classes per 'class' to allow me to get away with binding generated code with hand-written code, as well as supporting 32 or 64 bit (perhaps, eventually - only 64 bit done so far).  The base abstract class is auto-generated and sets up the interface for the auto-generated field accessors.  The middle abstract class is the public API and includes the hand-written bindings and factory methods (where appropriate).  The final concrete class implements the accessors for each processor size.&lt;br /&gt;&lt;br /&gt;Up until now I've been using &lt;a href="http://www.xuggle.com/xuggler/"&gt;Xuggler&lt;/a&gt; ... which I probably will keep using for some software, particularly cross platform stuff.  
But I don't really like the callback api it uses and for the most part the ffmpeg library api's are fairly simple on their own (despite a bit of bitrot).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-6457048891931425819?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/6457048891931425819/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=6457048891931425819' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6457048891931425819'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/6457048891931425819'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/jjmpeg.html' title='jjmpeg'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5785823917839015727</id><published>2011-05-27T17:26:00.004+09:30</published><updated>2011-05-27T19:15:59.365+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Viola &amp; Jones Detector OpenCL</title><content type='html'>After being a bit distracted yesterday evening by my co-dwellers, I got stuck into a problem I've been wanting to look at for a while - running a Viola 
&amp; Jones cascade detector on the GPU using OpenCL.  I'd just got an integral image calculator done so was eager to use it.&lt;br /&gt;&lt;br /&gt;I've had a few goes in the past but always seemed to mess up some of the weighting calculations, so I started with the code from &lt;a href="http://opencloovision.codeplex.com/"&gt;OpenClooVision&lt;/a&gt; which is a bit easier to follow than the OpenCV implementation, although could certainly use some work.&lt;br /&gt;&lt;br /&gt;So, by about 6am I had a working implementation ... but it was really very slow.  Far too slow to even consider for what I need, and worse, it doesn't scale at all well - running it on smaller problem sizes just makes it even less efficient.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-Sev1o3vHHqA/Td9mN4fwpnI/AAAAAAAAASo/Lz9ni3S16G8/s1600/viola-jones-0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 311px;" src="http://1.bp.blogspot.com/-Sev1o3vHHqA/Td9mN4fwpnI/AAAAAAAAASo/Lz9ni3S16G8/s400/viola-jones-0.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5611316049416857202" /&gt;&lt;/a&gt;&lt;p align=center&gt;&lt;em&gt;Who is that fat bastard?&lt;/em&gt;&lt;/p&gt;&lt;br /&gt;I went to bed shivering cold but mostly wondering just what I could do to speed it up.  I have previously done some work with integral images, and I found they do not work particularly well on GPUs - even calculating them aside (which I managed to solve with an &lt;a href="http://code.google.com/p/socles/source/browse/trunk/socles/src/au/notzed/socle/image/IntegralImage.java"&gt;acceptable solution&lt;/a&gt; although it took many many dead-ends and grey hairs).  
Although on paper they look like an efficient solution - a handful of array lookups to calculate an area sum - in use they seem to interact poorly with the texture cache.&lt;br /&gt;&lt;br /&gt;It was taking in excess of 30 000uS to perform 14 passes on a 640x480 test image in steps of 5 with a scale of 1.25.&lt;br /&gt;&lt;br /&gt;The OpenCV and faint implementations both pre-scale the features and feature-weights.  I never quite understood why until I had a working implementation, and the OpenClooVision version was calculating them on the fly.  So the first stop was to try this.  Pre-calculating the weights is extremely cheap, and this led to around a 50% performance boost.&lt;br /&gt;&lt;br /&gt;I still had a problem with the GPU hardly being utilised, particularly at the larger scales (fewer tasks/call) or with smaller images.  And because each thread was working on a separate probe, there was very little coherency in processing or data.&lt;br /&gt;&lt;br /&gt;However, I noticed that for the cascade I was using (the default one from OpenCV) it was running many feature tests for each stage - 25-200; and that calculating the feature value was unconditional - something ripe for parallelisation.&lt;br /&gt;&lt;br /&gt;So I tried launching 64 threads for each probe location, and they work together on the list of features in blocks of 64, and then tally them up at the end using thread 0.  This was the biggest improvement and I managed to get it down to around 12 000uS.&lt;br /&gt;&lt;br /&gt;I then tried a parallel prefix sum - which got it down to about 11 000uS, although then I tried a sqrt(N) sum (split the summation into 2 passes, first 16 threads produce 16 partial sums, then 1 thread adds those up) and got it down to about 10 500uS.  
Parallel prefix sum loses again.&lt;br /&gt;&lt;br /&gt;And then to cut a long story short I tried a whole bunch of variations, such as  storing the regions in integer format - this made it faster, but only because it wasn't calculating the same results.  And different work-sizes - 64 worked the best.  And different packing formats for the feature descriptors.  But no matter what I tried, about 10 500 uS was about the best I could manage with the test image.&lt;br /&gt;&lt;br /&gt;I also tried a slightly modified version (no thread dependencies) running on the AMD CPU driver, on the 6 core+HT machine I have.  That managed 90mS.  So the GPU is only 10x faster, which although nothing to sneeze at is still a bit disappointing.&lt;br /&gt;&lt;br /&gt;To me this is still a little on the slow side, but I'm pretty much out of ideas for now.  It might just be a problem that defies any particularly efficient mapping to GPU hardware.&lt;br /&gt;&lt;br /&gt;Actually one thing I haven't tried is scaling the images instead of the features ... 
but I think that's something that can wait for another day.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5785823917839015727?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5785823917839015727/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5785823917839015727' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5785823917839015727'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5785823917839015727'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/viola-jones-detector-opencl.html' title='Viola &amp; Jones Detector OpenCL'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-Sev1o3vHHqA/Td9mN4fwpnI/AAAAAAAAASo/Lz9ni3S16G8/s72-c/viola-jones-0.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7182670275005744449</id><published>2011-05-24T01:14:00.006+09:30</published><updated>2011-05-26T00:05:28.672+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>OpenCL Kanade Lucas Tomasi feature tracking</title><content 
type='html'>I've added a couple of things to &lt;a href="http://code.google.com/p/socles"&gt;socles&lt;/a&gt;, the main being an implementation of Kanade Lucas Tomasi feature tracking.  It's just a fairly simple translation of the &lt;a href="http://www.ces.clemson.edu/~stb/klt/"&gt;public domain version here&lt;/a&gt; (site seems down at time of writing), with a few tweaks for good GPU performance.  Ok it looks nothing like it, but all i did was remove the need for 3 temporary images and the loops to create and use them by noting that each element was independent of the other and so they could all be moved inside of a single loop.&lt;br /&gt;&lt;br /&gt;I've only implemented the translation-only version of the tracker, with the lighting-sensitive matching algorithm.  The affine version shouldn't be terribly difficult - it's mostly just re-arranging the code in a similar fashion (although the 6x6 matrix solver might benefit from more thorough work).&lt;br /&gt;&lt;br /&gt;Really awful screenshot of it in action ... (it's very late again)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-lzZgGqSAxhk/TdqBILHS0VI/AAAAAAAAASg/nchHbdJDJbM/s1600/socles-1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 327px;" src="http://3.bp.blogspot.com/-lzZgGqSAxhk/TdqBILHS0VI/AAAAAAAAASg/nchHbdJDJbM/s400/socles-1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5609938263265694034" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The 3300 regularly spaced feature points above take about 1.5ms to track in GPU time (480 GTX), including all of the image pyramid and gradient image set-up for the incoming frame.  It uses 3 levels of pyramid with each 1/4 the size in each dimension of the one below.  
Most of the points lose track because there isn't enough detail to meet the quality settings, hence the areas with no green dots.&lt;br /&gt;&lt;br /&gt;With a CPU implementation the majority of the time is taken in the convolutions necessary to set up the image pyramid - although eventually with enough features one reaches a break-even point.  In this case it's about 500uS for the image setup for a 640x480 image (3 convolutions at full res, 2 at each subsampling and one resample), and 1ms for the 3300 tracks (i.e. 0.3 microseconds per additional 16x16 feature tracked).  Well that's quick enough to keep me happy anyway.  At least for now!&lt;br /&gt;&lt;br /&gt;I implemented this using a pattern which I've found after much (much) trial and error to be reliably efficient for mapping many types of rectilinear problems to GPU processes, which i'll just call an NxM execution pattern.  For a feature-size of 'NxM', the code assigns 'N' threads to each feature, each of which works on a column of 'M' rows of data, with the first thread in the sub-group used to tally results and make decisions.  A small amount of local memory is used for the partial sums and to share the decisions amongst the threads.  These are mapped to a flat work size which is independent of the local workgroup size, which allows for easy tuning adjustments.  So a feature size of 9 would be mapped using 7 features if the local work-size is 64, with 1 thread idle (64 - 7x9; i.e. choose factors of the worksize for best efficiency, but it can still work well with any size).  100 such features would need 15 work groups (ceil(100/7)).  The address arithmetic is a bit messy but it only needs to be calculated once on entry, after that it is simple fixed indexing.  
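&lt;br /&gt;&lt;br /&gt;As a rough host-side sketch of that arithmetic (plain C, not the socles kernel; every function name here is made up for illustration): with a local work-size of 64 and N = 9, 7 features fit in a work-group, 64 - 63 = 1 thread is spare, and 100 features need 15 work-groups.  Spare threads and threads past the last feature get -1 so they can be masked off rather than exiting early.
&lt;pre&gt;
```c
/* Illustrative only: hypothetical helpers for the NxM mapping described above. */

/* Number of N-thread features that fit in one work-group. */
int features_per_group(int local_size, int n)
{
    return local_size / n;
}

/* Threads in each work-group that have no feature to work on. */
int idle_threads(int local_size, int n)
{
    return local_size % n;
}

/* Work-groups needed for nfeatures features: ceil(nfeatures / per_group). */
int groups_needed(int nfeatures, int local_size, int n)
{
    int per = features_per_group(local_size, n);

    return (nfeatures + per - 1) / per;
}

/* Feature index handled by a thread, or -1 if the thread is spare or
   falls past the last feature (it should keep running, but be masked). */
int feature_of_thread(int group_id, int local_id, int nfeatures,
                      int local_size, int n)
{
    int per = features_per_group(local_size, n);
    int slot = local_id / n;
    int feature = group_id * per + slot;

    return (slot >= per || feature >= nfeatures) ? -1 : feature;
}

/* Column of the feature that a thread works on (0 does the tallying). */
int column_of_thread(int local_id, int n)
{
    return local_id % n;
}
```
&lt;/pre&gt;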
And since you can reference any memory/local memory by local id 0 you don't have to worry about memory bank conflicts.&lt;br /&gt;&lt;br /&gt;I tried many other variations of mapping the work to the GPU and this was by far the best (and not far off the simplest once the fiddly address calculations are out of the way).  For example, using a parallel prefix sum across all elements with a 3D work size (i.e. 16x16xnfeatures) was about 1/2 the speed IIRC - the increased parallelism was a hindrance because of both the IPC required and the over-utilisation of resources (more local memory and threads).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt; Ok I found a pretty major problem with this mapping of the jobs - early exit of some sub-workgroups messes up the barrier usage which upsets at least the AMD CPU driver.  So all threads need to keep executing until they are all finished and then mask the result appropriately, which will complicate matters but hopefully not by much.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update 2&lt;/b&gt; I think I managed to solve it, although I had some trouble getting the code to behave as I expected, and got misled by bugs in other kernels.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7182670275005744449?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7182670275005744449/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7182670275005744449' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7182670275005744449'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7182670275005744449'/><link 
rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/opencl-kanade-lucas-tomasi-feature.html' title='OpenCL Kanade Lucas Tomasi feature tracking'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-lzZgGqSAxhk/TdqBILHS0VI/AAAAAAAAASg/nchHbdJDJbM/s72-c/socles-1.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4765244679974091605</id><published>2011-05-22T09:46:00.003+09:30</published><updated>2011-05-22T11:20:50.575+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>branchless stackless merge sort</title><content type='html'>After a &lt;a href="http://a-hackers-craic.blogspot.com/2011/05/3x3-median-filter-or-branchless.html"&gt;previous post&lt;/a&gt; I had some other thoughts about implementing a branch-less merge.  
And further, of removing any stack or recursion as well, at least in certain cases.&lt;br /&gt;&lt;br /&gt;This is more of a mental exercise in this case because I only looked at sorting an array - where you're better off using qsort, but it might be worth it on a linked list.&lt;br /&gt;&lt;br /&gt;First merge version:&lt;pre&gt;static inline void merge1(void * restrict head1,&lt;br /&gt;     int len1,&lt;br /&gt;     void *tmpData,&lt;br /&gt;     CmpFunc cmp,&lt;br /&gt;     size_t itemsize)&lt;br /&gt;{&lt;br /&gt; void * restrict tail1 = head1 + len1*itemsize;&lt;br /&gt; void * restrict head2 = tail1;&lt;br /&gt; void * restrict tail2 = head2 + len1 * itemsize;&lt;br /&gt;&lt;br /&gt; void * restrict h1 = tmpData;&lt;br /&gt; void * restrict t1 = tmpData;&lt;br /&gt; void * restrict tail = head1;&lt;br /&gt;&lt;br /&gt; // copy first array to temporary buffer&lt;br /&gt; t1 = tmpData + (tail1-head1);&lt;br /&gt; memcpy(tmpData, head1, tail1-head1+itemsize);&lt;br /&gt;&lt;br /&gt; // merge temporary buffer into target buffer&lt;br /&gt; while (1) {&lt;br /&gt;  if (cmp(h1, head2) &amp;lt; 0) {&lt;br /&gt;   memcpy(tail, h1, itemsize);&lt;br /&gt;   tail += itemsize;&lt;br /&gt;   if (h1 == t1)&lt;br /&gt;    // end of first list - rest is already in place&lt;br /&gt;    goto done;&lt;br /&gt;   h1 += itemsize;&lt;br /&gt;  } else {&lt;br /&gt;   memcpy(tail, head2, itemsize);&lt;br /&gt;   tail += itemsize;&lt;br /&gt;   if (head2 == tail2) {&lt;br /&gt;    // end of second list - copy rest of first list back&lt;br /&gt;    memcpy(tail, h1, t1-h1+itemsize);&lt;br /&gt;    goto done;&lt;br /&gt;   }&lt;br /&gt;   head2 += itemsize;&lt;br /&gt;  }&lt;br /&gt; }&lt;br /&gt; done:&lt;br /&gt;  return;&lt;br /&gt;}&lt;/pre&gt;Keeping the loop simple is the best policy in this case.&lt;br /&gt;&lt;br /&gt;The branchless merge:&lt;br /&gt;&lt;pre&gt;static inline void merge2(void * restrict head1,&lt;br /&gt;     int len1,&lt;br /&gt;     void * restrict tmp,&lt;br 
/&gt;     CmpFunc cmp,&lt;br /&gt;     size_t itemsize,&lt;br /&gt;     void *sentinel)&lt;br /&gt;{&lt;br /&gt; void *h1 = tmp;&lt;br /&gt; void *h2 = head1+len1*itemsize;&lt;br /&gt; void *out = head1;&lt;br /&gt; int len = len1*2;&lt;br /&gt; int i;&lt;br /&gt;&lt;br /&gt; // copy first half array to temporary buffer &amp; add sentinal&lt;br /&gt; memcpy(tmp, head1, len1*itemsize);&lt;br /&gt; memcpy(tmp+len1*itemsize, sentinel, itemsize);&lt;br /&gt;&lt;br /&gt; // save spot for sentinel in 2nd array&lt;br /&gt; memcpy(tmp+len1*itemsize+itemsize, h2+len1*itemsize, itemsize);&lt;br /&gt; memcpy(h2+len1*itemsize, sentinel, itemsize);&lt;br /&gt;&lt;br /&gt; // merge&lt;br /&gt; for (i=0;i&amp;lt;len;i++) {&lt;br /&gt;  int c;&lt;br /&gt;&lt;br /&gt;  c = cmp(h1, h2) &amp;lt; 0;  &lt;br /&gt;  memcpy(out, c ? h1 : h2, itemsize);&lt;br /&gt;  h1 = c ? h1 + itemsize : h1;&lt;br /&gt;  h2 = c ? h2 : h2 + itemsize;&lt;br /&gt;  out += itemsize;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; // restore&lt;br /&gt; memcpy(h2+len1*itemsize, tmp+len1*itemsize+itemsize, itemsize);&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;It uses a sentinel so there needs to be no testing of loop termination beyond the number of total elements being merged.&lt;br /&gt;&lt;br /&gt;And finally the removal of the stack or recursion to find out when merges need to take place.  One simple way (breadth-first) is just to iterate over the whole array, at size 2, then 4, 8, and so on, but this has poor locality of references.  
So this simple logic is used to track when and how large merges are to take place and does them in a depth-first fashion.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;void MergeSort(void *data, size_t count, size_t itemsize, CmpFunc cmp, void *sentinal) {&lt;br /&gt; int i, j;&lt;br /&gt; int last = 0;&lt;br /&gt; void *tmp;&lt;br /&gt;&lt;br /&gt; tmp = malloc((count+1) * itemsize);&lt;br /&gt;&lt;br /&gt; for (i=0;i&amp;lt;count;) {&lt;br /&gt;  batchers8(data+i*itemsize, tmp, cmp, itemsize);&lt;br /&gt;  i += 8;&lt;br /&gt;  for (j=16;j&amp;lt;=count;j&amp;lt;&amp;lt;=1) {&lt;br /&gt;   // bit j of i toggles each time a block of j elements is complete&lt;br /&gt;   if (j&amp;amp;(i^last)) {&lt;br /&gt;    merge2(data+(i-j)*itemsize, j/2, tmp, cmp, itemsize, sentinal);&lt;br /&gt;   }&lt;br /&gt;  }&lt;br /&gt;  last = i;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; free(tmp);&lt;br /&gt;}&lt;/pre&gt;This code also performs the bottom sort in 8 element lots using an unrolled batchers sort.  So one might note that this particular sort is limited to sorting arrays whose number of elements is a power of 2 ...&lt;br /&gt;&lt;br /&gt;I ran a few timings and this is slightly faster in certain cases than the natural merge sort that I had previously written - that of course is still much faster for the already-sorted case.  The version I timed had the merge step manually unrolled 8 times as well.&lt;br /&gt;&lt;br /&gt;For random input, the branchless merge is about 10% faster than the original merge, and the original merge is about 10% faster than my original natural merge sort.  So the lack of conditional execution ends up a bonus, even considering the extra overheads, and all the extra memory moves.  This is on an Intel CPU anyway - which is so register starved they've had to make such code run fast, so one might not end up with the same result on other architectures.&lt;br /&gt;&lt;br /&gt;There is more possible here too if you're sorting basic types (i.e. not a general interface as above) - all of these algorithms can be vectorised to run on SIMD processors.  
It could for example independently sort 4 interleaved arrays in the same number of operations, and then merge those in 2-3 passes at the end (or 1, with a 4-way merge).&lt;br /&gt;&lt;br /&gt;Sentinels are a pretty nice optimisation trick but they usually require some sort of extra housekeeping that makes their use impractical.  For example in the camel-mime-parser code I used a sentinel of \n at the end of the i/o buffer every time i refreshed it so the lowest level loop which identified individual lines was only 3 instructions long - and it only checked for running out of input after the loop had terminated.  But I had to write my own i/o buffering to have such control of the memory.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4765244679974091605?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4765244679974091605/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4765244679974091605' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4765244679974091605'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4765244679974091605'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/branchless-stackless-merge-sort.html' title='branchless stackless merge sort'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4940104451898223578</id><published>2011-05-21T10:21:00.002+09:30</published><updated>2011-05-21T10:24:37.550+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>putAcquireGLObject()</title><content type='html'>Damnit, i've done it again.  That's what I get for trying to cut and paste some old code that apparently never worked anyway.&lt;br /&gt;&lt;br /&gt;I was trying to implement an OpenGL rendering loop for some OpenCL code rather than copying it back to Java2D every time.  Except I based the code on some test code i'd written which now I realise never worked.&lt;br /&gt;&lt;br /&gt;And mostly it failed because it didn't call putAcquireGLObject() before writing to the output texture.&lt;br /&gt;&lt;br /&gt;A few more hours of my day i'll never get back ... and I think it's the second time i've done it.  
Maybe i'll remember it this time if I write it down.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4940104451898223578?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4940104451898223578/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4940104451898223578' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4940104451898223578'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4940104451898223578'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/putacquireglobject.html' title='putAcquireGLObject()'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-1925038090614305065</id><published>2011-05-19T12:51:00.002+09:30</published><updated>2011-05-19T14:21:01.364+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>More sort thoughts</title><content type='html'>Late last night when I should really have already headed to bed I had some more thoughts on the &lt;a href="http://a-hackers-craic.blogspot.com/2011/05/3x3-median-filter-or-branchless.html"&gt;median filter&lt;/a&gt; and played a bit with a 
batchers sort generator.&lt;br /&gt;&lt;br /&gt;For a 9 value median, rather than do a full 9 element sort, one could just do an 8 element sort and then attempt to insert the final value into the middle.  The insert, which only needs to be valid for the centre value, takes just 2 additional &lt;code&gt;cas&lt;/code&gt; steps.  And an 8 sort takes quite a few fewer steps than a 9 sort, and even then 2 of them are redundant for finding the median.  In short, this allows a 9 element median to be calculated in only 19 &lt;code&gt;cas&lt;/code&gt; operations rather than 22.&lt;br /&gt;&lt;br /&gt;For larger sorts one can probably repeat the process further.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-1925038090614305065?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/1925038090614305065/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=1925038090614305065' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1925038090614305065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/1925038090614305065'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/more-sort-thoughts.html' title='More sort thoughts'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5419129344916928678</id><published>2011-05-18T11:58:00.005+09:30</published><updated>2011-05-18T13:44:29.619+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>3x3 median filter, or branchless loopless sort</title><content type='html'>So while thinking of something I could add to &lt;a href="http://code.google.com/p/socles"&gt;socles&lt;/a&gt; to make it worth uploading I thought of implementing a median filter.&lt;br /&gt;&lt;br /&gt;Median filters are a bit of a pain because one has to sort all of the pixel values (for a greyscale image at least) before getting the result - and sorts generally require loops and other conditional code which make things a bit inefficient when executed on a GPU.&lt;br /&gt;&lt;br /&gt;So somehow i'm back to looking into sorts again ... the never-ending quest continues.&lt;br /&gt;&lt;br /&gt;Fortunately, this is another case where a batchers sort or other hand-rolled sorting networks come to the rescue.  And since each element is being processed independently there are no inter-thread dependencies and all the processing can be unrolled and performed on registers (if you have enough of them).&lt;br /&gt;&lt;br /&gt;So I implemented the 9 element sorting network from Knuth Volume 3 section 5.3.4 which sorts 9 elements in 25 compare and swap steps.  But since 3 of the comparisons of the 25 aren't involved in the median value I also dropped them and ended up with 22 inline comparison and swaps to find the median value.  
Fortunately this is a small enough problem to fit entirely into registers too.&lt;br /&gt;&lt;br /&gt;This also works pretty well in C: an (albeit inline) version manages to perform 100M median calculations on a 9 integer array (on the same array which is copied over) in 1.26s on my workstation, whereas using the glibc qsort() to do the same task takes 17s.  (I didn't verify that the inline version was transferring data to/from memory as it should in the micro-benchmark, but that does sound about right all things considered).&lt;br /&gt;&lt;br /&gt;So, given unsorted values &lt;code&gt;s0-s8&lt;/code&gt;, and an operator &lt;code&gt;cas(a, b)&lt;/code&gt; (compare and swap) which results in &lt;code&gt;a &amp;lt;= b&lt;/code&gt;, the median can be calculated using the following 22 steps:&lt;br /&gt;&lt;pre&gt;        cas(s1, s2);&lt;br /&gt;        cas(s4, s5);&lt;br /&gt;        cas(s7, s8);&lt;br /&gt;&lt;br /&gt;        cas(s0, s1);&lt;br /&gt;        cas(s3, s4);&lt;br /&gt;        cas(s6, s7);&lt;br /&gt;&lt;br /&gt;        cas(s1, s2);&lt;br /&gt;        cas(s4, s5);&lt;br /&gt;        cas(s7, s8);&lt;br /&gt;&lt;br /&gt;        cas(s3, s6);&lt;br /&gt;        cas(s4, s7);&lt;br /&gt;        cas(s5, s8);&lt;br /&gt;        cas(s0, s3);&lt;br /&gt;&lt;br /&gt;        cas(s1, s4);&lt;br /&gt;        cas(s2, s5);&lt;br /&gt;        cas(s3, s6);&lt;br /&gt;&lt;br /&gt;        cas(s4, s7);&lt;br /&gt;        cas(s1, s3);&lt;br /&gt;&lt;br /&gt;        cas(s2, s6);&lt;br /&gt;&lt;br /&gt;        cas(s2, s3);&lt;br /&gt;        cas(s4, s6);&lt;br /&gt;&lt;br /&gt;        cas(s3, s4);&lt;/pre&gt;&lt;br /&gt;&lt;code&gt;s4&lt;/code&gt; now contains the median value.  
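&lt;br /&gt;&lt;br /&gt;A minimal sketch of the same 22 steps in plain C, for checking the network on the CPU.  This is not the OpenCL kernel from socles: &lt;code&gt;median9&lt;/code&gt; and the &lt;code&gt;net&lt;/code&gt; table are names made up here, &lt;code&gt;cas&lt;/code&gt; takes array indices rather than registers, and a loop over the table stands in for the unrolled listing above.
&lt;pre&gt;
```c
/* Illustrative only: index-based cas and a table-driven network. */

/* Compare-and-swap on two slots: afterwards s[i] is no larger than s[j].
   One compare feeding two selects, so there is no branch. */
static void cas(int *s, int i, int j)
{
    int a = s[i], b = s[j];
    int swap = a > b;        /* the single compare */

    s[i] = swap ? b : a;     /* select the smaller value */
    s[j] = swap ? a : b;     /* select the larger value */
}

/* Median of 9 values using the 22 steps listed above.
   Works on a copy, so the caller's array is left untouched. */
int median9(const int *in)
{
    static const int net[22][2] = {
        {1, 2}, {4, 5}, {7, 8},
        {0, 1}, {3, 4}, {6, 7},
        {1, 2}, {4, 5}, {7, 8},
        {3, 6}, {4, 7}, {5, 8}, {0, 3},
        {1, 4}, {2, 5}, {3, 6},
        {4, 7}, {1, 3},
        {2, 6},
        {2, 3}, {4, 6},
        {3, 4}
    };
    int s[9];
    int k;

    for (k = 0; k != 9; k++)
        s[k] = in[k];
    for (k = 0; k != 22; k++)
        cas(s, net[k][0], net[k][1]);

    return s[4];
}
```
&lt;/pre&gt;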
&lt;code&gt;cas&lt;/code&gt; can be implemented using 1 compare and 2 &lt;code&gt;select&lt;/code&gt;s.&lt;br /&gt;&lt;br /&gt;But perhaps the nicest thing is the code is directly vectorisable.&lt;br /&gt;&lt;br /&gt;Some timings using a 4 channel 32-bit input image 640x480 and producing a 4 channel 32-bit output image:  For 1 element it takes 84uS, for 3 elements 175uS and 4 elements 238uS.  This is on a NVidia 480GTX.&lt;br /&gt;&lt;br /&gt;The current implementation: &lt;a href="http://code.google.com/p/socles/source/browse/trunk/socles/src/au/notzed/socle/filter/median.cl"&gt;median.cl&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5419129344916928678?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5419129344916928678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5419129344916928678' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5419129344916928678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5419129344916928678'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/3x3-median-filter-or-branchless.html' title='3x3 median filter, or branchless loopless sort'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7800502900960404783</id><published>2011-05-18T11:54:00.004+09:30</published><updated>2011-05-18T12:45:55.125+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='socles'/><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>socles</title><content type='html'>I finally started a new project to store some of the simpler OpenCL stuff i've been working on.&lt;br /&gt;&lt;br /&gt;I've called it &lt;a href="http://code.google.com/p/socles"&gt;socles&lt;/a&gt; - a socle is another word for plinth - the base of a column.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-gtzh4FlTGlg/TdM5z4_1EGI/AAAAAAAAASY/WAwLrJCsJbs/s1600/socles-0.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 327px;" src="http://2.bp.blogspot.com/-gtzh4FlTGlg/TdM5z4_1EGI/AAAAAAAAASY/WAwLrJCsJbs/s400/socles-0.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5607889524642615394" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It's a really basic set of operations at the moment, but I hope they are decent implementations thereof at least.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7800502900960404783?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7800502900960404783/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7800502900960404783' title='0 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7800502900960404783'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7800502900960404783'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/socles.html' title='socles'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-gtzh4FlTGlg/TdM5z4_1EGI/AAAAAAAAASY/WAwLrJCsJbs/s72-c/socles-0.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-2685158140053861235</id><published>2011-05-15T17:02:00.003+09:30</published><updated>2011-05-15T17:55:28.561+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Convolved out</title><content type='html'>Bit of a dry post coming up, but when you consider I blew my Sunday on this it might make sense.  
Curiosity got the better of me today and I spent most of it playing with convolutions with OpenCL on a GPU - I had a pretty fast implementation but wanted to compare it with some other ideas I had.&lt;br /&gt;&lt;br /&gt;As it turned out, the implementation I had was the fastest after all, although I tweaked it a tiny bit during the process.&lt;br /&gt;&lt;br /&gt;For kernels at 3x3 (or 7x7 for 4-channel UBYTE images), a simple 2d implementation is very slightly faster than a more complex algorithm.&lt;br /&gt;&lt;br /&gt;For non-separable convolution, a complex implementation which uses a rolling buffer is over 3x faster than a naive implementation, at least up to sizes of 31x31.&lt;br /&gt;&lt;br /&gt;For separable convolution, my complex implementation is up to 2.5x faster than a naive implementation.&lt;br /&gt;&lt;br /&gt;My separable convolution implementation reads 16x16 blocks of image into local memory and then each thread generates all results from the local memory in one pass - e.g. for up to 7x7 convolution it reads 2x16x16 blocks, for up to 15x15 convolution it reads 3x16x16 blocks, and so on, i.e. you need the 16x16 data plus 'kernel radius' pixels on each side.  It uses transpose for the Y convolution case during loading and saving of the data, but the processing is identical.  It also uses the trick of offsetting the odd rows of the data so they avoid local memory contention when they might otherwise hit it - e.g. when the number of blocks being read is even.&lt;br /&gt;&lt;br /&gt;FWIW, for a 640x480 image on a GTX 480, a single channel FLOAT 31x31 separable convolution takes about 190uS, or 470uS for the naive version.  For UBYTE it is 177uS vs 470uS.  For a 4 channel image the timings are 413uS, 916uS, 389uS, and 465uS respectively.  
So larger (byte size) images gain more - presumably from the reduction in memory reads and lower cache loading.&lt;br /&gt;&lt;br /&gt;Actually - yesterday I started working on a JOCL based image processing library that I intend to drop on google code at some point - and this investigation was part of that.  More should be forthcoming on that soon although right now I just don't have enough put together to be much use.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-2685158140053861235?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/2685158140053861235/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=2685158140053861235' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2685158140053861235'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/2685158140053861235'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/convolved-out.html' title='Convolved out'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-7791274696506770098</id><published>2011-05-06T14:40:00.003+09:30</published><updated>2011-05-06T15:35:25.380+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title 
type='text'>Luser error</title><content type='html'>So it turns out the main cause of the &lt;a href="http://a-hackers-craic.blogspot.com/2011/05/cache-only-works-when-theres-enough.html"&gt;performance issues&lt;/a&gt; in my last post wasn't the cache being overloaded after all ... but the compiler running out of registers to unroll a couple of nested loops I asked it to.&lt;br /&gt;&lt;br /&gt;It turns out that because of a cut and paste job I had left in some test code which was limiting the register count for the compiler, and the resulting register spillage was a bit of a disaster and caused the order of magnitude drop.  With this fixed things look a lot more promising, although at ~4ms it is still a little slower than I would like and I can't see much that can be done to improve it.&lt;br /&gt;&lt;br /&gt;It's a bit weird that an unconstrained compile for a specific device would choose to use all the registers it needs for the code-unrolling, ignoring the device specifics, but I guess this is some particular of the nvidia opencl implementation.&lt;br /&gt;&lt;br /&gt;There are still some problems with the texture cache being hit too hard anyway - the integral image has to be stored using the UINT32 data type, and it needs at least 8 lookups per feature tested which is a pretty heavy load to start with.  
With the other feature tester I could just use 8 bit images which fits better into the cache it seems (contrary to what i'd believed until this point).&lt;br /&gt;&lt;br /&gt;Investigations ongoing ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-7791274696506770098?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/7791274696506770098/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=7791274696506770098' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7791274696506770098'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/7791274696506770098'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/luser-error.html' title='Luser error'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-9030570314744854121</id><published>2011-05-05T21:37:00.004+09:30</published><updated>2011-05-06T15:36:51.259+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Cache only works when there's enough</title><content type='html'>&lt;b&gt;Update:&lt;/b&gt; Seems I was (mostly) wrong - see &lt;a href="http://a-hackers-craic.blogspot.com/2011/05/luser-error.html"&gt;the follow-up 
post&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I've been playing with object classifiers in OpenCL - I have one that works, not terribly well but relatively quickly.  It's a random tree classifier and just uses pixel intensity comparisons for feature tests.  Although I can get some results out of it they just weren't reliable enough.&lt;br /&gt;&lt;br /&gt;So I decided to use 2-bit-binary patterns instead, a haar-like feature measure which uses an integral image to accelerate the intensity comparisons (maybe more on the integral image creation another time, that was a bit of a journey in OpenCL as well).  Unfortunately the simple modifications required to change the feature detector suddenly blew out the computation time - from under 1ms to over 20ms, making it far too slow.  This even though it only requires twice the memory accesses for the same number of tests.&lt;br /&gt;&lt;br /&gt;After much experimentation I found the cause - the texture cache was being exhausted, dramatically reducing the apparent memory throughput.  I discovered that a single-tree, 3-feature tester is about the limit of the texture cache.  That will execute in 150uS.  4-features take 250uS, and if I change the 4-feature system to 2 trees, or 8 features - just twice the amount of work - it blows out to 1500uS.&lt;br /&gt;&lt;br /&gt;Well, at least that gives me options for splitting the work into multiple passes and maybe i'll end up with something fast enough over-all.  Otherwise I might have to find something else.  
As it is i'm losing confidence that it is going to be good enough anyway (maybe after running it so many times i've forgotten how poor the previous one was).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-9030570314744854121?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/9030570314744854121/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=9030570314744854121' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/9030570314744854121'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/9030570314744854121'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/cache-only-works-when-theres-enough.html' title='Cache only works when there&apos;s enough'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-8909230903561202246</id><published>2011-05-01T17:07:00.008+09:30</published><updated>2011-11-03T17:46:52.789+10:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='cooking'/><title type='text'>Hot Sauce #2 - Sweet Arson</title><content type='html'>&lt;p&gt;As it turns out it was a rainy day and I thought the fresh habaneros either needed to be used or frozen so I made another sauce today as well.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-snjt8hHwAVo/Tb0O43V3BfI/AAAAAAAAASQ/0Gme7dA7I-c/s1600/potato-0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://2.bp.blogspot.com/-snjt8hHwAVo/Tb0O43V3BfI/AAAAAAAAASQ/0Gme7dA7I-c/s400/potato-0.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5601649881609864690" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p&gt;The previous week I had excavated a patch of sweet potatoes and ended up with a reasonable haul (the largest in the photo is 2.4kg) ... and I had the idea of using the sweet potato as the base for this sauce.  And this time I thought i'd try a warming mix of spices to complement it.  I'm really quite pleased with this one - it has quite an interesting and pleasant flavour and will leave your lips burning from even quite a small amount.&lt;br /&gt;&lt;h3&gt;Ingredients&lt;/h3&gt;&lt;table cellspacing="4"&gt;&lt;tr&gt;&lt;td align="right"&gt;450g&lt;td&gt;Roasted sweet potato (2-3 average tubers).&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;12&lt;td&gt;Cayesan Chillies, roasted.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;12&lt;td&gt;Habaneros, roasted.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;150ml&lt;td&gt;Grapeseed oil.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1&lt;td&gt;Diced onion.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;&lt;td&gt;&lt;em&gt;Dry Spices&lt;/em&gt;&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;2 tsp&lt;td&gt;Coriander seed.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1 tsp&lt;td&gt;Cardamom seed.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1 tsp&lt;td&gt;Black peppercorns.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1 tsp&lt;td&gt;Whole allspice.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;2&lt;td&gt;Cloves.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1 tsp&lt;td&gt;Fennel seed.&lt;br /&gt;&lt;tr&gt;&lt;td 
align="right"&gt;&lt;td&gt;&lt;em&gt;Liquids&lt;/em&gt;&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;2 cups&lt;td&gt;Malt vinegar.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1 1/2 cup&lt;td&gt;Water.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1 tsp&lt;td&gt;Citric Acid.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1 tsp&lt;td&gt;Salt&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1 tsp&lt;td&gt;MSG&lt;/table&gt;&lt;br /&gt;&lt;h3&gt;Method&lt;/h3&gt;&lt;p&gt;Place the sweet potato whole and un-peeled into a roasting tray and roast until done - about an hour.  Place the chillies on another tray and roast until they start to collapse - about 20 minutes.  Remove the potatoes from the oven and let them cool to a suitable temperature for handling and peel the skins off (perhaps sweating them under foil will help).&lt;br /&gt;&lt;p&gt;Meanwhile, place the oil and onion in a medium sized saucepan and cook slowly on low heat - so they become translucent but do not brown.&lt;br /&gt;&lt;p&gt;Take the dry seeds and toast lightly in a dry pan for a few minutes.  Use a medium heat and constantly move them so they do not burn.  Transfer the seeds to a mortar and pestle and pound to a fine dust.&lt;br /&gt;&lt;p&gt;Add the dry spices to the onions and cook for a couple of minutes.&lt;br /&gt;&lt;p&gt;Add the roasted chillies and sweet potato and mash them together.&lt;br /&gt;&lt;p&gt;Add all of the liquid before it starts to burn, together with the remaining ingredients and simmer for at least 30 minutes on a very low heat.&lt;br /&gt;&lt;p&gt;Use a stick blender to thoroughly blend everything into as fine a liquid as it's capable of (i'm using a Tiffany hand-me-down and it struggles but works).&lt;br /&gt;&lt;p&gt;Bottle.&lt;br /&gt;&lt;h3&gt;Results&lt;/h3&gt;&lt;p&gt;This tastes really nice so far.  
Very subtle, slightly sweet and nutty flavour with a lip-burning kick to follow up which seems to multiply with every subsequent taste.&lt;br /&gt;&lt;p&gt;I didn't think it was going to end up hot enough after adding the liquid I needed to make it thin enough, so I threw in another half-dozen chopped habaneros when it was nearly done, cooked them for a while and re-blended.  This might be the last hot sauce I make for a while too, so I wanted to make sure it counted!&lt;br /&gt;&lt;p&gt;I'd probably rate this 7.5 out of 10 for heat.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-3ViJoQ8v0QI/Tb0Oku6ZffI/AAAAAAAAASI/oXixFfnrGbI/s1600/sauce-2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://3.bp.blogspot.com/-3ViJoQ8v0QI/Tb0Oku6ZffI/AAAAAAAAASI/oXixFfnrGbI/s400/sauce-2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5601649535749815794" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt; So 6 months on, this is definitely my favourite sauce.  I think it could use a bit more heat but the flavour goes with lots of things - i sometimes use it instead of butter on sandwiches, for dipping chips (it gets hot!), or to add a pleasant kick to pasta, steak, or a snag.  
Please comment if you give it a go ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-8909230903561202246?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/8909230903561202246/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=8909230903561202246' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8909230903561202246'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/8909230903561202246'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/hot-sauce-2-sweet-arson.html' title='Hot Sauce #2 - Sweet Arson'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-snjt8hHwAVo/Tb0O43V3BfI/AAAAAAAAASQ/0Gme7dA7I-c/s72-c/potato-0.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4287587984972416129</id><published>2011-05-01T16:38:00.006+09:30</published><updated>2011-05-01T17:20:13.389+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='cooking'/><title type='text'>Hot Sauce #1 - Toxic Lime</title><content type='html'>Back with another sauce - this time I thought i'd try something of a Thai theme: lime, kaffier lime, lemongrass, and fresh coriander.  
I just grabbed a few things from the garden and the pantry that seemed to match and came up with my second sauce, Sauce #1 - Toxic Lime.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-vOzVWKCUbak/Tb0HAdtDSCI/AAAAAAAAASA/6Y4Rj2rVZjQ/s1600/sauce-1.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 266px;" src="http://4.bp.blogspot.com/-vOzVWKCUbak/Tb0HAdtDSCI/AAAAAAAAASA/6Y4Rj2rVZjQ/s400/sauce-1.jpg" alt="" id="BLOGGER_PHOTO_ID_5601641216073746466" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h3&gt;Ingredients&lt;/h3&gt;&lt;table cellspacing="4"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="right"&gt;350g&lt;/td&gt;&lt;td&gt;Apples, cored.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;60g&lt;/td&gt;&lt;td&gt;Fresh coriander, including root and stem.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;70g&lt;/td&gt;&lt;td&gt;Fresh lemongrass.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;40g&lt;/td&gt;&lt;td&gt;Fresh ginger.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;12g&lt;/td&gt;&lt;td&gt;Fresh lemon basil leaves.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;2&lt;/td&gt;&lt;td&gt;Kaffir lime leaf.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;1&lt;/td&gt;&lt;td&gt;Kaffir lime zest.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;2&lt;/td&gt;&lt;td&gt;Limes, juice and zest.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;200g&lt;/td&gt;&lt;td&gt;Cayesan chillies (large fleshy chillies)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;100g&lt;/td&gt;&lt;td&gt;Habaneros (about 15)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;1/2 cup&lt;/td&gt;&lt;td&gt;Malt vinegar.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;30g&lt;/td&gt;&lt;td&gt;Palm sugar.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;1/2 cup&lt;/td&gt;&lt;td&gt;Cane 
Sugar.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;1 cup&lt;/td&gt;&lt;td&gt;Water.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;1 tsp&lt;/td&gt;&lt;td&gt;Citric Acid.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;2 tbl&lt;/td&gt;&lt;td&gt;Mushroom soy sauce (Happy Boy Brand)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;100ml&lt;/td&gt;&lt;td&gt;Olive oil.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;h3&gt;Method&lt;/h3&gt;&lt;p&gt;Cut the ginger, lemongrass and lime leaf into fine slices and then put into a food processor and process until finely chopped.  Add all of the other fresh ingredients (everything up to and including the chillies) and blend thoroughly into a fine paste.&lt;br /&gt;&lt;p&gt;Place into a medium sized saucepan on a low heat and add the liquids and bring to a slow simmer.&lt;br /&gt;&lt;p&gt;Stir in the sugar and simmer everything slowly 'until done' - 30+ minutes.&lt;br /&gt;&lt;p&gt;Use a stick blender to liquefy everything together as much as possible.&lt;br /&gt;&lt;p&gt;Bottle.&lt;br /&gt;&lt;h3&gt;Results&lt;/h3&gt;&lt;p&gt;This has a slightly strange flavour - despite lime and chilli seeming a good match I've never really found something where it truly 'worked'.  Perhaps the kaffir lime zest is a little overpowering, as is the coriander.  I will have to leave it for a bit and try it with some suitable food (might go well with pork?).&lt;br /&gt;&lt;p&gt;It definitely has a lime flavour to it though, which is what I was trying to achieve so at this point i'd call it a success.  And it smells wonderful.  
It's probably about a 7 out of 10 for heat as with the previous recipe.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4287587984972416129?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4287587984972416129/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4287587984972416129' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4287587984972416129'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4287587984972416129'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/05/hot-sauce-1-toxic-lime.html' title='Hot Sauce #1 - Toxic Lime'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-vOzVWKCUbak/Tb0HAdtDSCI/AAAAAAAAASA/6Y4Rj2rVZjQ/s72-c/sauce-1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-3716964688914940337</id><published>2011-04-28T08:54:00.006+09:30</published><updated>2011-05-01T09:00:31.201+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Simple padding trick</title><content type='html'>For a few algorithms I have, I end up with data which is 32 elements wide operated on by 16x16 
threads.  This ends up with 100% local memory contention if mapped directly to the local memory since every 16th thread aliases to the same bank.&lt;br /&gt;&lt;br /&gt;Although this can be addressed by padding every row with 16 words, that is wasteful of precious local memory, which might mean the code can't run in parallel to the extent it might otherwise, or simply cannot fit.&lt;br /&gt;&lt;br /&gt;A simple trick which still keeps the addressing quite simple is to shift every odd line to the second half of the data, which is then offset by 16 words.  In effect bit 0 of the y address is shifted to the top of the addressing index, with an offset.&lt;br /&gt;&lt;br /&gt;For example, if a kernel is working on a 16x16 region of memory but requires some data either side of the target tile, it might do something like:&lt;br /&gt;&lt;pre&gt;   local float ldata[32*16];&lt;br /&gt;   int lx = get_local_id(0);&lt;br /&gt;   int ly = get_local_id(1);&lt;br /&gt;&lt;br /&gt;   int i = lx + ly * 32;&lt;br /&gt;&lt;br /&gt;   // load data&lt;br /&gt;   ldata[i] = ..read data block 8 to the left ...;&lt;br /&gt;   ldata[i+16] = ..read data 8 to the right...;&lt;br /&gt;   barrier(CLK_LOCAL_MEM_FENCE);&lt;br /&gt;&lt;br /&gt;   // work using i+8 as the centre pixel for this thread&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;By only changing the calculation of &lt;code&gt;i&lt;/code&gt; and padding the storage with only 16 words, the contention is easily removed without changing any other code:&lt;br /&gt;&lt;pre&gt;   local float ldata[32*16+16];&lt;br /&gt;   ...&lt;br /&gt;&lt;br /&gt;   // odd rows go to the second half of the buffer, offset by 16 words&lt;br /&gt;   int i = lx + ( ly &amp;gt;&amp;gt; 1 ) * 32 + (32*8+16)*(ly &amp;amp; 1);&lt;br /&gt;&lt;br /&gt;   ...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This assumes one is only working in the X direction; for Y the addressing is slightly more complex of course.  
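&lt;br /&gt;&lt;br /&gt;As a rough check of the two index calculations, here is a small Python sketch that counts how many threads land on the same bank.  It assumes 32 banks of one word each, and that the 32 threads issued together cover two adjacent rows of the 16x16 work-group; neither figure comes from the code above, and the function names are made up for illustration.&lt;br /&gt;&lt;pre&gt;
```python
BANKS = 32   # assumption: number of local memory banks, each one word wide
ROWS = 2     # assumption: 32 threads issue together, i.e. two rows of 16

def plain_index(lx, ly):
    # direct mapping: 32 words per row
    return lx + ly * 32

def shifted_index(lx, ly):
    # odd rows go to the second half of the buffer, offset by 16 words
    return lx + (ly // 2) * 32 + (32 * 8 + 16) * (ly % 2)

def worst_conflict(index, offset=0):
    """Most threads of any one group that land on the same bank."""
    worst = 0
    for first_row in range(0, 16, ROWS):
        hits = [0] * BANKS
        for ly in range(first_row, first_row + ROWS):
            for lx in range(16):
                hits[(index(lx, ly) + offset) % BANKS] += 1
        worst = max(worst, max(hits))
    return worst

# Report the worst case for the accesses at i and at i+16.
for name, index in (("plain", plain_index), ("shifted", shifted_index)):
    print(name, worst_conflict(index), worst_conflict(index, 16))
```
&lt;/pre&gt;Under those assumptions the direct mapping puts two threads on every bank, while the shifted mapping gives each thread a bank of its own, for both the i and the i+16 accesses.  The sketch only covers accesses along X, though; Y needs a more involved index.  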
But this could come at no extra run-time cost once the loops are unwound.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-3716964688914940337?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/3716964688914940337/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=3716964688914940337' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3716964688914940337'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/3716964688914940337'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/04/simple-padding-trick.html' title='Simple padding trick'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5815499217301229479</id><published>2011-04-27T19:05:00.003+09:30</published><updated>2011-04-28T07:45:21.764+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><title type='text'>Transpose Is Your Friend</title><content type='html'>With graphics programming a lot of algorithms can be split into separate X and Y passes.  
This generally works particularly well in the X case but in Y you can hit issues with memory (or processor) locality which can have a big impact on the algorithm.&lt;br /&gt;&lt;br /&gt;But the GPU texture cache is block oriented rather than line oriented so both X and Y oriented algorithms can be implemented equally (in)efficiently if you store data in images.  However, once in local memory you're effectively back to line-oriented access ... (if you want to preserve your sanity whilst working out the memory addressing to efficiently access the banked memory).&lt;br /&gt;&lt;br /&gt;The trick is just to transpose the data on read and write, and always work in the X direction locally.  It also means the X and Y working code is often identical.  This can be done just within the local work-group, but for 2D workgroups one has the added complication that work units are allocated in row-major order, i.e. in X first.&lt;br /&gt;&lt;br /&gt;The simple solution is just to transpose the global X and Y work-size as well, and simply swap the result of get_global_id(0) and get_global_id(1) when reading or writing the images.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5815499217301229479?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5815499217301229479/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5815499217301229479' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5815499217301229479'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5815499217301229479'/><link rel='alternate' type='text/html' 
href='http://a-hackers-craic.blogspot.com/2011/04/transpose-is-your-friend.html' title='Transpose Is Your Friend'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-5131969883089146026</id><published>2011-04-26T09:56:00.006+09:30</published><updated>2011-09-14T16:41:30.117+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='opencl'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>Parallel Prefix Sum</title><content type='html'>&lt;b&gt;14/9/11: added a further paragraph on additional thoughts&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Since coming across the parallel prefix sum a couple of weeks ago, a lot of things I need to solve seem to fall into the class of problems it is suited for within OpenCL on GPU platforms.  However, after a lot of trial and error and experimentation i've found it is usually just slower - sometimes by quite a margin.&lt;br /&gt;&lt;br /&gt;In short, it takes advantage of the very high speed local memory ('LS') and parallelism to compute an associative result from every element to every previous element in log2(n/2) steps.&lt;br /&gt;&lt;br /&gt;But with GPUs there are a couple of problems with it:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Even in the ideal case many of the threads are computing redundant data or not operating (depending how one chooses to implement it).&lt;br /&gt;&lt;li&gt;A synchronisation step is required after every single operation - which is usually something trivially simple.&lt;/ol&gt;&lt;br /&gt;The first leads to an over-commitment of threading resources which impacts the scalability as the overall job size increases.  
And the second leads to very inefficient scheduling even on simple tasks, and a much heavier 'inner loop'.&lt;br /&gt;&lt;br /&gt;For example, I implemented a 5x5 maximum operation (for non-maximum suppression peak detection) using a separate X and Y operation (I realise a 5-tap test doesn't really exercise the log2(N) nature of the algorithm much, but more on that later).&lt;br /&gt;&lt;br /&gt;My first implementation uses a 16x16 workgroup size (after much experimentation this seems to be the generally best workgroup size for operating on images on my hardware - it leads to an occupancy of 1 and seems to be a good fit for the texture cache configuration).  Each local workgroup reads a 16x16 area into LS and then 16 threads work together on each row of result.  It only does a couple of 'prefix sum' steps because I only need the result from 4 samples, and I do the last one manually.  I use the trick of offsetting the starting point so no thread requires any conditional execution.  Finally, it only produces 12 valid results for the 16 inputs since you need overlap.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-509OI4hGGng/TbYXeggaOpI/AAAAAAAAARw/MyZWfYgb4GE/s1600/parallel-max.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 136px;" src="http://1.bp.blogspot.com/-509OI4hGGng/TbYXeggaOpI/AAAAAAAAARw/MyZWfYgb4GE/s400/parallel-max.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5599688999571176082" /&gt;&lt;/a&gt;&lt;p align="center"&gt;Figure 1: Steps taken for parallel maximum calculation.  Only the workings of 4 of the 16 threads are shown.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;Because it only generates 12 results it needs to be run 16/12 times the width of the image.  
This runs in about 65uS on the test data set.&lt;br /&gt;&lt;br /&gt;Then I tried a version which reads 2x 16x16 blocks into memory so it can produce all 16 results in one go - unfortunately i've lost the timings and I can't be bothered to re-run it, but i'm fairly confident it wasn't terribly impressive.&lt;br /&gt;&lt;br /&gt;Finally I implemented a very simple version which just reads in 2 16x16 blocks into local memory, and then does the operation on the 2 pixels before and 2 pixels after the current location (i.e. an unrolled loop).  This was somewhat quicker - 48uS, or about 25% faster.&lt;br /&gt;&lt;br /&gt;I didn't bother trying it for the parallel sum case, but I also tried larger window sizes for the simple version - and even at 9 it is still 20% faster than the 5X case for the parallel sum version.  And this is for the single channel case - for a 4 channel image you have a 4x LS load, which is not required when it is calculated in registers.&lt;br /&gt;&lt;br /&gt;Intuition would tell you that increasing the data-size will eventually lead to a case where it out-performs the simple cases.  But the wider the data being calculated the more threads you require and this reduces the opportunity for hiding latencies by letting the GPU schedule independent workgroups.  The local store can also be a factor since it too can limit how wide you can go.&lt;br /&gt;&lt;br /&gt;I also applied it to (larger) problems where you're only interested in the final result.  Because branching is expensive it seems on paper that it doesn't matter if you generate many redundant results since the overall number of steps is much lower - e.g. a 16x16 summation only takes 7 steps rather than 256.  Although in reality you break it up into 16 strips 1xwide so it's only 32 steps (16 lots of 16 plus 1 of 16).  And it only takes 16 threads rather than 256, so you can execute 16x as many at once for a given number of threads.  
And you don't need any local store.&lt;br /&gt;&lt;br /&gt;I found in all cases it was (sometimes much) faster to split it into 16x1 loops which operate on 16 data items, and then have a single thread complete the partial sums.&lt;br /&gt;&lt;br /&gt;And finally the one case where it seemed to have traction - calculating an integral image where every pixel has its value added to every pixel to the right/below it - did seem faster than another implementation I had.  But that initial implementation was before I had discovered other performance improvements so I suspect I could probably do better if i had another go.  To satisfy my curiosity I just tried implementing part of it using a looping implementation and with little effort managed to beat or at least equal the prefix-sum version.  Incidentally both require splitting the problem into smaller parts and then a final step to 'fix' the integral image - for the parallel prefix sum version you run out of local store or threads, and in both cases you need the parallelism to help improve the GPU efficiency.&lt;br /&gt;&lt;h2&gt;Further Thoughts 14/9/11&lt;/h2&gt;Since writing this a lot more water has flowed under the bridge and I have a few more thoughts to add.&lt;br /&gt;&lt;br /&gt;Having a smaller rather than larger work-size is important as I alluded to above: but larger problems can be made smaller by storing intermediate values in registers and then only sharing the work to reduce a smaller-multiple of the dataset.  e.g. storing 4 registers locally allows 4x as much data to be 'processed' using the same amount of shared-work (and shared memory too) - which is the expensive stuff.&lt;br /&gt;&lt;br /&gt;Since I was sticking to spec I have never tried removing the barriers and relying on the hardware's behaviour.  
So I don't know how much difference this makes: the technique in the paragraph above is even more useful then, if you can reduce the problem to the 64 elements required to benefit from the hardware characteristics.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://code.google.com/p/socles/source/browse/trunk/socles/src/au/notzed/socle/image/IntegralImage.java"&gt;Integral Image&lt;/a&gt; code in socles uses these techniques, and in this case the parallel prefix sum was a (small) win.  And IMHO is a fairly tight bit of code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-5131969883089146026?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/5131969883089146026/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=5131969883089146026' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5131969883089146026'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/5131969883089146026'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/04/parallel-prefix-sum.html' title='Parallel Prefix Sum'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-509OI4hGGng/TbYXeggaOpI/AAAAAAAAARw/MyZWfYgb4GE/s72-c/parallel-max.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-4812011474791102243</id><published>2011-04-26T09:35:00.007+09:30</published><updated>2011-05-01T16:38:13.233+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='cooking'/><title type='text'>Hot Sauce #0 - Incendiary Tomato</title><content type='html'>&lt;b&gt;Update&lt;/b&gt; &lt;em&gt;I finally gave it a name and labelled the bottles - "Incendiary Tomato" sounded like a good name.  I've also downed 1/4 of a bottle of this stuff with cheese and crackers or on my dinner, and it's really bloody nice!&lt;/em&gt;&lt;br /&gt;&lt;p&gt;I have a chilli or two from the garden and have been making a few hot(ish) sauces - tomato sauce with a few chillies thrown in to give it a pleasant kick, a hot sweet chilli and ginger sauce, and an apple, ginger, and chilli chutney with much more of a kick.  The latter is pretty nice, I took a recipe that asked for 500g of capsicum and 6 chillies and just used 500g of chillies.  I think it's fairly medium-warm on the heat scale but a mate can't stomach it.  I also tried a green fermented sauce (unfortunately `killed' that with too much vinegar) and have some red fermented chillies i'm not sure what to do with yet.&lt;p&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-xWEBfSEp4Z0/TbYOCUGEnOI/AAAAAAAAARo/1AqElGHPY4Y/s1600/habanero-0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://2.bp.blogspot.com/-xWEBfSEp4Z0/TbYOCUGEnOI/AAAAAAAAARo/1AqElGHPY4Y/s400/habanero-0.jpg" border="0" alt="Habanero Chillies in Blue Bowl" id="BLOGGER_PHOTO_ID_5599678619598494946" /&gt;&lt;/a&gt;&lt;br /&gt;The habaneros are hitting their fruiting stride at the moment so I thought i'd try a hand at something with a bit more bite and less sweetness.  
I also wanted to avoid the vinegar flavour which had overpowered my last effort so I based the acid on lime and citric acid.  I found a recipe that looked a bit dodgy - based on a tin of tomatoes - but I had some tomatoes i had to use so I started with that and then spiced it up beyond recognition.  For once I recorded everything I put into it, so i thought i'd share ... I pretty much made it up as I went and mixed a few ideas I really wanted to try separately but once it was in the pot there was no going back.&lt;br /&gt;&lt;h3&gt;Ingredients&lt;/h3&gt;&lt;table cellspacing="4"&gt;&lt;tr&gt;&lt;td align="right" valign="top"&gt;700g&lt;td&gt;Roasted whole tomatoes.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;12&lt;td&gt;Ripe Habanero chillies.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1&lt;td&gt;Lime, juice and zest.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1&amp;nbsp;tsp&lt;td&gt;Citric acid.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1&amp;nbsp;tsp&lt;td&gt;White pepper (see below).&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1&amp;nbsp;tsp&lt;td&gt;Black pepper.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;8&lt;td&gt;Cloves&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1/2&amp;nbsp;tsp&lt;td&gt;Allspice.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1/4&amp;nbsp;tsp&lt;td&gt;Ground mustard.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1/2&amp;nbsp;tsp&lt;td&gt;Ground ginger.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1&amp;nbsp;tsp&lt;td&gt;Salt&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1&amp;nbsp;tbs&lt;td&gt;Palm Sugar.&lt;br /&gt;&lt;tr&gt;&lt;td align="right"&gt;1&amp;nbsp;tbs&lt;td&gt;Sugar.&lt;br /&gt;&lt;/table&gt;&lt;h3&gt;Method&lt;/h3&gt;&lt;p&gt;Cut the chillies into small pieces.  Pound the whole dry spices in a mortar and pestle.  
Break up the palm sugar (I only did 1 tablespoon because i was too lazy to do another and used plain sugar when I needed more sweetness).&lt;br /&gt;&lt;p&gt;Put everything in a pot and simmer for about an hour - until the chillies are soft.&lt;br /&gt;&lt;p&gt;Use a (stick) blender to puree everything together.  Don't splash your eyes.&lt;br /&gt;&lt;p&gt;Bottle in sterilised jars whilst hot.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-fbHTVJc56JE/TbY7zyzIcoI/AAAAAAAAAR4/i1lOx7qUe5s/s1600/habanero-1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 368px;" src="http://4.bp.blogspot.com/-fbHTVJc56JE/TbY7zyzIcoI/AAAAAAAAAR4/i1lOx7qUe5s/s400/habanero-1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5599728947677393538" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h3&gt;Results&lt;/h3&gt;&lt;p&gt;I'd probably rate it about a '7/10' for heat, where Tabasco sauce is 5 (although it's been some time since I had any).  Although the heat lingers much longer, and builds up the more you have.  A teaspoon would be enough for a nicely burning steak.&lt;br /&gt;&lt;p&gt;I'll have to let it sit in the bottle a while to finalise the flavours but for now I think it has a bit too much pepper which over-powers the chillies; although it's a bit hard to tell since the bite over-powers the flavour on the initial taste.  Possibly more mustard, ginger, and allspice would work too, even lime juice.  Maybe a little vinegar wouldn't hurt?  I think the sugar level is about right (for my palate), as is the saltiness - both of which are required to bring out the flavours.  
It's basically a very strong, extremely hot tomato sauce.&lt;br /&gt;&lt;p&gt;I don't know how it'll keep yet - the acid in the tomatoes, lime and added citric acid should hopefully be enough since I won't be finishing the litre or so I made too quickly and this is probably one sauce I can't share with friends.&lt;br /&gt;&lt;p&gt;And i've got more chillies and ideas to try so it might be competing for condiment time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-4812011474791102243?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/4812011474791102243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=4812011474791102243' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4812011474791102243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/4812011474791102243'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/04/hot-sauce-0.html' title='Hot Sauce #0 - Incendiary Tomato'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-xWEBfSEp4Z0/TbYOCUGEnOI/AAAAAAAAARo/1AqElGHPY4Y/s72-c/habanero-0.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9054609445298212106.post-222110371009250882</id><published>2011-04-16T21:11:00.004+09:30</published><updated>2011-04-16T21:39:12.450+09:30</updated><category scheme='http://www.blogger.com/atom/ns#' term='cooking'/><title type='text'>Apple Pie</title><content type='html'>Back to regular programming after the last rant ...&lt;br /&gt;&lt;br /&gt;I ended up with some ageing/damaged fruit and after sitting there degrading for a week I finally pulled a finger out and made my first apple pie with the recoverable fruit.  I'm not a big dessert person - I can't really remember having apple pie since I was a kid (although i'm sure I've had some here and there) - and it was never like this.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-7aTT1k1d-1I/TamA3sBIW5I/AAAAAAAAARg/diD8xYG_4mQ/s1600/apple-pie-0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 266px;" src="http://2.bp.blogspot.com/-7aTT1k1d-1I/TamA3sBIW5I/AAAAAAAAARg/diD8xYG_4mQ/s400/apple-pie-0.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5596145706181614482" /&gt;&lt;/a&gt;&lt;br /&gt;It worked out pretty well in the end; it tastes at least as good as it looks and quite possibly better.  I didn't have any lemon so used citric acid with some water to keep the sliced apples from going brown - and I think that gave it a bit of extra tang.  I threw in a few more spices too for good measure (vanilla, extra whole cloves, more cinnamon).  Even the crust on the bottom was a bit crunchy - I tried a tip I saw on the Hairy Bikers a couple of days ago, sprinkling the base with semolina to absorb some extra moisture.  Although having never done it before I don't know if it made any difference. 
I &lt;a href="http://www.taste.com.au/recipes/5008/apple+pie"&gt;used this recipe&lt;/a&gt; - although I had a hell of a time with the pastry.  I just can't seem to get the hang of working with short pastry although it usually turns out ok.&lt;br /&gt;&lt;br /&gt;I've been pretty busy hacking some OpenCL stuff of late - so busy i've barely left a keyboard.  More on that soon I imagine, once I get a bit of sleep and remember it is just what i've done - and if any of it is worth sharing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9054609445298212106-222110371009250882?l=a-hackers-craic.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://a-hackers-craic.blogspot.com/feeds/222110371009250882/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9054609445298212106&amp;postID=222110371009250882' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/222110371009250882'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9054609445298212106/posts/default/222110371009250882'/><link rel='alternate' type='text/html' href='http://a-hackers-craic.blogspot.com/2011/04/apple-pie.html' title='Apple Pie'/><author><name>NotZed</name><uri>http://www.blogger.com/profile/09469760565180198154</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-7aTT1k1d-1I/TamA3sBIW5I/AAAAAAAAARg/diD8xYG_4mQ/s72-c/apple-pie-0.jpg' height='72' width='72'/><thr:total>0</thr:total></entry></feed>
