Saturday, 31 May 2014

that 'graphical' demo

Ended up coding that 'graphical' demo. Just another Mandelbrot thingo ... not big on the imagination right now, and I suppose it demonstrates flops performance.

This is using JavaFX as the frontend, with the Monocle driver as mentioned in a previous post. I'm still running it via remote X11 over ssh.

I have each EPU core calculating interleaved lines, which works quite well at balancing the scheduling; at first I tried breaking the output into bands but that worked very poorly. The start position (not shown, but it spans (-1.5,-1.0) to (0.5,+1.0) with a maximum of 256 iterations) takes 0.100 seconds with 16 cores. A single-threaded implementation in Java using Oracle JDK 1.8 on the ARM takes 1.500 seconds (I'm using the performance mode). A single EPU core takes a little longer at 1.589s, so 16 cores give about a 15.9x speed-up (1.589/0.100), which shows near-perfect scaling of this trivially parallelisable problem even with the simple static scheduling I'm using.
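
A minimal sketch of the interleaved (round-robin) row assignment, assuming `ncores` cores and an image `h` rows high: core `k` takes rows `k`, `k + ncores`, `k + 2*ncores`, and so on. Neighbouring rows cost about the same to compute, so dealing them out this way presumably spreads the expensive rows near the set evenly, whereas contiguous bands hand whole expensive regions to a few cores. `rows_for_core` and `row_of` are hypothetical helpers, not part of ezesdk.

```c
/* Number of image rows core k (0 <= k < ncores) handles when the h rows
 * are dealt out round-robin: k, k + ncores, k + 2 * ncores, ... */
static int rows_for_core(int k, int ncores, int h)
{
        if (k >= h)
                return 0;
        return (h - k + ncores - 1) / ncores;
}

/* Image row that core k works on for its j'th output row. */
static int row_of(int k, int j, int ncores)
{
        return k + j * ncores;
}
```

For a 100-row image on 16 cores this gives cores 0 to 3 seven rows each and the other twelve cores six rows each.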

For comparison, my Kaveri workstation takes 0.120 seconds using the single-threaded Java version. Using Java 8's parallel forEach takes that down to 0.036 seconds (I didn't have this available when I timed the ARM version).

Details

The IPC mechanism I'm using is an 'ez_port' with an on-core queue. The host uses the port to calculate an index into the queue and writes to its reserved slot directly, so the on-core code can just use it when the job is posted to the port.
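
To make the reserve/post/await/complete handshake concrete, here is a single-threaded model of a port as three counters over a fixed-size ring of job slots. This is an assumption about how such a port can work, not the ezesdk implementation: all the names are hypothetical, and a real host/core port also has to deal with memory ordering between the two sides, which the model ignores.

```c
/* Single-threaded model of a port: three free-running counters over a ring
 * of `size` job slots (size assumed to be a power of two so slot indices
 * stay consistent if the counters wrap). */
struct port_model {
        unsigned reserved;      /* slots handed out to the host by reserve() */
        unsigned posted;        /* slots made visible to the core by post() */
        unsigned completed;     /* slots the core has finished with */
        unsigned size;          /* number of slots in the on-core queue */
};

/* Host: claim the next free slot and return its queue index, or -1 if full. */
static int port_reserve(struct port_model *p)
{
        if (p->reserved - p->completed >= p->size)
                return -1;
        return (int)(p->reserved++ % p->size);
}

/* Host: hand the filled-in slot over to the core. */
static void port_post(struct port_model *p)
{
        p->posted++;
}

/* Core: index of the oldest posted job, or -1 if none is waiting
 * (a real await call would presumably block here instead). */
static int port_poll(const struct port_model *p)
{
        if (p->completed == p->posted)
                return -1;
        return (int)(p->completed % p->size);
}

/* Core: finish with the current job, freeing its slot for reuse. */
static void port_complete(struct port_model *p)
{
        p->completed++;
}

/* Host: jobs posted but not yet completed; 0 means the core is idle. */
static unsigned port_depth(const struct port_model *p)
{
        return p->posted - p->completed;
}
```

In this model, polling the depth until it reaches zero only means "all work finished" because the core calls complete after the work is done rather than as soon as it has picked the job up.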

The main loop of the on-core code is about as simple as one can expect, considering the runtime isn't doing anything here.

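struct job {
        float sx, sy, dx, dy;   /* start point and per-pixel/per-row steps */
        int *dst;               /* output pixels; NULL means "shut down" */
        int dstride, w, h;
};

ez_port_t local;                /* resolved by the host as "_local" */
struct job queue[QSIZE];        /* written directly by the host as "_queue" */

int main(void) {
        while (1) {
                /* wait for a posted job; wi is its slot in queue[] */
                unsigned int wi = ez_port_await(&local, 1);

                if (!queue[wi].dst) {
                        ez_port_complete(&local, 1);
                        break;
                }

                calculate(&queue[wi]);

                /* release the slot only once the work is done, so the
                   host can treat an empty port as "finished" */
                ez_port_complete(&local, 1);
        }

        return 0;
}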

This is the totality of code which communicates with the host. calculate() does the work according to the received job details. Placing the port_complete after the work is done, rather than after the job has been copied locally, allows it to be used as an implicit completion flag as well.
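
The post doesn't show calculate() itself. A hypothetical version, assuming the standard escape-time algorithm with the 256-iteration cap from the timings above, and assuming `h` is the number of rows this job writes (with `dy` and `dstride` already scaled by the host to skip over the other cores' rows), might look like this. `mandel` and `MAX_ITER` are made-up names; it is a sketch, not the demo's actual code.

```c
#define MAX_ITER 256

struct job {
        float sx, sy, dx, dy;   /* start point, per-pixel step, per-row step */
        int *dst;               /* first output pixel */
        int dstride, w, h;      /* row stride (ints), width, rows to compute */
};

/* Escape-time iteration count for c = cr + ci*i, capped at MAX_ITER. */
static int mandel(float cr, float ci)
{
        float zr = 0.0f, zi = 0.0f;
        int n = 0;

        while (n < MAX_ITER && zr * zr + zi * zi <= 4.0f) {
                float t = zr * zr - zi * zi + cr;

                zi = 2.0f * zr * zi + ci;
                zr = t;
                n++;
        }
        return n;
}

/* Fill job->h rows of job->w pixels with iteration counts. */
static void calculate(const struct job *job)
{
        float y = job->sy;
        int *row = job->dst;

        for (int j = 0; j < job->h; j++) {
                float x = job->sx;

                for (int i = 0; i < job->w; i++) {
                        row[i] = mandel(x, y);
                        x += job->dx;
                }
                y += job->dy;
                row += job->dstride;
        }
}
```

Each job only sees a strided view of the image: from `dst` it steps `dstride` ints and `dy` along the imaginary axis per output row, which is how the interleaving falls out without the core needing to know how many cores there are.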

The Java side is a bit more involved but that's just because the host code has to be. After the cores are loaded but before they are started the communication values need to be resolved in the host code. This is done symbolically:

        for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                        int i = r * cols + c;

                        cores[i] = wg.createPort(r, c, 2, "_local");
                        queues[i] = wg.mapSymbol(r, c, "_queue");
                }
        }

Then once the cores are started the calculation code just has to poke the job details into the processing queue on each core. It has to duplicate the struct layout using a ByteBuffer; it's a bit clumsy but it's just what you do in Java (well, unless you do some much more complicated things).

        int N = rows * cols;

        // Send job to queue on each core directly
        for (int i = 0; i < N; i++) {
                int qi = cores[i].reserve(1);

                // struct job {
                //      float sx, sy, dx, dy;
                //      int *dst;
                //      int dstride, w, h;
                //};
                ByteBuffer bb = queues[i];

                bb.position(qi * 32);

                // Each core calculates interleaved rows
                bb.putFloat(sx);
                bb.putFloat(sy + dy * i);
                bb.putFloat(dx);
                bb.putFloat(dy * N);

                // dst (in bytes), dstride (in ints)
                bb.putInt(dst.getEPUAddress() + i * w * 4);
                bb.putInt(w * N);

                // w,h
                bb.putInt(w);
                bb.putInt(h - i);
                bb.rewind();

                cores[i].post(1);
        }

The post() call in the loop above triggers the calculation on the target core.
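
The ByteBuffer code is writing the same 32-byte record that `struct job` occupies on the 32-bit core: four floats, then the 32-bit `dst` address, then three ints at offsets 20, 24 and 28. A byte-level sketch of that packing in C, assuming the cores are little-endian (so the buffer returned by mapSymbol would need that byte order too); `pack_job` and the `put_*` helpers are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define JOB_SIZE 32     /* 4 floats + 4 32-bit words on the 32-bit core */

/* Store a 32-bit value little-endian at p. */
static void put_u32le(uint8_t *p, uint32_t v)
{
        p[0] = v & 0xff;
        p[1] = (v >> 8) & 0xff;
        p[2] = (v >> 16) & 0xff;
        p[3] = (v >> 24) & 0xff;
}

/* Store a float's IEEE-754 bit pattern little-endian at p. */
static void put_f32le(uint8_t *p, float f)
{
        uint32_t bits;

        memcpy(&bits, &f, sizeof bits);
        put_u32le(p, bits);
}

/* Write one job record into slot qi of a queue image. */
static void pack_job(uint8_t *queue, int qi, float sx, float sy, float dx, float dy,
                     uint32_t dst_addr, int32_t dstride, int32_t w, int32_t h)
{
        uint8_t *p = queue + qi * JOB_SIZE;

        put_f32le(p + 0, sx);
        put_f32le(p + 4, sy);
        put_f32le(p + 8, dx);
        put_f32le(p + 12, dy);
        put_u32le(p + 16, dst_addr);            /* int *dst: a 32-bit core address */
        put_u32le(p + 20, (uint32_t)dstride);
        put_u32le(p + 24, (uint32_t)w);
        put_u32le(p + 28, (uint32_t)h);
}
```

This is the same arithmetic as `bb.position(qi * 32)` followed by the sequence of putFloat/putInt calls; the only thing the Java side cannot express directly is the struct itself.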

Then it can just wait for the cores to finish by checking when each work queue is empty. A current hardware limitation requires busy-wait loops.

        // Wait for them all to finish
        for (int i = 0; i < N; i++) {
                while (cores[i].depth() != 0) {
                        try {
                                Thread.sleep(1);
                        } catch (InterruptedException ex) {
                        }
                }
        }

These two loops can then be called repeatedly to calculate new locations.

The RGBA pixels are then copied to the WritableImage and JavaFX displays that at some point. It's a little slow via remote X, but it works and is just a bit better than writing to a PNM file from C ;-)

I've exhausted my current ideas for the moment, so I might put out another release at some point. There are still quite a lot of issues to resolve in the library, but it's more than enough for experimentation, although I'm not sure anyone but me is remotely interested in that.

minimal memory allocator n stuff

Over the last couple of days I've been hacking on the ezesdk code; the original aim was to get a graphical Java app up which uses Epiphany compute. But as I'm just about there, I'm done for a bit.

Along the way I realised I pretty much needed a memory allocator for the shared memory block, because I can't use static allocations or fixed addresses from Java. So I implemented a very minimal memory allocator for this purpose. It's small enough that it could potentially be used on-core for on-core memory allocations, but I'm not looking at using it there at this point.

It uses first-fit, which, surprisingly, I found (while doing some profiling a few years ago) was one of the better algorithms in terms of utilisation when memory is tight. The lowest-level part of the allocator requires no overhead for memory allocations apart from what is needed to maintain alignment. So, for example, two 32-byte allocations will consume exactly 64 bytes of memory. Since this leaves nowhere to store the size of the allocation, it must be passed in to the free function. I also have a higher-level malloc-like entry point which has an 8-byte overhead for each allocation and saves the size for the free call.

The only real drawback is that it isn't very fast: it does a memory-order scan for both allocations and de-allocations. But as it is basically the algorithm AmigaOS used for years for its system allocator, it has been proven to work well enough.
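
A minimal first-fit sketch along these lines, in the same general style as the AmigaOS allocator but not the ezesdk code: free chunks are kept in an address-ordered list stored inside the free memory itself, allocations are rounded up to 8 bytes (the size of a free-chunk header), `arena_free` takes the size back and merges with neighbours during its memory-order scan, and `arena_malloc` adds an 8-byte size header. Offsets rather than pointers are used on the assumption that a shared block may be mapped at different addresses on the host and the cores; all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#define ALIGN 8u            /* granularity = size of a free-chunk header */
#define NIL 0xffffffffu     /* end-of-list marker */

struct chunk {              /* stored at the start of every free chunk */
        uint32_t next;      /* offset of the next free chunk, or NIL */
        uint32_t size;      /* bytes in this free chunk */
};

/* base: start of the managed block; first: lowest free chunk, or NIL */
struct arena { uint8_t *base; uint32_t first; };

static struct chunk get(const struct arena *a, uint32_t off)
{
        struct chunk c;
        memcpy(&c, a->base + off, sizeof c);
        return c;
}

static void put(struct arena *a, uint32_t off, uint32_t next, uint32_t size)
{
        struct chunk c = { next, size };
        memcpy(a->base + off, &c, sizeof c);
}

/* Round up to the granularity; a zero-byte request still takes one granule. */
static uint32_t round_up(uint32_t n) { return n ? (n + ALIGN - 1) & ~(ALIGN - 1) : ALIGN; }
static void arena_init(struct arena *a, void *mem, uint32_t size)
{
        a->base = mem;
        a->first = 0;
        put(a, 0, NIL, size & ~(ALIGN - 1));
}

/* First fit: take the lowest-addressed free chunk that is big enough.
 * No header is kept, so two 32-byte allocations use exactly 64 bytes. */
static uint32_t arena_alloc(struct arena *a, uint32_t size)
{
        uint32_t prev = NIL, off = a->first;
        size = round_up(size);
        for (; off != NIL; prev = off, off = get(a, off).next) {
                struct chunk c = get(a, off);
                uint32_t rest = c.next;             /* exact fit: just unlink */
                if (c.size < size)
                        continue;
                if (c.size > size) {                /* split: the tail stays free */
                        rest = off + size;
                        put(a, rest, c.next, c.size - size);
                }
                if (prev == NIL)
                        a->first = rest;
                else
                        put(a, prev, rest, get(a, prev).size);
                return off;
        }
        return NIL;
}

/* The caller passes the size back. The free list is scanned in address
 * order and the chunk is merged with any adjacent free neighbours. */
static void arena_free(struct arena *a, uint32_t off, uint32_t size)
{
        uint32_t prev = NIL, next = a->first;
        size = round_up(size);
        while (next != NIL && next < off) {
                prev = next;
                next = get(a, next).next;
        }
        if (next != NIL && off + size == next) {    /* merge with the chunk after */
                size += get(a, next).size;
                next = get(a, next).next;
        }
        put(a, off, next, size);
        if (prev == NIL)
                a->first = off;
        else if (prev + get(a, prev).size == off)   /* merge with the chunk before */
                put(a, prev, next, get(a, prev).size + size);
        else
                put(a, prev, off, get(a, prev).size);
}

/* malloc-like layer: an 8-byte header keeps the size for the free call. */
static uint32_t arena_malloc(struct arena *a, uint32_t size)
{
        uint32_t off = arena_alloc(a, size + ALIGN);
        if (off == NIL)
                return NIL;
        memcpy(a->base + off, &size, sizeof size);
        return off + ALIGN;
}

static void arena_mfree(struct arena *a, uint32_t off)
{
        uint32_t size;
        memcpy(&size, a->base + off - ALIGN, sizeof size);
        arena_free(a, off - ALIGN, size + ALIGN);
}
```

Both allocation and free walk the list from the lowest address, which is where the "not very fast" cost comes from; in exchange, freed neighbours always coalesce back into larger chunks.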

So once I had that working I moved all the loader code to use it and implemented some higher level primitives which make communicating with cores a bit easier from host code.

I just ran some trivial tests with the JNI code and I have enough working to load some code and communicate with it, but I'll come up with that graphical demo another day. It's a rather crappy cold/wet weekend though, so it might be sooner rather than later.

Wednesday, 28 May 2014

parallella ezesdk 0.1

As seems to be the habit lately, I had a bunch of code sitting around on the hard drive for too long doing nothing, so I packaged it up in case someone else finds it interesting. I should have some free time throughout June so doing this might also be a kick to restart development, but I wouldn't be holding my breath on that one.

It's over on the ezesdk homepage I just whipped up.

It's all pretty much still subject to change, but I think I've nailed down a pretty nice and compact core-side library API and the low-level host library as well. I had intended to include proper documentation and Java bindings by this point but ... well I didn't get to it.

Features of note include a functional printf, an on-core barrier, async (interrupt-driven) DMA routines, and 2D DMA entry points; some of these have been discussed in more detail on a hacker's craic.

Monday, 26 May 2014

Chocolate cake with hazelnuts

So I just finished my last bit of work, and the office/team has a thing about people bringing cakes for various events, so I baked a cake to take along as a goodbye and partly just to say that the people weren't the reason I was refusing any potential continuation of the work.

I was going to go with an apple cake I've made a couple of times, which usually turns out pretty well, but there was an apple cake the week before for someone's birthday, and since I've been trying to get a good chocolate cake under my belt for a while I thought I'd risk it; if it failed I'd just leave it at home. I have been hoeing through some mixed raw nuts lately and the pack included hazelnuts, which are pretty foul raw, so I'd been saving them up wondering what to do with them, and it seemed like an obvious match. I couldn't find quite the type of recipe I was after (they were all about gluten-free-ness, making super-heavy cakes, or some kids'-birthday Nutella-based thing) so I tried adjusting one I had for another chocolate cake and it worked rather well ...

  • 125g butter
  • 155g brown sugar
  • 3 eggs
  • 125g self-raising flour
  • 1/4 teaspoon bicarb soda
  • 45g cocoa powder
  • 125g roasted and blended hazelnuts (aka meal)
  • 190ml milk
  • 125g dark chocolate (I used 40% solids)
  • about 1 short black of strong espresso coffee (I use a 4-cup stove-top espresso maker and half-filled the water with a full load of grounds)
  1. Pre-heat oven to 160C.
  2. Melt the chocolate with the coffee using a steam bath and while it's cooling ...
  3. Cream butter and sugar.
  4. Blend in eggs one at a time until fully mixed.
  5. Mix in flour, bicarb, cocoa powder, hazelnuts, milk, and melted chocolate until just combined.
  6. Pour into greased/lined 22-24cm spring-form cake tin and level off.
  7. Cook for 1 hour and 15 minutes 'or until done'.

From the original recipe I replaced 30g of plain flour with all the hazelnuts I had, added the coffee when melting the chocolate to make it easier to mix in (it doesn't just go hard/curdle, and it just tastes better), and adjusted a couple of other things slightly to suit what I had on hand. Although my oven works really well for everything else, I've had a lot of trouble with cakes not cooking through, but this time I tried turning off the fan-forced fan and it was right on the money (perhaps slightly over, but not enough to matter).

It managed to survive the ride in my pannier bags and generally impressed. Probably the best cake I've ever made actually, although until this point the bar has been rather low.

Saturday, 24 May 2014

short hsa on kaveri note

I finally got around to upgrading to the latest HSA kernel, and so far it seems as though HSA now continues to work after a suspend. Well, I've suspended and resumed the machine once and one of the aparapi-lambda tests continues to function with HSAIL.

Although this machine reboots rather fast, having to reboot after a suspend was an inconvenience that pretty much put a stop to me playing around with HSA. I think there were other reasons too actually, but I guess it's one less at least.

It's the 'v0.5' tag of git://people.freedesktop.org/~gabbayo/linux, which I built from source rather than using the Ubuntu images in Linux-HSA-Drivers-And-Images-AMD (although I grabbed the firmware from there).

...

I've got 1 day left on my current bit of work, then another break, and then I'm not sure what I'm going to do. There are pretty much perpetually ongoing 'bits of work' if I want them, but I'm considering not continuing for various reasons (and have been for a while). I have to weigh that against trying to find some interesting work though. Ho hum.

Friday, 16 May 2014

Selfie

Had nothing better to do on my RDO but sit at the pub drinking jugs (translation for Americans: RDO = rostered day off, jugs = pitchers). Took the pic with my cheap tablet's front-facing camera so it's a bit ordinary, but the subject matter hardly requires better ...

A very unseasonably warm 25C+ late-autumn day, bloody wonderful. A bit of eye candy, but I was well occupied reminiscing about happier times and better company. Many people I knew/know and miss due to interstate moves, emigration, and other reasons. All the rest were busy today - work, pah.

Everyone moved on but I just got old and fat and grey. Yeah, it's the same shirt I wore out in Bangalore, but I've been working on the chops (may as well look the part if I'm going to be eccentric). Must be a bit of the Celtic blood showing itself there.

Saturday, 10 May 2014

An untitled whine.

Maven is the wrong tool

So I've been working on a team project for the last few weeks. The people are decent but the processes are insane and the tool and technology choices are ... odd.

One of the tools is of course Maven, which for some baffling reason has become quite popular in the Java world. It's really a pretty horrid bit of software: very slow, unreliable (essentially you're forced to 'make clean; make all' every time, and even then that doesn't guarantee a repeatable build), obtuse error messages which don't explain what's going on, a plugin system with terrible discovery, shitty documentation, and a dependency system which creates more problems than it solves. Like many Java technologies it seems to have been created to sell books, training, and 'enterprise' versions rather than to solve an existing need; and yet somehow it has created an evangelical following of essentially the blind leading the blind.

But you know, a lot of software has faults, although a build system which cannot create repeatable (incremental!!) builds is less than worthless in my mind. I've only been there 3x4-day weeks and have probably spent 2 days waiting for things to run (as far as I can tell, so has every other developer), which makes it a very expensive bit of junk.

However, the primary problem with Maven is that, like git, it is a configuration management tool.

But unlike git which can at least be (ab)used to become a developer tool, maven just doesn't have the required facilities to start with.

Subversion wtf?

I thought I'd see if the HSA stuff had been updated, so I started updating my local checkouts yesterday. Apart from git being a bit of a git, Subversion seems to have added 'interactive' bullshit now: rather than just dumping conflict markers in a file and showing them, it prompts you for what to do.

Yeah, no thanks.

Not sure I could be bothered building it right now anyway, even if it wasn't still checking out the C++ snot from gcc.

Atom, wtf?

I came across a link to the "next generation" editor from GitHub, Atom. It's really pretty baffling why they're even working on some shitty editor using 'web' technologies and then turning it into a Macintosh application.

But really what surprised me most was the number of people saying things like "it's slow and it's buggy and even though it's exactly like some existing programmers editor I paid for, i'm going to be using this from now on".

What? Why? Do they think there'll be some sort of gold-rush on plugins? Why would they be willing to interfere with their productivity just to be part of something new?

Amazing the strange decisions people make based on hype.

And people still buy text editors? There are dozens of free ones out there and many of them are particularly excellent. Maybe it's a Macintosh thing: like on the Amiga, little developers can make niche products and get people to buy them.

Apparently the buzz is because it's extensible using an 'easy to use' language, whilst still being pretty and newbie friendly. Yeah, I dunno; JavaScript is pretty shit when used as an app language.

Agile, the antonym of development methodologies

I've worked out how to treat everything that comes out of 'Agile' development: just put a 'not' in front of each and every word of jargon from the agile bible (again, another load of bullshit created to sell self-help books).

It's not a sprint, it's not a backlog item, and it's certainly not fucking agile in any dictionary-sense-of-the-word.

One can only imagine that the authors of the 'best-selling' book on the subject erroneously used the antonym section of the thesaurus rather than the synonym one when trying to generate their silly jargon.

A death in the family

Samantha finally died ... or what was left of her.

This is her second frame and the frame cracked in the same place: along the rear weld of the reinforcing strut behind the bottom bracket. The rear wheel is also new (the original just needs a couple of spokes and a re-true). I replaced all the cables when I changed the frame last time. The seat post needed changing when I changed frames, and both brakes ended up being replaced (the springs on the arms broke off). At least one new chain, maybe two, and another rear cog cluster. So yeah, not much left of the original bike: front wheel, derailleurs, levers, handlebars, pedals, and forks.

This was originally a Specialized mountain bike I bought in Boston when I was working for Ximian in 2000, so I got a pretty good run out of it, and the replacement frame was only $150 (old but new stock). It's my shopping cart and commuter, so even if I'm not doing as much distance as I am right now it's always had regular weekly use and occasional heavy loads (like 10kg bags of flour, 25kg bags of potting mix, or a late-night dinky).

I have a Wheeler I bought even earlier (1998 or 1999?) which has been sitting outside with a broken rear wheel (fucking idiot at the bike shop on The Parade in Norwood over-tightened the spokes so much they pulled out of the rim - avoid, they're snobby wankers anyway) so over Easter I did a bit of a recondition on that. Gave it a good clean, sanded and touched up all the corrosion spots in the paintwork, cleaned and oiled the chain, then moved over the wheels, mudguards, pannier rack, lights, bottle cage, pump, and saddle from Sam.

The brand new rear wheel was a little loose by this time, so I re-trued it, replaced a missing spoke on the front wheel and re-trued that (at least, as best I could), re-aligned the rear derailleur, and so on.

The spigot nut slips inside its plastic case, so I didn't tighten the front wheel enough and it started working loose after a week. The axle was quite bent when I put it on (I'm not sure how that happened), so the following weekend I decided to try to fix that as well: took out the bearing and re-greased it, hammered the bent axle slightly straighter on a piece of railway line I use as a small anvil (trying not to damage the thread), cleaned it up a bit and put it back on really tight.

Considering I haven't done any maintenance for years that isn't so bad, although I'm sure after all that something else is going to break next.

The Wheeler frame is built like a tank: oversized tubing and heavy reinforcing on the main welds. Unlike the other two frames there is no flexing whatsoever when I stand on the pedals, so that should last a few years yet. I mostly used the Specialized once I came back from Boston, so the Wheeler has probably only had a couple of years of use so far (the Specialized seemed lighter even though the Wheeler had some better components).

Ahh I finally remembered the name I had for the Wheeler - Michelle. Just a running joke with some mates to name my bikes with female names.

Sewing Machine

I've been doing some very early riding of late, and together with a #2 haircut, a very well ventilated helmet, and some cold mornings ... it can get pretty chilly on the forehead (and the wingnuts). Cold here is around 10 degrees.

I was trying to find something I could use to keep some of the wind off and I came across an old knee warmer. The elastic is gone and the other knee got lost when I lent it to someone so it was convenient that it fits (if a bit snugly) over my head and ears. So I de-stitched the elastic (all 4 rows of strong polyester - certainly built well, I think it was an Australian made one) and brought out the sewing machine to just sew up the edge (hem?).

The machine just didn't seem to want to catch a stitch, so I ended up pulling the whole bottom mechanism apart and trying to work out what was going on. In the process I finally learnt the magic that makes them work; I always wondered how the top-fed cotton was looped around the bottom-fed cotton. The mechanism is deceptively simple yet relies on precise alignment and some engineering of the needle shape. As the needle reaches its lowest point, a rotating slot goes past, picks up the cotton that is sitting next to it, and loops it around the bobbin feed. It relies on the needle being in the right place at the right time to fairly close tolerances, and on the flexibility of the cotton itself.

So after taking apart what I could and realising that nothing at all is adjustable, I gave it a bit of an oil and put it back together, and then I realised that the rotating spindle around the bobbin cage has to be time-aligned with the rest of the rotating parts ... but after some adjustments I got that back in the right spot.

Still didn't fucking work! If anything it was dropping even more stitches.

But then I looked closer at the needle I was using, and it had an oddly thin shape at the bottom; I guessed it wasn't close enough to the rotating slot for the cotton to be picked up reliably. So I found another one, slotted it in, ... and away it went. I can't remember why I put that needle in now, but it's clearly not made for this machine and at least I'll know where to look next time. It was with a box of shit the machine came with and must've come from another machine.

Actually, because the material is so stretchy the stitching is hard to get right anyway, but I repaired some other things that needed sewing up and made a morning of it. After I'd cut it in half I realised I could have made a pretty creepy balaclava out of it, but perhaps it's better I didn't ;-)

I think the machine is as old as me, a Singer 248. It's built like something out of the steam age: the main body and platform are one big cast-iron block, and the drive shaft and cam-driven rockers are seemingly over-sized steel rods. It's got holes for a pedal-driven belt but runs off a small motor. I was just about to throw it out, so I'm pleased to find it was just a needle, and I had an interesting morning learning how it worked. I couldn't see why it wouldn't be working otherwise; none of the parts seem particularly worn and it doesn't look like it's been used a lot.

Not that I have much to sew. Maybe I can make a slip-on cover for an e-book or tablet with the remainder of the knee warmer.

Update: OK, so I got that wrong: I ran out of the cotton I was using, and when I changed it, it stopped catching again. I also found the adjustment screws and cams for the needle positioning. In the end the main problem was the bobbin cage timing; I knew I should've recorded its position before I took the belt off. After a few tries I managed to get it to catch on both sides of a zig-zag stitch, although the stretchy material and needle were other factors. Fiddly, but I guess it occupied an afternoon ...

Cold and Wet

Another shitty cold wet weekend.

Probably won't bother rebuilding the HSA tools or doing any hacking (work and some poor sleep are completely wearing me out, and my eyes need a rest); might clean the kitchen, cook something, watch some footy if it's on and maybe crack a red.

I've only got a couple more weeks to go on this (really short) contract and then I'm not sure what's going to happen. I think the plan that others made was for me to continue on it for a few years, but I've stated I'm not willing to work on the project further; although the location, technology, and team organisation are contributing factors, there is a more serious and unrelated reason I can't mention.

I could survive on a few months a year of ongoing work on previous tasks, but whether that is acceptable is not up to me.

With the federal government seemingly hell-bent on turning Australia into some Victorian-era upstairs/downstairs nightmare of total class warfare (I just can't ... just can't .... fathom where they're headed with their current ideologically driven nonsense: a ruling class which takes all of the nation's wealth and works the underclass till death?) the whole work situation around here is only going to get tighter. Not that there is much call for parallel/performance programmers around here that I can tell.