A Hacker's Craic: February 2010

Saturday, 27 February 2010

Sick and tired of being a permanent beta-tester.

Sigh. I was really angry about this, but now I'm just disappointed.

I'm just sick and tired of being a permanent beta-tester, or even alpha-tester, for `linux distributions' and much of the software they use. That was never terribly cool, but it was entirely acceptable up until about the turn of the century. But today there's no excuse anymore, and really only one way to describe it, and that's bullshit. It just doesn't seem to be getting better either, and if anything seems to be taking a turn for the worse in the last few years (Notwork Manager, Puss Audio, KDE 4, EXT4, anything from f.d.o, and so on).

In the usual bullshit drive to add new features, stability and usability are being cast aside in the permanent quest for shiny. It certainly isn't confined to the Linux world - just look at Vista or Apple - but they don't affect me personally so I couldn't care less.

The latest case to affect me is Grub 2 (apparently it's been in use for a while, but today is my first experience with it). I installed Ubuntu 9.10 on a fresh system, and it just doesn't boot. No problem ... go to rescue mode and fix it. Ahhh, what the feck is this mess? They've turned the boot-loader into a friggin' `platform'. I don't want to have to learn a whole new over-engineered `script' format, together with each distribution's overly extensible framework designed to make it `usable' - for something that doesn't work anyway, and won't add anything most people need. It's just a bloody boot loader after all. I know grub had some issues, but it just doesn't need a huge run-time extensible module system and sophisticated scripting platform to copy a disk image to RAM and point the CPU's program counter at it. Particularly since they still seem to be short on developers - the last thing they need is to make it more complex.

It's a fundamental problem which starts at the project developers and filters its way through to the distribution makers - but ultimately it is the distribution makers who are making the wrong choice. For starters, if basic things like sound, network, or booting don't work, they rightly bear the brunt of the anger. They are responsible not just for compiling a group of disparate projects and ensuring they work together, but for making sure the applications they choose actually work. Distributions too often seem to confuse `stable' with `unmaintained' as well - the noisiest, busiest projects get the attention, even if an alternative already does the same thing but isn't being actively developed any more since it doesn't need to be. And their choices can have pretty negative effects on projects; including a project 6 months before it is actually ready for production use can leave a sour taste in many users' mouths and tar its image for years.

Project developers do need to take their share of the blame too. Sure, nobody wants to maintain old software forever, but if they decide to drop support for an old product, they had better make sure the replacement works pretty well before pushing it for inclusion into distributions (or accept that nobody will use it till it does). Too often unwanted software is forced onto all users as a way to ensure it gets the testing required to make it a complete product, or worse, to ensure a competing project doesn't gain a footing (this is particularly problematic inside Linux, where politics has started to blatantly undermine merit). As much as I dislike it, I realise this is part of Fedora's policy, so it can be excused to some extent, but most distributions actually promise a usable system from the start.

I guess I'll try CentOS again - I tried it earlier but I couldn't get dual-screen working, and the network went all strange when I went to download decent drivers (the former not entirely their fault - bloody nvidia) ... hmm, maybe I shouldn't bother. And if that doesn't work easily I guess it'll have to be Fedora 10, or perhaps an earlier Ubuntu. At least they just worked as far as I need them (i.e. I don't need sound).

I suppose this strategy of avoiding the unstable shiny shit will work for a couple of years ... and hopefully by then the distributions will have got their fracken shit together, but somehow I just don't think that will happen.

So you might be wondering - just what are you doing about it then, you whiney prick?

Well for starters, there just isn't anything I can do about Linux or its distributions - they are too big, with their own culture and momentum - and way too much politics. At the most I might be able to write a small application or work on a larger one with other people - but that wouldn't have any impact. Even when I was working on Evolution I didn't have full control of the project, which sometimes stopped me fixing some of the larger problems. And too often I find myself in the minority and don't have the skills to make my case effectively (for example, it is pretty hard explaining something when you think it's so obvious it shouldn't need any explanation).

I have poked around AROS a little bit, and Haiku - I dislike a lot of the way GNU/Linux works and I think the only practical solution to that is just a completely different OS. But so far I just haven't been able to get into them for whatever reason, and sometimes you just have to work to your own strengths - and perhaps working on such projects just isn't for me. I think Haiku probably has the most promise at this point, and I think AROS is a little too conservative in its goals to ever be really useful for most people. I think that despite always screaming for them, projects like these secretly don't really want any new developers anyway - the developers are quite happy to make their little wins on a system completely of their own devising, without having to worry about the politics and simply the hassles of dealing with anyone new. And I think that's an entirely reasonable approach to take too if you're in it for the hobby - it's a hell of a lot more fun that way for starters, and why else would you be doing it?

PS The pile of dirt hasn't moved, but I did have a pretty good nap - although that'll probably just mean a late night!

Yawn

Ahah, well starting work again Monday. Sounds like it could be more interesting than I thought. But I guess time will tell. It might be keeping me pretty busy for the first few months too, but hopefully I still have time for the distractions offered by the beagleboard.

Rode to a meeting rather than taking the train, which was probably a bit unwise. Apart from not preparing properly and meeting in a cycling jersey ... the weather was unexpectedly hot yesterday (the forecast the day before was about the same but it was nowhere near as hot on the road). Every time I put a bit of oomph into it I started to feel a bit weird from over-heating, so had to take it pretty easy most of the way (only about 24km each way). Apart from not having ridden much lately and so not having a decent summer acclimatisation, I think it mostly just comes down to badly needing a haircut and getting a super-hot head from the bouffy hair. Well, having a dreadful hangover and 5 hours of sleep didn't help I'm sure.

The day before I got out and had a look for old computer stuff in pawnbroker shops. Didn't find much, although there were a couple of shops tucked away which had a fun range of old tools and other junk which might merit another visit one day. I did pick up an old keyboard that looked hardly used. I took all the keycaps off and gave them a good wash, and it looks almost new now (apart from putting the - and = on the wrong way around). Oh apart from that **it's just a keyboard** - none of those bullshit `multimedia' keys or even worse - windows keys! It's even got a steel base. It's the little things sometimes ...

On the way home yesterday I ordered some roadbase so I can get stuck into the retaining wall at last (last day of holidays - the one I partly took to do such things - typical).

The dirt.

It got delivered at 8am - and I was all keen to go get a haircut so I can move it without passing out, but somehow the day is slipping away and I'm stuck doing little things around the house again. Almost feels like I still have the hangover from yesterday morning - but I didn't go out last night at all. I think I just need a really good sleep (and maybe some better meals), but seem stuck going to bed too late and rising too early.

The hole.

The trench isn't quite right, and I probably need to dig out the stairs before starting to fill it, but it's almost ready. I should be out there now, but it's a bit warm to be working in the full sun in the middle of the day (when you have a choice anyway). Have to drag myself out a little later to get stuck into it.

Wednesday, 24 February 2010

Smooth as a baby's bum

Well I didn't end up getting out, but I had a break for a while. I also started thinking about what I want to do now I have interrupts working and decided I couldn't be bothered doing too much with the vblank demo. But since there was nothing on TV for a bit I tidied it up and committed it as is.

It's moving smoothly, honest.

Yes, not much of a screenshot, but I thought the page needed some colour and a break from all the text of late. It just smoothly scrolls from one screen to the other and back again, drawing one box on the lower screen every time it's fully hidden. Demo in irq-scroll.c, with the interrupt code from yesterday in exceptions.S.

So what to do next? Well there's still the idea of a little game or demo or so, but I'm lacking a bit of inspiration, and besides, a bit of support to actually run the code in, for things like sound. So I'm starting to think about working on a little real-time micro-kernel.

Many months ago I had been writing one for x86 but got a bit pissed off with all the PC crappiness, and found something more interesting to do instead. So I'll probably start with that, although I'm not sure how much I'll end up using (not that I can remember the state I left it in anyway). For example, currently it does all the VM and process management inside the kernel (it made it simpler at the time), but I want to move that to a process instead so the kernel can work without any pre-emption or locks. Basically the goal is to make the kernel as simple as possible (nano-kernel?) without the simplification getting in the way. What I had been working towards was something like a mix of Minix 3 and AmigaOS: taking features like memory protection and processes from Minix, in addition to the asynchronous non-copying message passing, shared libraries, tasks, and the device mechanism from AmigaOS, with a bit of reworking to make it all fit. Ahh well, maybe a tad on the optimistic side so I wouldn't hold my breath; on the other hand, apart from device drivers there's not all that much to the core of such a design.
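The AmigaOS-style `asynchronous non-copying message passing' boils down to handing over a pointer: the sender links its own message into the receiver's port, and later gets the very same memory back on a reply port. A minimal single-threaded sketch, assuming hypothetical `msg_port`/`message` types and `put_msg`/`get_msg`/`reply_msg` helpers loosely modelled on exec's PutMsg/GetMsg/ReplyMsg, with no task wake-up and no locking:

```c
#include <stddef.h>

/* Hypothetical types, loosely modelled on AmigaOS exec message ports. */
struct message;

struct msg_port {
    struct message *head;   /* oldest queued message */
    struct message *tail;   /* newest queued message */
};

struct message {
    struct message *next;        /* intrusive link: sending allocates nothing */
    struct msg_port *reply_port; /* where the receiver hands it back */
    int type;
    void *data;                  /* payload stays in the sender's memory */
};

/* Queue a message on a port. Only the pointer moves; nothing is copied. */
void put_msg(struct msg_port *port, struct message *msg)
{
    msg->next = NULL;
    if (port->tail)
        port->tail->next = msg;
    else
        port->head = msg;
    port->tail = msg;
}

/* Remove and return the oldest message, or NULL if the port is empty. */
struct message *get_msg(struct msg_port *port)
{
    struct message *msg = port->head;
    if (msg) {
        port->head = msg->next;
        if (!port->head)
            port->tail = NULL;
        msg->next = NULL;
    }
    return msg;
}

/* Hand a message back to its sender, who still owns the memory. */
void reply_msg(struct message *msg)
{
    if (msg->reply_port)
        put_msg(msg->reply_port, msg);
}
```

In a real kernel put_msg would also wake the receiving task; if it is only ever called from inside a kernel without pre-emption, as planned above, no further locking should be needed.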

Not sure if I'll put it in PuppyBits or another project, but for now I still need to work out some basic routines like context switching, so I'll definitely put that stuff in PuppyBits at the least.

Interrupt progress

Had another poke at interrupts late last night. And finally got a simple interrupt handler working. As usual a couple of simple mistakes initially thwarted my efforts.

To start with I was trying to use the FRAMEDONE interrupt from the display controller. But it seems I needed to use VSYNC for HDMI out, and the EVSYNC_ODD/_EVEN for S-Video.
I was using the rfe instruction, but neglected the ! on the register, so it wasn't fixing the stack pointer properly on exit. Somehow the application code managed to run ok for a few seconds with a constantly changing stack!
I forgot to fix lr (subtract 4) before saving it on the stack using the srs instruction. Again, rfe was thus skipping an application instruction every interrupt, and again somehow the code didn't crash immediately either.

Other than using the ARMv6 instructions above, the code is basically straight out of OMAP TRM § 10.5.3 MPU INTC Preemptive Processing Sequence; the step numbers in the comments below refer to that section. I haven't implemented the priority stuff yet (I was hoping it wasn't necessary for such a simple bit of code since I don't need priorities, but it seems it is for other reasons), so it doesn't actually implement re-entrant interrupts, but I might try to get that working before committing it.

        .set    MODE_SUPERVISOR, 0x13

ex_irq:
        // 1. save critical registers
        sub     lr,lr,#4
        srsdb   #MODE_SUPERVISOR!
        cps     #MODE_SUPERVISOR
        push    { r0-r3, r12, lr }

        ldr     r3,=INTCPS_BASE

        // 2,3 save and set priority threshold (not done)

        // 4. find interrupt source
        ldr     r0,[r3,#0x40]

        // 5. allow new interrupts
        mov     r1,#1
        str     r1,[r3,#INTCPS_CONTROL]

        // 6. data sync barrier for reg writes before enable irq
        dsb                              // no option given = SY, full-system barrier
        
        // 7. enable irq

        // 8. jump to handler
        ldr     r2,=irq_vectors
        and     r0,r0,#0x7f
        ldr     lr,=ex_irq_done
        ldr     pc, [r2, r0, lsl #2]
        
ex_irq_done:
        // 1. disable irq

        // 2. restore threshold level (not done)

        // 3. restore critical registers
        pop     { r0-r3, r12, lr }
        rfeia   sp!

        .data
        .balign 4
        .global irq_vectors
irq_vectors:
        // 96 vectors, all pointing at the default handler to start with
        .rept   96
        .word   exception_irq
        .endr

The srs and cps instructions are used to run everything on the supervisor stack, in supervisor mode. On entry the code is executing in irq mode, so it first saves lr_irq and spsr_irq onto the supervisor stack (after fixing the return address in lr!), and then switches to supervisor mode. Without the srs instruction (pre-ARMv6) things are pretty messy, since you either have to muck about with the irq stack first (and last), or have to switch between modes a few times to get everything sorted (see the links at the end of this post).

I also implement a simple vectored interrupt table to simplify the C side of things, although I think I can just use a simple mov lr,pc before jumping to the vector rather than a literal load.
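On the C side, the vector table is just an array of function pointers indexed by the interrupt number the INTC reports, with every slot starting out at a catch-all handler. A sketch with hypothetical `irq_init`/`irq_register`/`irq_dispatch` helpers; `irq_dispatch` is only a C stand-in for the `and r0,r0,#0x7f` / `ldr pc, [r2, r0, lsl #2]` pair in the assembly above:

```c
#include <stddef.h>

#define NUM_IRQS 96  /* interrupt lines in the OMAP3 INTCPS */

typedef void (*irq_handler_t)(int id);

static unsigned unexpected_irqs;  /* interrupts that arrived with no handler */

/* Catch-all for any line nobody has claimed. */
static void default_irq(int id)
{
    (void)id;
    unexpected_irqs++;
}

/* Hypothetical C view of the irq_vectors table. */
static irq_handler_t irq_table[NUM_IRQS];

void irq_init(void)
{
    for (size_t i = 0; i < NUM_IRQS; i++)
        irq_table[i] = default_irq;
}

/* Install a handler; returns 0 on success, -1 for a bad id or handler. */
int irq_register(int id, irq_handler_t handler)
{
    if (id < 0 || id >= NUM_IRQS || handler == NULL)
        return -1;
    irq_table[id] = handler;
    return 0;
}

/* Mask the active-interrupt number to 7 bits, then jump through the
   table. Values 96..127 cannot come from the INTC but are guarded anyway. */
void irq_dispatch(unsigned sir_irq)
{
    unsigned id = sir_irq & 0x7f;
    if (id < NUM_IRQS)
        irq_table[id](id);
    else
        default_irq(id);
}
```

Each driver then only has to call something like irq_register(25, dispc_handler) during setup, and the handler receives its own id as the argument, matching the `void dispc_handler(int id)` signature used later.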

The ARM ARM actually recommends using system mode for re-entrant interrupts (you can't use irq mode itself because lr_irq could be clobbered by a nested interrupt), so why am I using the supervisor stack? Partly for historical reasons: at first I couldn't work out how to save the state without clobbering some system-mode registers (sp and lr are shared with user mode). But I also have other plans where this scheme might work better, and if nothing else it stops broken user-mode code from crashing interrupts by corrupting the stack pointer.

And finally, one more thing I noticed whilst reading bits and pieces is that the AAPCS (EABI) specifies that the stack pointer should remain double-word (8-byte) aligned at entry points. I probably read it before but didn't take notice. Normally this just means you need to push an even number of registers onto the stack before calling other functions. Fortunately this just falls out with this code ... but an interrupt handler can be invoked at any time, so we don't know the alignment of the stack on entry and a specific check is needed too, according to the ARM Info Centre (damn, and I definitely know I've read that before just looking it up now - and it has some other important bits too!).
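The check itself is only a little arithmetic on the stack pointer: if bit 2 of sp is set the stack is merely word aligned, so the handler drops it by 4 bytes before calling into C and adds the same amount back on the way out. A host-side sketch of that arithmetic with hypothetical helper names; in the real handler it would be a couple of instructions (typically an `and` of sp with 4 followed by a `sub`), with the adjustment saved somewhere so it can be undone:

```c
#include <stdint.h>

/* Bytes the entry stub above puts on the supervisor stack: srsdb stores
   two words (lr_irq, spsr_irq) and push {r0-r3, r12, lr} stores six. */
#define IRQ_ENTRY_FRAME_BYTES ((2u + 6u) * 4u)

/* How far sp must drop to become 8-byte aligned. An ARM stack pointer is
   always at least word aligned, so the answer is 0 or 4 (bit 2 of sp). */
static inline uint32_t sp_adjust(uint32_t sp)
{
    return sp & 4u;
}

/* The 8-byte aligned stack pointer to use while calling into C. */
static inline uint32_t align_sp(uint32_t sp)
{
    return sp - sp_adjust(sp);
}

/* Undo the adjustment on exit. The handler has to remember 'adjust'
   (e.g. push it on the aligned stack) to be able to do this. */
static inline uint32_t restore_sp(uint32_t aligned_sp, uint32_t adjust)
{
    return aligned_sp + adjust;
}
```

The frame size also shows why alignment `just falls out' of the entry code: 32 bytes is a multiple of 8, so an aligned stack stays aligned, but a stack that was misaligned when the interrupt arrived stays misaligned, which is exactly the case the check exists for.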

Hmm, now I'm thinking about it ... I'm not sure I even need re-entrant interrupts at all. I'm thinking of working towards something along the lines of a microkernel architecture similar to AmigaOS or Minix 3, where device drivers are just high priority unprivileged tasks - the Cortex-A8 should be more than fast enough for this to work. All the interrupt handlers will need to do is post events to these tasks, and the software will handle the priorities and whatnot. I suspect re-entrant interrupts are much more important in an embedded system where you leave most of the work to the interrupt handlers, where DMA isn't available for everything, or where the CPU speed is a limiting factor.
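Posting an event from an interrupt can be as cheap as setting a bit. A sketch in the style of AmigaOS signals, assuming a hypothetical `struct task` and `signal_task`/`task_ready`/`take_signals` helpers; the scheduler that picks the highest priority ready task is left out:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical task record: just the signal state, AmigaOS-style. */
struct task {
    volatile uint32_t sig_recvd;  /* bits posted by interrupts or other tasks */
    uint32_t sig_wait;            /* bits the task is currently waiting on */
    int priority;                 /* driver tasks would run at high priority */
};

/* Called from an interrupt handler: record the event and return at once.
   With interrupts not re-entrant, nothing else runs during the OR. */
void signal_task(struct task *t, uint32_t mask)
{
    t->sig_recvd |= mask;
}

/* Scheduler test: has anything the task is waiting for arrived? */
bool task_ready(const struct task *t)
{
    return (t->sig_recvd & t->sig_wait) != 0;
}

/* Called when the task runs again: take (and clear) the signals it was
   waiting for, leaving any others pending. In a real kernel this would
   run with IRQs disabled, since it races with signal_task(). */
uint32_t take_signals(struct task *t)
{
    uint32_t got = t->sig_recvd & t->sig_wait;
    t->sig_recvd &= ~got;
    return got;
}
```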

Specific Handler

The next step after the interrupt handler is the interrupt vector code itself. This is just a plain function call since the entry point has handled all the nitty gritty. But it still has to deal with the hardware - to identify which interrupt caused it to be invoked, and to clear it. Even with 96 interrupts in the interrupt controller, most of them map to multiple physical events.

In the case of the video subsystem, there is a single interrupt DSS_IRQ (25) which can be triggered from 29 different events in either the DISPC module or the DSS module (actually I just noticed there are many more from the DSI module). § 15.3.2.2 Interrupt Requests has a pretty good overview. Fortunately there are a couple of bits in DSS_IRQSTATUS which let the code determine which module has asserted an interrupt, to simplify processing. After that test is made, each bit needs to be checked in turn and processed accordingly. And finally the interrupt bits must be reset by writing a 1 to each bit in the DISPC_IRQSTATUS or DSI_IRQSTATUS register - otherwise it will go into an infinite loop, re-invoking the interrupt as soon as it exits.

void dispc_handler(int id) {
        uint32_t dssirq = reg32r(DSS_BASE, DSS_IRQSTATUS);

        // see if we have any dispc interrupts
        if (dssirq & DSS_DISPC_IRQ) {
                uint32_t irqstatus = reg32r(DISPC_BASE, DISPC_IRQSTATUS);

                if (irqstatus & DISPC_VSYNC) {
                        // ... do vsync code ...
                }

                // clear all interrupt status bits set
                reg32w(DISPC_BASE, DISPC_IRQSTATUS, irqstatus);
        }

        // check for dsi ints (to clear them)
        if (dssirq & DSS_DSI_IRQ) {
                // not expecting this, just clear everything
                reg32w(DSI_BASE, DSI_IRQSTATUS, ~0);
        }
}

This is basically the same process that all interrupt handlers need to go through. Identify the source, handle it, clear the assertion.
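A host-side model of that identify/handle/clear cycle, assuming a write-1-to-clear status register like DISPC_IRQSTATUS and a hypothetical `EV_VSYNC` bit (the real DISPC_VSYNC bit position is whatever the TRM says). It also shows why writing back the value that was read, rather than ~0, matters: an event raised between the read and the write stays pending instead of being lost.

```c
#include <stdint.h>

/* Host-side model of a write-1-to-clear (W1C) status register such as
   DISPC_IRQSTATUS: writing 1 to a bit clears it, writing 0 leaves it. */
struct w1c_reg {
    uint32_t status;
};

static uint32_t w1c_read(const struct w1c_reg *r)
{
    return r->status;
}

static void w1c_write(struct w1c_reg *r, uint32_t value)
{
    r->status &= ~value;
}

/* Hypothetical event bit, standing in for DISPC_VSYNC. */
#define EV_VSYNC (1u << 1)

/* One pass of identify / handle / clear; returns the bits acknowledged.
   Only the bits actually read are written back, so an event raised
   between the read and the write is still pending afterwards. */
static uint32_t handle_irq(struct w1c_reg *r, unsigned *vsyncs)
{
    uint32_t seen = w1c_read(r);          /* 1. identify */
    if (seen & EV_VSYNC)                  /* 2. handle */
        (*vsyncs)++;
    w1c_write(r, seen);                   /* 3. clear what was seen */
    return seen;
}
```

That is what dispc_handler above does for DISPC; the DSI side gets ~0 only because no DSI events are expected at all.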

There are lots of 'gotchas' with writing interrupt handlers at first, but the main thing is not to call any functions which share state with non-interrupt code, e.g. anything non-reentrant, or anything touching hardware registers the main code also uses. Oh, and they should always run as fast as possible - all the `real work' your CPU could be doing is halted the entire time the interrupt is executing, and you could be processing thousands per second in a busy system.
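One common way to pass data between an interrupt handler and the main code without breaking that rule is a single-producer, single-consumer ring buffer, where each side only ever writes its own index. A minimal sketch with hypothetical names, assuming a single core (a multi-core system would also need memory barriers):

```c
#include <stdint.h>
#include <stdbool.h>

/* Byte queue from an interrupt handler (producer) to main-line code
   (consumer). Each side writes only its own index, so no lock is needed
   on a single core. RING_SIZE must be a power of two so the
   free-running indices can simply be masked. */
#define RING_SIZE 16u

struct ring {
    volatile uint32_t head;         /* written only by the ISR */
    volatile uint32_t tail;         /* written only by main code */
    volatile uint8_t buf[RING_SIZE]; /* volatile so the data store can't be
                                        moved past the index update */
};

/* ISR side: returns false (dropping the byte) if the queue is full. */
bool ring_put(struct ring *r, uint8_t byte)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SIZE)
        return false;
    r->buf[head & (RING_SIZE - 1)] = byte;
    r->head = head + 1;             /* publish only after the data is stored */
    return true;
}

/* Main-line side: returns false if there is nothing to read. */
bool ring_get(struct ring *r, uint8_t *out)
{
    uint32_t tail = r->tail;
    if (tail == r->head)
        return false;
    *out = r->buf[tail & (RING_SIZE - 1)];
    r->tail = tail + 1;
    return true;
}
```

The unsigned subtraction `head - tail` stays correct even when the indices wrap past 2^32, which is why they are allowed to run freely instead of being reset to zero.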

The last piece of the puzzle is the interrupt enable masks. You don't just get every possible interrupt in the system all the time; you can mask (or enable) the ones you want to receive. This is all set up after the hardware in question is set up but before interrupts are enabled. Here I clear all the status bits as well, just to make sure I don't get an unexpected surprise when I enable CPU interrupts later.

        // disable all but vsync
        reg32w(DISPC_BASE, DISPC_IRQENABLE, DISPC_VSYNC);
        reg32w(DISPC_BASE, DISPC_IRQSTATUS, ~0);
        // dss interrupt can also receive DSI, so disable those too
        reg32w(DSI_BASE, DSI_IRQENABLE, 0);
        reg32w(DSI_BASE, DSI_IRQSTATUS, ~0);

I think I have some sort of set-up bug, because I seem to be getting interrupts sometimes when none of the events I'm testing for is asserted. I will have to check the extra DSI interrupts I just noticed whilst writing this - they should all be masked off (that should be the reset condition anyway, but ...).

My little demo code right now just does a vsync'd smooth-scroll by changing the video DMA base registers. The TRM states that the register is a `Shadow register, updated on VFP start period or EVSYNC.' There is another little trick though: it looks like all the DISPC registers are themselves shadowed again, so you always have to set the GOLCD bit in DISPC_CONTROL whenever you make changes for them to make their way to the hardware. I guess I realised that anyway, but initially forgot.

        // update the graphic layer 0 address (video out) to scroll it
        reg32w(DISPC_BASE, DISPC_GFX_BA0, addr);
        reg32w(DISPC_BASE, DISPC_GFX_BA1, addr);
        reg32s(DISPC_BASE, DISPC_CONTROL, DISPC_GOLCD, ~0);
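The address written to DISPC_GFX_BA0/BA1 is just the framebuffer base plus the top line times the stride. A sketch of the back-and-forth scroll, assuming a framebuffer two screens tall laid out one after the other and a hypothetical `scroll_next` called once per vsync; its return value is what would be written before setting GOLCD, as in the snippet above:

```c
#include <stdint.h>

/* Hypothetical scroll state for a framebuffer two screens tall: the top
   of the visible window bounces between line 0 and line 'height'. */
struct scroll {
    uint32_t fb_base;   /* physical address of line 0 */
    uint32_t stride;    /* bytes per line, e.g. width * bytes-per-pixel */
    uint32_t height;    /* visible lines (one screen) */
    uint32_t y;         /* current top line, 0 .. height */
    int dir;            /* +1 scrolling down, -1 scrolling up */
};

/* Advance by 'step' lines and return the new graphics base address.
   Reverses at each end so it runs back and forth indefinitely. */
uint32_t scroll_next(struct scroll *s, uint32_t step)
{
    if (s->dir > 0) {
        s->y = (s->y + step >= s->height) ? s->height : s->y + step;
        if (s->y == s->height)
            s->dir = -1;
    } else {
        s->y = (s->y <= step) ? 0 : s->y - step;
        if (s->y == 0)
            s->dir = 1;
    }
    return s->fb_base + s->y * s->stride;
}
```

When y reaches either end, the other screen is completely hidden, which is the moment the demo can safely draw its next box there.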

I might come up with a more impressive demo before committing though. Actually now I have interrupts working it opens up a lot of possibilities, such as a real sound driver, serial driver, and proper timing events (in a very odd twist, sometimes my delay loops seem to run twice as fast as other times ...).

Links

I came across a couple of links on the internet about bare-metal ARM coding. Some of it doesn't apply/won't work on OMAP3, but the general ideas are the same.

Simplest bare metal program for ARM.
And a related article on Embedded.com, Building Bare-Metal ARM Systems with GNU, or a much easier to read PDF version I just found.

Oh, I finally got out in the yard yesterday - if only for a couple of hours. More or less finished the trench for the main retaining wall foundation. Now I just need to get off my lazy bum and order some road-base and sand. Can't say I felt the fittest - easily out of breath, although I'm sure that has something to do with the sleep apnoea, my particularly poor sleep the night before (I let the cat stay in and he was wandering around all night), as well as my bum sitting. Glorious day today, and no meeting organised yet about work, so I should probably get out on a bike. Maybe I can scan a few pawn shops in the extremely unlikely event any have C64s lying around.

Monday, 22 February 2010

The cult of stupid

I've been thinking about writing about this for some time, and even written a couple of posts, but I was never happy with how they ended up.

Is it just me, or does it seem as though a new religion has started to gain hold, at least in the West? And the religion I speak of is a religion of stupidity and ignorance. All religions rely on a certain level of ignorance in order to maintain the integrity of their flock, but this new one is taking the idea to a whole new non-denominational level.

One only has to see any discussions that arise when climate change is mentioned, or recently almost any science-related topic.

The discussion quickly devolves into a slanging match against science in general. All sorts of people pop out of the woodwork in an incessant and boorish tirade of willful ignorance and stupidity. That people can misunderstand science to such a degree in an age of universal education and information access is simply astounding. Unless every argument is framed in the purely black and white, good vs evil terms of an undeveloped mind, they cannot grasp it (or at least, this is the impression they wish to give). Science of course does not work this way; even scientific `facts' are not solid. Science only works because of informed scepticism (i.e. don't believe what you're told without reason), but these fools are not sceptics. They are deniers.

So I wonder, just from where is this stupidity springing forth? Or more importantly, how is it able to gain a hold in such educated societies as Australia and the UK?

I have some ideas, but for now, the following is the article which finally prompted me to publish my thoughts. It is very sickening reading. It is the first part of a five part series discussing this new religion against science, and has already attracted over 500 comments at this time. Whilst the abusive e-mails in the article are alarming, the numerous comments are simply depressing - it seems that Australians really are that stupid.

I await the follow-on parts with interest.

Bullying, lies and the rise of right-wing climate denial.

Well so much for that.

Well that was an odd week. I did a lot of nothing ... instead of getting out and about or digging in the garden I spent most of it reading about the world's woes and getting worked up about it.

Maybe I should've stuck with the coding, but my mind did need a little rest anyway.

I'm pretty much resigned to the fact that I will have to get USB working ... but boy is it a lot of work. The *BSD and Linux implementations are massive - although I don't need anywhere near that sophistication. The Haiku one is about the only other public free implementation I've been able to find (and in fact the only one with a suitable license), and thankfully it is much simpler, although in C++. Most other free operating systems just don't implement USB. Maybe I should just shelve that whole idea and try and get Haiku working instead ... but my last patch hasn't gone anywhere so I've lost some interest in that.

I did have a little play with trying to get interrupts working ... but no real progress on that front yet.

Another side-track was that I ended up with an old Casio electronic keyboard to play with (for nothing). Given I have so much spare time I thought I'd try and learn a little piano, or at least see if I can drum up enough interest to want to learn it properly. Still not sure yet, my fingers seem to seize up pretty quickly, but it passes the time in the sort of cathartic way that reading the news or programming doesn't.

It gave me other ideas too, like hooking it up to a beagleboard, since I have a spare one still sitting in its box. There are enough GPIO pins to hook up the matrix scan directly, although the 1.8V logic levels add a twist. Could make a fun little synth, even if I can't play it properly. The alternative is to use a smaller part like a PIC or AVR to decode the keyboard and ship out USB, serial, or even MIDI. Haven't played with hardware for ages.
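Scanning a key matrix straight from GPIO pins comes down to driving one row at a time and reading back the columns. A sketch assuming a hypothetical 8x8 matrix (the Casio's real layout is unknown) and hypothetical `row_select`/`cols_read` hooks, simulated here so the logic can run off-target; the real versions would poke the OMAP GPIO registers through suitable level shifting for the 1.8V I/O:

```c
#include <stdint.h>

#define ROWS 8
#define COLS 8

/* Simulated hardware: bit c of sim_keys[r] means key (r, c) is down. */
static uint8_t sim_keys[ROWS];
static int active_row = -1;

/* Hypothetical hooks: drive one row line active, read the column lines. */
static void row_select(int row) { active_row = row; }

static uint8_t cols_read(void)
{
    return (active_row >= 0) ? sim_keys[active_row] : 0;
}

/* Scan the whole matrix: select each row in turn and latch the columns.
   Real hardware often uses active-low lines with pull-ups and needs a
   short settle delay after row_select(). */
void matrix_scan(uint8_t state[ROWS])
{
    for (int r = 0; r < ROWS; r++) {
        row_select(r);
        state[r] = cols_read();
    }
    active_row = -1;   /* release all rows */
}

/* Compare two scans: fill per-row pressed/released masks and return
   the number of rows containing at least one change. */
int matrix_changes(const uint8_t prev[ROWS], const uint8_t now[ROWS],
                   uint8_t pressed[ROWS], uint8_t released[ROWS])
{
    int changes = 0;
    for (int r = 0; r < ROWS; r++) {
        pressed[r] = (uint8_t)(now[r] & ~prev[r]);
        released[r] = (uint8_t)(prev[r] & ~now[r]);
        changes += (pressed[r] || released[r]);
    }
    return changes;
}
```

Pressed/released edges are what a synth (or a USB/MIDI bridge on a PIC or AVR) would actually act on; debouncing would normally sit between the two scans.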

Somehow that got me onto another site (through hack-a-day) that had some guy's remarkable efforts with old Commodore 64s. I could just use the keyboard and box to put a beagleboard in to make a usable computer and not have to worry so much about USB and the like (and even if I just ran some version of Linux on it, it would make a nice box to put everything in, particularly the C64-C or Amiga 600 cases). ~~My brother still has a few old computers at home, so I might try and get one~~ (shit, he threw them all out!), or ask around.

Tuesday, 16 February 2010

All good things must come to an end.

Looks like my extended holiday will soon be over - the work I've had lined up for some time is finally in the last stages of its paperwork. Damn! Hmmm, working again will be tough, although the work will be more interesting than last time (or, at least it bloody well better be!).

Might take a bit of a break from (or at least, a significant reduction of) the beagle board hacking whilst I catch up on house work, yard work, sleep, exercise, get a hair cut, and/or just go to the beach in the few days I have left in this hot weather. Damn I'm so far behind on sleep. I never catch up on that. Sleep sleep sleep. Sigh, repeating it doesn't help either. Sleep sleep. Worth a try.

I felt I'd hit a bit of a wall anyway, so it's probably a good time to take a break, and making progress on sound was a nice milestone.

Monday, 15 February 2010

Mary had a ...

Had a go at sound in - and couldn't get it to work at first, so I went for a nice ride. Just a casual 58km with a long beer and pizza stop at a mate's on the way. I was a little sore and tired by the time I got home (mostly from the saddle; haven't ridden that bike much for ages), but after a decent sleep, not a hint of soreness.

After I got home and became bored with TV I decided to have another search on the net about sound - and found the vital clue: aux-in is line-level and not mic-level (well, I should've known that, but it all goes through the same amps). So the code was working after all; I plugged a laptop's audio output into line-in and got some results.

Actually I worked out that if I use a speaker (or headphones) as the line-in, I can get enough signal as well, but I left the gain at 0dB for a line-level signal.

Then I wrote a very simple synthesiser loop which plays 'mary had a little lamb' (badly) with a triangle-wave and simple ADSR envelope (in true SID-chip style!), and then jumps to a rather annoying 440Hz sound which phases from side to side, whilst showing the line-in waveform on the screen (as in the screenshot).
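The synthesiser side needs very little: an oscillator and an envelope multiplied together per sample. A sketch of that general technique, not the code in audo-beep.c: a 32-bit phase-accumulator triangle wave and a linear ADSR envelope, where the function names, segment lengths in samples and 16-bit output scaling are all assumptions:

```c
#include <stdint.h>

/* Linear ADSR envelope; segment lengths in samples, levels 0.0 .. 1.0. */
struct adsr {
    uint32_t attack, decay, release;
    float sustain;
};

/* Envelope level n samples into a note whose key is held for gate samples. */
float adsr_level(const struct adsr *e, uint32_t n, uint32_t gate)
{
    uint32_t t = (n < gate) ? n : gate;   /* time spent with the key down */
    float level;

    if (t < e->attack)
        level = (float)t / e->attack;                   /* 0 -> 1 */
    else if (t < e->attack + e->decay)
        level = 1.0f - (1.0f - e->sustain) *
                (float)(t - e->attack) / e->decay;      /* 1 -> sustain */
    else
        level = e->sustain;

    if (n >= gate) {                                    /* key released */
        uint32_t r = n - gate;
        if (r >= e->release)
            return 0.0f;
        level *= 1.0f - (float)r / e->release;          /* fade to 0 */
    }
    return level;
}

/* Phase-accumulator oscillator: the 32-bit phase wraps once per cycle. */
struct osc {
    uint32_t phase;
    uint32_t step;          /* freq * 2^32 / sample_rate */
};

/* freq must be below sample_rate for the step to fit in 32 bits. */
void osc_set_freq(struct osc *o, float freq, float sample_rate)
{
    o->step = (uint32_t)((double)freq / sample_rate * 4294967296.0);
}

/* Next triangle sample in -1.0 .. 1.0: fold the top half of the ramp. */
float osc_triangle(struct osc *o)
{
    uint32_t p = o->phase;
    o->phase += o->step;
    uint32_t fold = (p & 0x80000000u) ? ~p : p;         /* 0 .. 2^31-1 */
    return (float)fold / 1073741824.0f - 1.0f;          /* / 2^30, minus 1 */
}

/* One signed 16-bit output sample for sample index n of a note. */
int16_t synth_sample(struct osc *o, const struct adsr *e, uint32_t n, uint32_t gate)
{
    return (int16_t)(osc_triangle(o) * adsr_level(e, n, gate) * 32767.0f);
}
```

Each note of the tune is then just osc_set_freq() with the note's pitch and a run of synth_sample() calls for its duration, with the samples pushed out to the audio port as they are produced.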

Source in audo-beep.c, which includes all of the initialisation code as well.

There's a commented line in the init:

aregw(AV_ATX2ARXPGA, (7 << 3) | 7);

which can send the digital signal from line-in directly to the output. But the signal is really rather awful, and doesn't seem to match the data received, with lots of clipping and nastiness. Probably some bad interaction with the synthesiser data, and not biasing things properly.

This code doesn't use DMA, so is about as simple as you can get whilst still making noises.

Hmm, I wonder what to do next. Having USB for keyboard and mouse is a real pain, it's pretty hard to do much interesting `computer' stuff when you only have a serial port for communications.

Now to get 'mary had a little lamb' out of my head ...

Sunday, 14 February 2010

Beagle beeps

Well, some semblance of a beep; it sounds more like an alarm going off. Another frustrating session of misunderstandings and reboots.

To start with I had it 'click' when it turned the audio codec on or off, so at least I knew I was writing to the right registers. And changing the volume even changed the magnitude of the click.

Then after setting all the registers to something that seemed to make sense, I simply forgot to turn on the audio interface, so I spent a lot of time wondering why the McBSP end of things wasn't sending any data out (I was copying the `MP3 Out' use-case too closely which neglected to mention it ... although I read about the enable bit numerous times and knew I needed it, I simply forgot to check I was setting it). Then once I got that working I still had no sound - I couldn't get the routing to work. I knew the serial port frame signaling was at least working (no idea on the data) as changing the sample rate had an effect on how often XDRDY was asserted.
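The link between sample rate and the ready flag is simple arithmetic: the serial port needs one data word per channel slot per frame, and one frame per sample. A sketch with a hypothetical `i2s_compute` helper, assuming stereo with one word per channel slot, which may not be exactly how McBSP2 was configured here:

```c
#include <stdint.h>

/* Derived clock rates for a stereo I2S-style serial audio link. */
struct i2s_rates {
    uint32_t frame_hz;   /* frame sync: one frame per sample period */
    uint32_t word_hz;    /* data words: one per channel slot per frame */
    uint32_t bclk_hz;    /* bit clock: every bit of every slot */
};

struct i2s_rates i2s_compute(uint32_t sample_rate, uint32_t channels,
                             uint32_t bits_per_slot)
{
    struct i2s_rates r;
    r.frame_hz = sample_rate;
    r.word_hz = sample_rate * channels;
    r.bclk_hz = sample_rate * channels * bits_per_slot;
    return r;
}
```

So halving the sample rate should halve how often the transmitter asks for a new word, which is consistent with the ready flag tracking the sample rate even while no sound was reaching the codec.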

But nothing I tried worked. The audio device has quite a few registers spread all over the place, and the manual is formatted in a way which makes them hard to look up or follow. It only has a TOC entry for the whole lot, and with Evince, that makes it pretty tricky to navigate. Not to mention that Evince wants to select links half the time when you click on them just to make things more painful. I think I missed something in the manual about which serial sources receive data from the port, and data only comes in via serial 2 for non-TDM signaling. And I was trying to use serial input 1 as the source, then switching that to audio 1 internally - so I guess there was no sound from the start.

In the end I cheated a bit - I got a dump of the registers on a running GNU/Linux system which was playing sound at the time. Then I compared them one by one with those I had - on the way found some bugs anyway - and eventually just set the routing up the same way.

After all the hassles and straining to hear anything from a set of headphones I was surprised when it worked - fortunately I had the headphones off otherwise I might've got a rude shock at 3 in the morning. I'm still not using DMA to write to it, but that shouldn't be too difficult. I want to try to receive sound first before I try DMA or drop the code, but hopefully that should be simpler, since I don't have much choice in what goes where now.

Oh there was one other sting in the tail. I was playing with one of the gain controls to see if I could turn the volume down a bit: AV_ARXL2PGA. It's supposed to have a range from 0x00 to 0x3f (i.e. from mute to 0dB in 1dB steps; the master volume control), but if I set any value below about 0x30 it simply mutes the output. Which was another value I 'got wrong' when testing, so who knows, I might've had the routing correct at one point and not heard anything because of this. I'll have to have another poke at that too.
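If the register really works as described (0x3f = 0dB, 1dB per step, 0x00 = mute), converting between decibels and register values is simple arithmetic, and it also puts the odd cut-off in perspective: 0x30 is 0x3f - 15, i.e. about -15dB. A sketch with hypothetical helper names, based only on that description:

```c
#include <stdint.h>

/* Map a gain in dB to the 6-bit value described above: 0x3f is 0dB,
   each step down is -1dB, 0x00 means mute (so 0x01 is -62dB).
   Positive gains are clamped to 0dB; anything below -62dB mutes. */
uint8_t pga_from_db(int db)
{
    if (db > 0)
        db = 0;
    if (db < -62)
        return 0x00;                 /* mute */
    return (uint8_t)(0x3f + db);
}

/* Inverse, handy for decoding register dumps: returns 1 and sets *db,
   or returns 0 if the value means mute. */
int pga_to_db(uint8_t reg, int *db)
{
    reg &= 0x3f;
    if (reg == 0)
        return 0;
    *db = (int)reg - 0x3f;
    return 1;
}
```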

So mistakes in summary:

Forgot to turn on the AUDIO_IF bit to receive data.
Didn't make sure the master volume (ARXL2PGA, ARXR2PGA) was over 0x30 (why this is an issue I don't know).
For I2S mode, it seems audio only comes in through SDRL2 and SDRR2.

For the McBSP2 there were some issues too, mostly to do with signal polarities, although I don't know if I have them right yet. That requires looking at the raw signal diagrams more closely.

Hmm, looks like a nice day for a ride.

Saturday, 13 February 2010

Searching for myself

No, I'm not about to go all deep and meaningful on you, although I was probably drunk enough earlier to consider it. Neither have I lost track of my person. Nor was I vanity-surfing.

First I was looking at how easy this blog was to find when looking for beagle board coding info, and then I was looking to see how easy info about me was to find for those that might be looking.

For the first, I found a `BeagleBoard' user on Facebook who has been noting my posts - but I had to dig for that, as I did for this site. And for the second, I found a real gem of a memory on the first page of Google search results:

PC Backups/bugs etc.
Michael Zucchi (cismpz@cis.unisa.edu.au)
Tue, 10 Sep 1996 13:00:20 +0930 (CST)

    * Messages sorted by: [ date ][ thread ][ subject ][ author ]
    * Next message: Ken Laprade: "BUG: "cannot create" error file"
    * Previous message: Martin Espinoza: "I need someone's, ANYONE's ...
    * Next in thread: Gregory T. Notch: "Re: PC Backups/bugs etc." 

Howdy,

I'm an assistant sysadmin in a small computer department at the Uni
of South Australia. We have automated backup systems for unix
(using amanda), and mac systems (using retrospect), however its
currently upto the users of pc systems to backup the files they
use regularly, with whatever means they have at thier disposal.

Now, the recent versions of 'samba' include an smbclient which can
create tar files from pc shares. Obviously the idea here is to
use the recent tar support of Amanda to automagially do our PC
backups too, based on WFWG shares on each of the systems to be
backed up. Essentially, i'm trying to 'bolt on' remote pc share
'tar' support into the gnu-tar files. e.g. if the disklist
contains:

hostname //pchost/password nocomp-user-tar

then the 'hostname' host has samba installed (in my case i use
the master backup host - load to this host could be a problem
when a large number of remote pc disks are being backed up through it),
and when a backup takes place it connects to \\pchost\backup with
the password 'password', and tars to stdout. The gnutar code
recognises a remote pc host by the disk name syntax. I also modified
the smbclient software slightly to produce a 'totals' line from
stderr. I'm not sure if i'll ever get it finished, but i've managed
to get it to tar a remote NT workstation share and record it in
/etc/amandates, and incremental backups should work (properly - using
the dos archive bit) too.

However i seem to have found a bug in Amanda in the process.

client-src/sendsize.c calls start_amandates(0) in line 89.
start_amandates runs ok, but calls enter_record (from
common-src/amandates.c) which, on error (when the recorded
dump date doesn't match the /etc/amandates dumpdate) it
tries to use the 'log' function. This in turn tries to access
the config files - which aren't stored on the client side of
an amanda setup. Consequently sendsize bombs out with:
"sendsize: could not open log file (null): Bad address"

As this seems the only place to report bugs, i'm leaving it
here.

Regards,

Michael Zucchi

-- 
     ///   `... thinking is an exercise to which all too few brains
    ///     are accustomed.' - First Lensman, E.E. `Doc' Smith
\\\///  Michael Zucchi B.E.                      M.Zucchi@UniSA.edu.au
 \\\/   CIS, Assistant Systems Administrator, UniSA     +61 8 302 3033


Wow. From my first job. I had written and released free-ware before for AmigaOS in isolation (an accelerated GIF decoding system component for example), but IIRC this was pretty much my first post to a free software mailing list, and it was followed not long after by my first patch.

I was so polite back then; I wonder what happened. I also wrote overly verbose email messages, as I continue to do to this day. Although I had a somewhat pretentious signature (which I kept for many years), I still think it speaks volumes.

But after 15 years of crapping around the internet, you might have expected something like that would've dropped from the first page by now ...

What I find interesting is that if you search in 'bing', that is about the only sort of thing you CAN easily find with my name on it. It's like the whole internet after 1998 simply doesn't exist as far as I'm concerned (oddly enough, that's about when I started on GNOME).

Hmm, maybe that wouldn't be such a bad thing either.

Friday, 12 February 2010

A couple of bits and pieces

Wow, this really turned into a monster post; I guess I had a lot of crap on my mind. So much for going for a ride today as I'd hoped - the wind really picked up anyway, so it wouldn't have been much fun.

On Software Engineering ...

First, a couple of interesting posts on a blog run by a guy from Insomniac (Ratchet & Clank, Resistance, etc). I've read bits of his before because he has some posts about CELL coding, but I came across his revamped site recently whilst looking up issues about C99's un-ANSI-C-ed-ness.

Three Big Lies of Software Development. I think they are pretty fundamental things that every programmer needs to keep in mind. The first is the scourge of pretty much all modern software - and why software seems to stay about the same speed even though hardware has jumped in leaps and bounds. The second I suggest could be squarely aimed at every introductory course to `object oriented design' and is exacerbated by the first lie. And the third is probably the most important; it's the data, stupid.

Sketches on concurrency, data design, and performance is also a must-read in this day and age of multi-processors. The post-it notes are a particularly nice touch, although the PDF versions are much nicer to read than the web gallery. I particularly like his rant against typical C++, and how he explains that ideas such as a 'lock-free doubly-linked list' are nonsensical in a concurrent environment: with no 'after' or 'before', how can you insert 'after' or 'before', and without an insert 'after' or 'before', it is no longer the same ADT.

And finally, the aforementioned reference to C99's weirdness: understanding strict aliasing. I see why they added this feature - to enable optimisations that a compiler couldn't otherwise do (but that look obvious to an author) - but to me it just isn't `C' any more; it's a slightly different language (and worse, it conflicts explicitly with the sort of stuff you need to do to make normal C faster or to do operating-system type tasks). Anyway, I think this particular comment really tells the story, with my own emphasis:

The above source when compiled with GCC 3.4.1 or GCC 4.0 with the -Wstrict-aliasing=2 flag enabled will NOT generate a warning. This should serve as an example to always check the generated code. Warnings are often helpful hints, but they are by no means exaustive and do not always detect when a programmer makes an error. Like any peice of software, a compiler has limits. Knowing them can only be helpful.

I'm not sure that even I would be terribly comfortable with a language where that was a requirement - and I suspect I'd be far more comfortable with it than the majority. I was looking this stuff up because Jeff was chatting to me about some of my code in Evolution that a recent gcc was silently dropping because it aliased some pointers. Funny, I thought that's what casts were for - and it's not like you can't still do it; you can use a union to do exactly the same thing, it's just a lot messier for humans to read. Anyway, that post is a really good explanation of the issue and of how to change your code to fit the newer C variant so the compiler can make it run (potentially quite a bit) faster.
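For anyone who hasn't been bitten yet, here is a minimal illustration in C of the general technique, not the Evolution code in question. Reading a `float`'s bits through a `uint32_t *` cast is the pattern strict aliasing forbids; the union and `memcpy` versions below are the usual legal alternatives. The function names are made up for the example, and it assumes 32-bit IEEE floats.

```c
#include <stdint.h>
#include <string.h>

/* Undefined under C99 strict aliasing: the compiler may assume a
 * uint32_t * never points at a float, and reorder or drop accesses.
 *
 *     uint32_t bits = *(uint32_t *)&f;
 */

/* Type punning through a union: accepted in C99 (TC3) and C11,
 * but not valid C++. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Copying the bytes is always legal; compilers normally turn the
 * memcpy into a single register move. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* assumes sizeof(float) == 4 */
    return u;
}
```

The union version is the one that reads closest to the old cast; `memcpy` is the portable fallback when the code might also be compiled as C++.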

Anyway, a very interesting few posts on Mr Acton's site - it's a pity he doesn't update it a little more often (there's more on Insomniac's R&D site too, much of which I still need to read myself).

On Menus ...

Odd information I feel the need to share: IceCat (Firefox) gets a little unwieldy with 180 tabs open. I think this is the fourth copy of A Hacker's Craic I have open, because I can't be bothered to find the other ones. It still runs though.

Which leads me on to partly why ... scrolling @#@#ing menus. What idiot decided they belonged in any modern toolkit (let me guess, they're in Cocoa)? I remember seeing some GUI toolkit implement them in the early 90's and thought `that looked pretty cool' - but they aren't. They suck. They don't scale, and they're difficult to use. Even a small list on a button at the top or bottom of the screen will put on tiny hard-to-hit and hard-to-use up/down arrows if you've previously selected an item at the wrong part of the list. Try using one with a G-spot under your finger.

This, my friend, is a G-spot.

So here's a perfect example in Evince, which I have been using heavily recently (more on that later). It looks pretty and all, nice and simple and obvious and `easy to use'.

Now look what happens when you access it from Evince running in full-screen mode. And this is only one of the possibilities, you get various, quite-different results depending on exactly where within the button you click and if the mouse moves whilst you press on it with your finger e.g. if you click near the top it actually selects `Best Fit' for you and removes the scrolling buttons since the mouse is suddenly over the 'up scroll region' and not over the current item.

Hmm, nice one. It looks like arse, is completely unnecessary, and those silly little `scroll region/buttons' are too hard to use; apart from being too small, they're actually a whole new type of button which isn't used anywhere else in any GUI - a `click-less button', or `hover button' if you will. I'm using a fairly old Evince, so these might be bugs specific to the version, but the whole idea stinks and it's always going to have these sorts of serious usability issues - a menu that can look different every time you use it isn't going to aid muscle memory for starters. Not to mention the stupid `hover buttons' that you need to put your mouse over and wait ... wait ... wait for it to show the desired item.

The problem is that it's mixing two access modes, a 'click mode' and a 'hover mode': once you click on the button you're in 'hover mode', where whatever the mouse is over is selected or activated. So it has to attempt to ensure the mouse is over the right location when you click, and since it won't warp the mouse pointer, it warps the menu instead. Even mouse pointer warping would be (much) better than this, but there are alternatives like staying in a 'click' mode and just requiring another click to actually select an option. It sort of does this if you don't move when you 'click', but I don't know many mice that don't move most of the time when you physically push part of them, so it's even worse - you could get either behaviour depending on how steady your hand is. In the Firefox case with 180 items it would never work anyway; it's just the wrong UI element to use.
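The "depends on how steady your hand is" part comes down to how a press/release pair gets classified. Below is a minimal sketch in C of one common approach, my own illustration rather than GTK's code: the press counts as a click (menu stays open, waiting for a second click) only if the pointer moved less than some tolerance before release, otherwise it is a press-drag-release selection. The `CLICK_TOLERANCE` value of 4 pixels is an arbitrary assumption.

```c
#include <stdlib.h>

/* How a menu interprets a button press followed by a release. */
enum menu_mode {
    MENU_STAY_OPEN,      /* treated as a click: wait for a second click to select */
    MENU_SELECT_ON_UP    /* treated as press-drag-release: select the item under the pointer */
};

/* Hypothetical jitter allowance in pixels; not taken from any toolkit. */
#define CLICK_TOLERANCE 4

enum menu_mode classify_press(int press_x, int press_y, int release_x, int release_y)
{
    int dx = abs(release_x - press_x);
    int dy = abs(release_y - press_y);

    /* Small movements are treated as jitter from physically pushing the button. */
    if (dx <= CLICK_TOLERANCE && dy <= CLICK_TOLERANCE)
        return MENU_STAY_OPEN;
    return MENU_SELECT_ON_UP;
}
```

With the tolerance at zero, the slightest tremor flips the menu into the other mode, which is the unpredictability complained about above.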

As for Evince, I think I'm stuck using the wrong tool for the job. It's not really much of a document reader once you get a document with more than a few dozen pages. The search is very slow too: on my workstation it takes about 70 seconds (depending greatly on the search string) to scan the OMAP3 TRM (~3500 pages), although I realise this has a lot to do with the PDF format; fgrep scans the whole 22MB raw file in 0.035s. Add the lack of a find-all, and the fact it forgets the last search as soon as you click somewhere, and it's pretty painful. Not to mention the lack of a navigation stack and other basics which get in the way. OK, perhaps it is intended to be a simple viewer for 'users' - but developers are friggan `users' too.

Now, there was something else I wanted to mention ... what was it. Damn, completely gone. Oh oh, no, here it comes. It's about `distributions'.

On Distributions ...

On the Haiku lists there's been some discussions about package management. And that naturally leads to the idea of a `distribution' - which the Haiku guys are fundamentally against, and I think with good reason.

What exactly is a `distribution'? It's a collection of software which has been thrown together and presumably validated to some extent in such a way as to work together. In a way it is a kind of `neat' thing, and one that can only exist because all the software is free software; every single piece of third-party software can (potentially) be compiled and validated specifically for the target platform. This is a pretty fundamental change to the way software is distributed, and can lead to fundamentally different outcomes, for example with security, or platform support. Even a lowly IRC client is under complete vendor control, so it can be prevented from becoming a penetration attack vector because of some sloppy code or accidents. Or a vendor can rebuild everything for a different hardware platform without having to wait for every ISV to support it. And upgrading can be a bit of fun if your computer is a playground - discovering all the new features of every application (until you're a jaded old prick like me, where half the changes are frustratingly stupid).

So, there are many benefits from `distribution' based deployment.

However, there are problems as well. We all used to scowl at how M$ forces everyone onto the upgrade treadmill, and although different in nature, it is quite a problem with `distribution' based software too. In some ways it is worse - you actually have to upgrade all of your software at once, and this is not always a good thing. New versions mean new bugs, and since the most active developers tend to be rewarded by having their projects anointed for inclusion, stable mature applications are often replaced with shinier, newer, unstable ones. Some previously existing software may not make the cut at all, so you can't even keep using it if you wanted to. And although it's more of a serious problem for proprietary software (and thus not particularly important), the upgrade mill affects free software too - it can be extremely tough going compiling older software to a usable state, and it will probably drag in a whole pile of tools and libraries not used elsewhere. In effect, old software bit-rots faster than it needs to, and is effectively locked out and thrown away. This is particularly acute with the modern scourge of massive and complex frameworks which change in incompatible ways from version to version in time-frames of 5 years or less.

The strength of a Unix-like `open platform' is also its major weakness. For example, by having the option to use different toolkits in X, every dog and his man will choose a different toolkit for their apps. And this just multiplies all of the potential issues that large-scale code faces: project goals, consistent design philosophy, bugs and maintenance. Not only do you have these problems within a given software `platform', they are multiplied across all of them.

Just look at the completely fucked-up-mess that sound is. Just imagine the state we'd be in if the kernel didn't implement TCP/IP but we had 4 or 5 different, incompatible, and competing implementations running in user-land. Sound is just as fundamentally a hardware issue as networking is, so why isn't it all just in the kernel the same way? And I'm talking about the full audio stack here, not just a buffer to a DAC, which is all Linux will allow, throwing the guts of modern sound off to user-land. After all, we don't have user-land switching of the ethernet packets, do we?

So I think what the Haiku devs are really trying to avoid is such a mess infiltrating their system. Without this crap to deal with there just isn't any need for a `distribution' in the first place. And even the fact of having a single GUI toolkit with tighter focus will take a lot less effort to maintain than a dozen (regardless of the matter of spreading the load), and more importantly provide the user with a more consistent and pleasant experience. And developers too for that matter; having a choice just means the wrong one can be taken.

Hmm, I like the sound of that, and can't find any reference in a web search, so with a bit of outright ego-tism, I shall claim it to be my own.

Zucchi's Law

“Having a choice always leads to
the wrong option being taken
at least some of the time.”

Somewhat analogous to Murphy's Law I guess, but the key difference is that this is about the limitations of informed choice and the fallibility of humans, not of their environment.

There's nothing technical stopping GNU/Linux doing the same thing, but there's too much `wet ware' in the way, so the politics make it completely impossible. It will take another operating system to do it. That's all Android or ChromeOS are; a single toolkit and application environment with less choice for the developer, but still built on the same base operating system.

Thursday, 11 February 2010

Eye-Too-See

Slightly wet day today ... what better excuse is needed for some hacking. Like I needed one.

I started looking at the I²C controller today. Originally the idea was to try and get some sort of sound out of the board (it's one of the parts needed), but then I thought that was a bit too ambitious. So I settled for trying to blink the LED connected to the power controller/audio chip.

After a bit of a false start it just worked. Damn, there's a surprise; I'm not used to things `just working'. The manual describes programming the I²C controller a few times, and only the one with the detailed flow-chart of a sort of state-machine actually works. It took a little while to work out how to read data back as a master, but it was reasonably straightforward given the description of the protocol in the TPS65950 manual.
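For reference, reading a register from a device like this is normally a combined I²C transaction: START, slave address with the write bit, the register number, a repeated START (no STOP), slave address with the read bit, one byte which the master answers with NACK, then STOP. The sketch below only records that sequence and replays it against a fake register file so it can run anywhere; `i2c_build_reg_read`, `i2c_mock_execute` and the mock are hypothetical stand-ins, not the OMAP I²C controller's registers or the i2c-leds code.

```c
#include <stddef.h>
#include <stdint.h>

/* One step of an I2C transaction, as seen from the master. */
enum i2c_op { I2C_START, I2C_WRITE, I2C_READ_NACK, I2C_STOP };

struct i2c_step {
    enum i2c_op op;
    uint8_t byte;              /* only used by I2C_WRITE */
};

/* Build the combined transaction that reads one register from the slave
 * at 7-bit address `addr7`. Returns the number of steps written. */
size_t i2c_build_reg_read(uint8_t addr7, uint8_t reg, struct i2c_step *out)
{
    size_t n = 0;
    out[n++] = (struct i2c_step){ I2C_START, 0 };
    out[n++] = (struct i2c_step){ I2C_WRITE, (uint8_t)(addr7 << 1) };       /* R/W bit 0: write */
    out[n++] = (struct i2c_step){ I2C_WRITE, reg };                          /* register number */
    out[n++] = (struct i2c_step){ I2C_START, 0 };                            /* repeated START */
    out[n++] = (struct i2c_step){ I2C_WRITE, (uint8_t)((addr7 << 1) | 1) }; /* R/W bit 1: read */
    out[n++] = (struct i2c_step){ I2C_READ_NACK, 0 };                        /* NACK ends the read */
    out[n++] = (struct i2c_step){ I2C_STOP, 0 };
    return n;
}

/* Replay a transaction against a fake slave's register file. Returns the
 * byte read, or -1 if the slave wouldn't respond or the sequence is invalid. */
int i2c_mock_execute(const struct i2c_step *steps, size_t n,
                     uint8_t addr7, const uint8_t regs[256])
{
    int expect_address = 0, reading = 0, have_reg = 0, result = -1;
    uint8_t reg = 0;

    for (size_t i = 0; i < n; i++) {
        switch (steps[i].op) {
        case I2C_START:
            expect_address = 1;
            break;
        case I2C_WRITE:
            if (expect_address) {
                if ((steps[i].byte >> 1) != addr7)
                    return -1;              /* address not acknowledged */
                reading = steps[i].byte & 1;
                expect_address = 0;
            } else if (!reading) {
                reg = steps[i].byte;        /* data byte sets the register pointer */
                have_reg = 1;
            } else {
                return -1;                  /* master can't write during a read */
            }
            break;
        case I2C_READ_NACK:
            if (!reading || !have_reg)
                return -1;
            result = regs[reg];
            break;
        case I2C_STOP:
            break;
        }
    }
    return result;
}
```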

Since the LED is driven through a PWM, the code does more than just turn it on and off: I made it `pulse' by smoothly changing the duty cycle along an approximate sine curve. Code is in i2c-leds.
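One cheap way to get an approximate sine without floating point is the parabola 4t(1-t), which tracks sin(πt) reasonably closely over t in [0, 1]. The sketch below uses that to compute the duty value for each step of a pulse; it is not necessarily the approximation i2c-leds uses, and the step count and maximum duty are arbitrary parameters. Writing the values to the PWM over I²C is left out.

```c
#include <stdint.h>

/* Duty cycle for step `i` of an `n`-step pulse, using the parabola
 * 4t(1-t) as an integer approximation of sin(pi*t), with t = i/n.
 * Rises from 0 to `max_duty` at the midpoint and falls back to 0.
 * Assumes n > 0; `i` wraps around so the pulse repeats. */
uint32_t pulse_duty(uint32_t i, uint32_t n, uint32_t max_duty)
{
    uint64_t step = i % n;
    uint64_t num = 4u * (uint64_t)max_duty * step * (n - step);
    return (uint32_t)(num / ((uint64_t)n * n));
}
```

Stepping `i` from a timer loop and writing `pulse_duty(i, 64, max)` each tick gives a smooth breathing effect; at a quarter of the way through, the parabola gives about 0.75 of full brightness where a true sine would give about 0.71.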

Wednesday, 10 February 2010

Vectors and Bits again

Well I fixed the `c long' version of the rect-fill from the update mentioned a couple of posts ago ... and a bit more besides.

After sleeping in a bit I worked on some MMU code so I can start using the CPU cache. Most of that was just gaining a deeper understanding of the permission and memory type bits, which are a little confusing in places. It looks like it's been extended a couple of times whilst keeping compatibility, so there are multiple combinations that appear to do the same thing but with different nomenclature. Hmm, I have it more or less worked out ... I think. Once I got the MMU code working, it allowed me to enable caches and play a bit with various options. I used only section and super-section pages - 1MB or 16MB - so I'm probably only using a couple of TLB entries to run everything (= no page table walks).

I was assuming the caches were on when I enabled the MMU ... oh, but they weren't, of course ... stupid me. Wow, does that make a difference ... Wow.

Ok, pause to run a few more timings. ... Here goes.

Code                   Total    Slowest Fastest
C short             36097442    0.89    5.22
C long              40526536    1.00    5.86
ARM asm             15801430    0.38    2.28
NEON                 9654736    0.23    1.39
NEON2                9982542    0.24    1.44
NEON3                9421366    0.23    1.36
NEON4                9467262    0.23    1.37
sDMA                 6904794    0.17    1.00

(see 2 posts ago, or render-rect.c for what they mean)
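As far as I can tell from the numbers themselves, the two ratio columns are each row's total divided by the slowest total and by the fastest total, truncated (not rounded) to two decimal places; e.g. ARM asm gives 15801430 / 6904794 = 2.288..., shown as 2.28. A tiny helper that reproduces them, as a sketch of my reading of the tables rather than the code that generated them (`ratio_hundredths` is a made-up name):

```c
#include <stdint.h>

/* `total` relative to `reference` in hundredths, truncated rather than
 * rounded (38 means 0.38). Use the largest total as the reference for
 * the "Slowest" column and the smallest for the "Fastest" column. */
uint32_t ratio_hundredths(uint64_t total, uint64_t reference)
{
    return (uint32_t)((total * 100u) / reference);
}
```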

This is the original scenario from a previous post, but with a 'fixed' C long version. Strangely, it runs slower than the short version. A cursory look at the assembly suggests it's doing the right thing, but it's not worth looking deeper. My guess is the extra logic required for the unaligned edges is throwing it out, or the pointer aliasing is making the compiler angry. Oddly, the performance monitor is registering the same number of data writes too.

Anyway, who cares. Let's turn the MMU on, set the memory regions up properly and see what happens. Even with the caches off things happen, although not much.

With MMU on, graphics = wt                    With MMU on, graphics = wb

Code           Total    Slowest Fastest       Code           Total    Slowest Fastest
C short     36058684    0.89    7.33       C short     36233408    0.89    5.23
C long      40496404    1.00    8.23       C long      40584664    1.00    5.86
ARM asm      9367578    0.23    1.90       ARM asm     15811204    0.38    2.28
NEON         5332580    0.13    1.08       NEON         9653676    0.23    1.39
NEON2        4917308    0.12    1.00       NEON2       10057086    0.24    1.45
NEON3        5598968    0.13    1.13       NEON3        9555816    0.23    1.38
NEON4        5685246    0.14    1.15       NEON4        9431842    0.23    1.36
sDMA         6908602    0.17    1.40       sDMA         6917612    0.17    1.00

We're starting to beat the system DMA - I presume that even with the cache off this enables some sort of write-combining/write-buffering. It's interesting that the NEON2 code speeds up the most (nearly 2x) - probably given it has the smallest loop the CPU isn't in contention for memory bandwidth as much. You'd never use a write-back cache for video memory, but I timed it anyway. I really have no idea how or why using it is making any difference whatsoever though, since the global cache bits are all off!

OK, so ... der, let's turn on the caches properly.

The way I set the MMU up is to have the first bank of memory - where all code and data resides - as write-back write-allocate (writes also read a cache-line), and the second - where the frame-buffer resides - as write-through no-write-allocate. For the `graphics = wb' case, I also set write-back write-allocate on the second bank of memory (in a separate run). All the IO devices are using shared-device mode.
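For readers who haven't fought with these bits, this is roughly what that setup looks like as first-level descriptors. It's a sketch, not the Puppy Bits code: it assumes the ARMv7 short-descriptor format with TEX remap disabled (SCTLR.TRE = 0), full read/write access in domain 0, and leaves XN, nG and the shareable bit clear. The helper names are hypothetical, and the addresses in any usage are placeholders.

```c
#include <stdint.h>

/* ARMv7 short-descriptor first-level entry fields. */
#define SECT_TYPE       (2u << 0)   /* bits[1:0] = 0b10: section */
#define SECT_B          (1u << 2)
#define SECT_C          (1u << 3)
#define SECT_AP_FULL    (3u << 10)  /* AP[1:0] = 0b11, AP[2] = 0: read/write */
#define SECT_TEX(x)     ((uint32_t)(x) << 12)
#define SECT_SUPER      (1u << 18)  /* 16MB supersection instead of 1MB section */

/* Memory types as (TEX, C, B) with TEX remap disabled. */
#define MEM_SHARED_DEVICE  (SECT_TEX(0) | SECT_B)
#define MEM_WT_NO_WA       (SECT_TEX(0) | SECT_C)           /* write-through, no write-allocate */
#define MEM_WB_WA          (SECT_TEX(1) | SECT_C | SECT_B)  /* write-back, write-allocate */

/* 1MB section entry for the megabyte containing `phys`. */
uint32_t section_entry(uint32_t phys, uint32_t mem_type)
{
    return (phys & 0xfff00000u) | mem_type | SECT_AP_FULL | SECT_TYPE;
}

/* 16MB supersection entry; the same value must be written to 16
 * consecutive first-level table slots. */
uint32_t supersection_entry(uint32_t phys, uint32_t mem_type)
{
    return (phys & 0xff000000u) | SECT_SUPER | mem_type | SECT_AP_FULL | SECT_TYPE;
}

/* Identity-map `count` megabytes starting at `phys` into a 4096-entry table. */
void map_sections(uint32_t table[4096], uint32_t phys, uint32_t count, uint32_t mem_type)
{
    for (uint32_t i = 0; i < count; i++) {
        uint32_t addr = phys + (i << 20);
        table[addr >> 20] = section_entry(addr, mem_type);
    }
}
```

Code and data would get `MEM_WB_WA`, the frame-buffer bank `MEM_WT_NO_WA`, and the peripheral regions `MEM_SHARED_DEVICE`, matching the split described above.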

First, with unrolled loops.

MMU on, graphics = wt, -O3 -funroll-loops     MMU on, graphics = wb, -O3 -funroll-loops
             -- lots of artifacts
Code           Total    Slowest Fastest       Code           Total    Slowest Fastest
C short       957743    0.14    1.02       C short      1816546    0.28    1.00
C long        956818    0.14    1.02       C long       1992627    0.30    1.09
ARM asm       933198    0.14    1.00       ARM asm      1871829    0.28    1.03
NEON          930448    0.14    1.00       NEON         1857085    0.28    1.02
NEON2         945969    0.14    1.01       NEON2        1862711    0.28    1.02
NEON3         946522    0.14    1.01       NEON3        1848473    0.28    1.01
NEON4         945739    0.14    1.01       NEON4        1861538    0.28    1.02
sDMA         6456313    1.00    6.93       sDMA         6455228    1.00    3.55

Ahh, now this is more like it. Getting over 800MB/s (if my timing calculations are right).

Even the basic crappy C code is within a whisker of everything else - even though it executes about 3.5x as many instructions to get the same work done. The system DMA has fallen right off, but run asynchronously it would probably still be worth using since it is basically `free', and the CPU can do a lot more than just write memory. This code also polls the DMA status in a tight loop; I don't know if that is having any effect on bandwidth.

The write-back timing is all out of whack - the C short version is the first to run, so it gets a benefit of having an empty cache and nothing to write-back. You also get to see the CPU write stuff back to the screen when it feels the need - lots of weird visual artifacts. And the explicit cache flushing required would only make it slower on top of that. In short - useless for a framebuffer. Any performance issues you might expect a write-back cache to address are handled much better by using proper algorithms. I saw it mentioned on the beagleboard list, so it seemed worthy of comment ...

And lastly, just with -O3, a typical compile flag (-funroll-loops generates much bigger code so might not always be desirable). I also added in a `hyper-optimised' memset implementation for good measure.

MMU on, graphics = wt, -O3

Code                   Total    Slowest Fastest
C short              1372096    0.21    1.47
C long               1038868    0.16    1.11
ARM asm               948600    0.14    1.02
NEON                  929968    0.14    1.00
NEON2                 939165    0.14    1.00
NEON3                 946102    0.14    1.01
NEON4                 945702    0.14    1.01
msNEON               1309313    0.20    1.40   (see memset_armneon())
sDMA                 6462071    1.00    6.94

The C is still ok, if a bit slower, but barely worth `optimising' in this trivial case.

The msNEON code is from the link indicated ... interesting that a more complex C loop beats it somewhat; the msNEON code is only writing the same amount of memory linearly not as a rectangle, and with severe alignment restrictions.

The NEON2 code has such a simple inner loop, yet is the most consistently top performer. Good to see that KISS sometimes still works.

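 // r5 = destination (64-bit aligned), r6 = number of 32-byte chunks,
 // d0-d3 = the fill pattern
 // write out 32-byte chunks
2: subs r6,#1                                  // count down; sets the flags for bgt
 vst1.64 { d0, d1, d2, d3 }, [r5, :64]!     // store 32 bytes, post-increment r5; ARM syntax is `r5 @ 64'
 bgt 2b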

The ARM code is quite a mess by comparison:

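 // r2:r3 = fill pattern (strd stores the register pair), r5 = destination,
 // r6 = number of 32-byte chunks
 // write out 32-byte chunks
2: strd r2,[r5]
 strd r2,[r5, #8]
 strd r2,[r5, #16]
 subs r6,#1          // decrement scheduled between the stores
 strd r2,[r5, #24]
 add r5,r5,#32       // advance past the 32 bytes just written
 bgt 2b              // add doesn't set flags, so this tests the subs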

(FWIW I tried a similar trivial loop in ARM, a direct translation of the `C long' code, and that wasn't terribly fast).

Anyway, I think I've done memory fill/rect fill to bloody death (and beyond!) now. It's just not a terribly interesting problem - particularly for a SIMD unit - apart from evaluating raw memory performance. Actually it is kind of handy for that, since it will easily show if things aren't configured properly.

PS Code changes not committed yet.

Time is an illusion ...

Damn, it's 4am again. Knew I should've gone out for a ride yesterday, just haven't felt really sleepy - but it's starting to bite now.

I was watching TV (well, I had it on; it was a pretty boring - and extremely long - silent movie from Taiwan) and catching up on the news ... and then I got bored with that ... and poked around 'my.ti.com' for a little while and came across some beagleboard TV out stuff. And being a glutton for punishment, that was something to look at at midnight ...

When playing with the Haiku boot process I had installed an older u-boot which initialises the video, so I guessed that should at least give a good signal. So I dragged it all around to the TV again, plugged it in and booted it up. Blah, still crap. What's going on? So as a last resort I tried another cable - I'd been using one of those expensive ones and just didn't expect any problems. Found a brand new cheapie from a video card or something ... damn, it worked!

Well, after much mucking about and a few mistakes, I added an API for TV out and for handling viewports on larger data (to clone the LCD display), and, well, enough crapping on:

Don't mind the grey screen on the venerable old 1084 ... I don't have the right cable to hook up separated-lca to s-video, so there's no colour signal.

Hmm, now the cat's whining, wonder what he wants.

Monday, 8 February 2010

vectors and bits

Updated, see the end of the post

Yesterday I started poking around with the SIMD unit. Wow, is that a way to eat up time or what.

Wasn't quite sure what to do with it, so played at first with writing an RGB888 to RGB565 converter. Didn't get to testing it, but it brought back memories of the SPU hacking I did before - the instruction set has a lot of similarities, although NEON is more filled out. And like with the SPU, there are so many ways to do the same thing that it can be a bit overwhelming trying to find a good way of solving a problem, particularly if you don't really know which instructions are there, or what they do. There seem to be some interesting ones though, like vsri, which lets you insert the upper bits of each element into the lower bits of each element in another register (without clobbering the rest of its contents). I still seem to be wedded to the vtbl instruction as I was with the shufb instruction on SPU, although I think it's not always the best route. I really missed the spu_timing tool though - although the issue rules and latencies are simpler.
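vsri happens to be a natural fit for RGB888 to RGB565. Widen each 8-bit channel into the top byte of a 16-bit lane (what VSHLL #8 does), then shift-and-insert green and blue below red. Here is a scalar C model of one 16-bit lane so the bit juggling can be checked on any machine; `vsri16` is my emulation of the instruction's per-lane behaviour, not a NEON intrinsic, and a real version would work on 8 pixels per register.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of VSRI.16 #n (shift right and insert):
 * `src` is shifted right by n (1..16) and written into `dst`, keeping the
 * top n bits of `dst` intact. */
uint16_t vsri16(uint16_t dst, uint16_t src, unsigned n)
{
    uint16_t keep = (uint16_t)~(0xffffu >> n);   /* top n bits of dst survive */
    return (uint16_t)((dst & keep) | (src >> n));
}

/* RGB888 -> RGB565 the way a NEON loop might do it: each channel starts
 * in the top byte of a 16-bit lane, then green and blue are inserted. */
uint16_t rgb565_vsri(uint8_t r, uint8_t g, uint8_t b)
{
    uint16_t out = (uint16_t)(r << 8);            /* red in bits 15..8 */
    out = vsri16(out, (uint16_t)(g << 8), 5);     /* keep bits 15..11, green into 10..3 */
    out = vsri16(out, (uint16_t)(b << 8), 11);    /* keep bits 15..5, blue into 4..0 */
    return out;
}

/* The straightforward mask-and-shift version, for comparison. */
uint16_t rgb565_plain(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xf8) << 8) | ((g & 0xfc) << 3) | (b >> 3));
}
```

The insert does the masking for free: each vsri overwrites exactly the low bits the previous step left as junk, so no separate AND is needed.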

That idea didn't seem to be going anywhere in particular, so I thought I'd look at some specific stuff I need, and for which I have very slow implementations - font rendering and rect fill - although I only got around to looking at rect fill, and that still doesn't work 100%. I just did it using ARM code though. For such an old architecture I was a little surprised at the lack of info available for such tasks - at least as far as searching with Google goes. Maybe it's too old, and the new stuff is hidden away in proprietary and embedded systems, and nobody does software rendering anymore.

And then I totally lost track of the time reading about the DSP ... at 4am I thought it was time to `call it a night' - that's what I get for having coffee and chips for dinner (and in short: there are no free tools to use it, and the Linux driver uses binary blobs - of course).

Today I filled out the rect fill code a bit and tried various implementations, including some NEON variants. Oh, I also `discovered' the performance counting unit - wow, you can track a lot of stuff, from branches taken to cache and memory stats to stalls. Very nice.
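The cycle counter is the easiest part of the performance monitor to get going. A minimal sketch, assuming a 32-bit ARMv7 core such as the Cortex-A8 running privileged bare-metal code; the CP15 encodings are the architectural ones, but the function names are mine, this is not the Puppy Bits code, and the data-write and stall counts in the dumps below also need event counters with implementation-defined event numbers, which I've left out. The inline assembly only builds for 32-bit ARM.

```c
#include <stdint.h>

/* ARMv7 PMCR control bits and the cycle-counter bit in PMCNTENSET. */
#define PMCR_E          (1u << 0)   /* enable all counters */
#define PMCR_P          (1u << 1)   /* reset event counters */
#define PMCR_C          (1u << 2)   /* reset cycle counter */
#define PMCNTEN_CYCLES  (1u << 31)  /* enable the cycle counter */

#if defined(__arm__)
/* Needs a privileged mode (or PMUSERENR set), fine for bare-metal code. */
static inline void pmu_start(void)
{
    uint32_t pmcr = PMCR_E | PMCR_P | PMCR_C;
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmcr));            /* PMCR */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(PMCNTEN_CYCLES));  /* PMCNTENSET */
}

static inline uint32_t pmu_cycles(void)
{
    uint32_t cycles;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));          /* PMCCNTR */
    return cycles;
}
#endif
```

Usage: call `pmu_start()` once, then subtract two `pmu_cycles()` readings taken around the code being measured; the 32-bit counter wraps after 2^32 cycles, so keep the measured region short.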

Oh, NEON. Fucking hell. I spent about 4 hours tracking down why the NEON instructions just threw an undefined instruction exception. After a couple of hours of digging I came across a reference to the Coprocessor Access Control Register, but that didn't really help (oh, and a thread on the beagleboard group where people just say to set CONFIG_NEON to y ... sigh). So there I was trying to turn on clocks and power and other PRCM registers ... and then I remembered something about a bit in a status register to enable/disable the whole shebang. A bit more tracking down (I've got nearly 10K pages of documentation to search now) and I discovered the FPEXC register and the VMSR/VMRS instructions (my memory was wrong, but it was a lucky guess). Although the binutils I'm using doesn't support them ... sigh. Finally I found a workaround using MRC/MCR from Linux - about the only thing I've managed to find in there when tracking things down (a lot of stuff is so abstracted it's very hard to follow). Gee, that was frustrating.
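For anyone else who hits the same undefined-instruction wall, the sequence is roughly: grant full access to coprocessors 10 and 11 in CPACR, synchronise, then set the EN bit in FPEXC. A sketch in C with inline assembly, assuming a 32-bit ARMv7 core in a privileged mode; it uses raw MRC/MCR encodings (FPEXC is `p10, 7, c8, c0, 0`), so it should assemble even with binutils that lack VMSR/VMRS. The function names are mine, and the pure-C helper exists only so the bit values can be checked on any host.

```c
#include <stdint.h>

#define CPACR_CP10_CP11_FULL  (0xfu << 20)  /* cp10 and cp11: full access */
#define FPEXC_EN              (1u << 30)    /* enable VFP/NEON */

/* New CPACR value granting full access to the VFP/NEON coprocessors. */
uint32_t cpacr_enable_vfp(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

#if defined(__arm__)
/* Must run in a privileged mode. */
static inline void neon_enable(void)
{
    uint32_t cpacr;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(cpacr));      /* read CPACR */
    cpacr = cpacr_enable_vfp(cpacr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 2" :: "r"(cpacr));      /* write CPACR */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 4" :: "r"(0u));         /* CP15 ISB */
    __asm__ volatile("mcr p10, 7, %0, c8, c0, 0" :: "r"(FPEXC_EN));   /* FPEXC = EN */
}
#endif
```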

Anyway, so I came up with some total cycle counts for various implementations of a 'rectangular block colour fill for RGB565'.

These are all with *NO CACHE* or write buffers, so they don't really mean anything other than relative to each other. You have to turn the MMU on to turn on data caches and write buffers, perhaps that is the next thing to try.

Code      Total cycles  vs slowest  vs fastest  Notes
C short       36308222        1.00        5.25
C long        18307488        0.50        2.64
ARM asm       15877960        0.43        2.29  4x strd (writes 8 bytes/instruction)
NEON           9735680        0.26        1.40  2x writes of 2xD regs, 64-bit aligned
NEON2          9134690        0.25        1.32  1x write of 4xD regs, 64-bit aligned
NEON3          9311284        0.25        1.34  2x writes of 4xD regs, 128-bit aligned
NEON4          9191652        0.25        1.33  vstm of 8xD regs
sDMA           6910682        0.19        1.00

The NEON implementations use ARM code for the non-aligned 'edges', and none of them are particularly fantastic code.
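The 'C short' and 'C long' rows differ in what a 32-bit fill has to do with the edges. Here is a minimal C sketch of that, assuming a 16 bpp framebuffer whose row pitch is given in pixels. It writes one half-word pixel if the row start isn't 4-byte aligned, fills the middle two pixels per 32-bit store, then writes any trailing half-word pixel. The names are hypothetical, and this is not the Puppy Bits code. `may_alias` is a GCC/Clang extension, used so the 32-bit stores into a 16-bit buffer stay well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit type that may alias the 16-bit framebuffer (GCC/Clang extension). */
typedef uint32_t __attribute__((may_alias)) u32_alias;

/* Pack 8-bit R, G, B into an RGB565 pixel. */
static uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Fill a w x h rectangle at (x, y); `pitch` is the row length in pixels.
   fb must be at least 2-byte aligned. */
static void fill_rect_565(uint16_t *fb, size_t pitch, size_t x, size_t y,
                          size_t w, size_t h, uint16_t colour)
{
    /* The same pixel in both halves, so byte order doesn't matter. */
    const uint32_t pair = ((uint32_t)colour << 16) | colour;

    assert(((uintptr_t)fb & 1) == 0);
    for (size_t row = 0; row < h; row++) {
        uint16_t *p = fb + (y + row) * pitch + x;
        size_t n = w;

        if (n > 0 && ((uintptr_t)p & 2)) {   /* leading edge: one half-word */
            *p++ = colour;
            n--;
        }
        u32_alias *q = (u32_alias *)p;       /* p is now 4-byte aligned */
        for (; n >= 2; n -= 2)
            *q++ = pair;                     /* middle: two pixels per store */
        if (n)                               /* trailing edge: one half-word */
            *(uint16_t *)q = colour;
    }
}
```

The edges cost at most one pixel per row each, so the middle loop is the part worth unrolling or handing to NEON or sDMA.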

Hrm, I thought the ARM asm one was ok when I was running it by itself; I guess being twice as fast as something is quite noticeable, but it's obviously still kind of slow.

Looking in more detail at a couple of them:

drawRect() C long
 total cycles=18307488
 dwrite intns=169668
 ext writes  =169671
 iexec       =701230
 istall      =1201453

drawRect() ARM asm
 total cycles=15877960
 dwrite intns=168963
 ext writes  =168965
 iexec       =310508
 istall      =182922

The C version executes about 2x as many instructions (701230 vs 310508) but the execution time isn't much different - everything is waiting on memory (although I wonder if it uses less power). At first I thought the total cycle count was a mistake, but of course it works out to roughly 100 cycles per external write (18307488 / 169671 is about 108 for the C version, 15877960 / 168965 about 94 for the asm), so memory accesses must be around 100 cycles -- which sounds about right. It'll be interesting to see if any cache/write buffers make a noticeable difference here, although it is just a flood of writes.

Update: Should've tested more. The long version was still just a 'short' version that only wrote half the width ... so it's all bogus. Will revisit in a newer post.

The code in question is all in Puppy Bits.

Sunday, 7 February 2010

Puppy Bits is born

Although it's still a bit broken, I figured the code was finally good enough to upload, so I've created a google-code project called Puppy Bits.

No 'demo' yet, but most of my library-so-far.

I had some 'lunch' too; damn, 6pm, another day just vanished.

Video Graphics

Well, I've still given up on the TV out and the video encoder, but I did have a bit of success with the rest of the video system. Seems that writing the video code using the register names did pay off after all.

So instead of enjoying another fantastic, if a little warm, day outside, I've been hacking away (seriously, it must be some sort of addiction) at some sort of video/graphics interface. And all I've had to eat today so far is beer ...

I added code to set various video modes - all the basic ones up to 1280x1024. Since I'm using a fixed clock with an integer divider, most of the pixel clocks are wrong, but they work with my monitor as they are 'close enough'. I also separated the graphics part from the video part, so I can use the hardware more fully, as below.
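To put a number on 'close enough', here is a minimal sketch that assumes a single fixed source clock divided by an integer. Rounding the ratio to the nearest integer isn't always the best divider, because the error isn't symmetric, so the sketch tries both neighbouring dividers. The function names are hypothetical. The 216 MHz source clock used below is also hypothetical. 108, 65 and 40 MHz are the standard VESA pixel clocks for 1280x1024, 1024x768 and 800x600 at 60 Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the integer divider d that makes fclk_hz / d closest to target_hz. */
static uint32_t pick_divider(uint32_t fclk_hz, uint32_t target_hz)
{
    assert(target_hz > 0);
    uint32_t lo = fclk_hz / target_hz;          /* fclk/lo >= target */
    if (lo == 0)
        return 1;                               /* can't go faster than fclk */
    uint32_t hi = lo + 1;                       /* fclk/hi < target */
    uint32_t err_lo = fclk_hz / lo - target_hz;
    uint32_t err_hi = target_hz - fclk_hz / hi;
    return err_lo <= err_hi ? lo : hi;
}

/* Resulting pixel clock error in tenths of a percent (positive = too fast). */
static int32_t pixel_clock_error_permille(uint32_t fclk_hz, uint32_t target_hz)
{
    uint32_t actual = fclk_hz / pick_divider(fclk_hz, target_hz);
    return (int32_t)(((int64_t)actual - (int64_t)target_hz) * 1000 / (int64_t)target_hz);
}
```

With that hypothetical 216 MHz clock, 1280x1024 comes out exact (divider 2). 1024x768 gets divider 3, which gives 72 MHz, about 10.8% fast.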

Anyway, obligatory screen-shot, then some explanation.

First, the video mode is set to 1280x1024, with a light-blue background colour. That's all it will display, until I add a graphical channel.

The dark-blue graphical channel is using channel 0 - the 'graphics' channel, in RGB16 format at 1024x768 resolution, centered on the main video window.

Then the noisy rectangle is using channel 2, again in RGB16 format, although it could also be in UYVY or YUV2 format, i.e. it is a 'video overlay'.

I'll upload it somewhere soon - maybe this week.

Debugging

To help with debugging I've come up with a couple of ideas too. First, when I get a fatal exception I now jump to a little 'crash monitor' that lets me examine memory. That's all it does for now, but it can always be extended. Even that has proven quite handy, e.g. to examine (more of) the stack.

Exception: Data Abort
 pc: 0x80009f00 sr: 0x200001d3
 r0: 0x00000020
 r1: 0x00000040
 r2: 0x48050400
 r3: 0x00000040
 r4: 0x48050400
 r5: 0x480504a0
 r6: 0x80e3fd84
 r7: 0x00000002
 r8: 0x48050400
 r9: 0x00000054
r10: 0x00000002
r11: 0x80e3fdbc
r12: 0x00000066
r13: 0x80e3fd70 0x00000008 0x00000000 0x80e3fe14 0x80009ff4 0x80e3fe14 0x80009b08 0x72747300 0x00797063
r14: 0x80e3fd7c
r15: 0x80009f00
Entering crappy crash monitor.
 ? for help.
#> ?
?               help
m addr len      Show memory as words from addr for len words
#> m 0x80e3fd70 22

0x80e3fd70: 0x00000008 0x00000000 0x80e3fe14 0x80009ff4 0x80e3fe14 0x80009b08 0x72747300 0x00797063
0x80e3fd90: 0x74757064 0x6d6d0063 0x6c665f75 0x5f687375 0x00424c54 0x646e6573 0x7274735f 0x00676e69
0x80e3fdb0: 0x00000001 0x80200000 0x80200000 0x00000280 0x00000200 0x00000000
#>
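The 'm' command is simple enough to sketch. Here is a minimal version of the formatting, assuming the same eight-words-per-line layout shown above. To keep it testable off the board, it formats into a buffer from a word array and a separately supplied base address. On the target, the words would just be read from `(const uint32_t *)addr`. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format `count` words the way "m addr len" prints them: eight per line,
   each line prefixed by the address of its first word. The output is
   truncated (but always NUL-terminated) if `out` is too small. */
static void dump_words(char *out, size_t outlen, uint32_t addr,
                       const uint32_t *words, size_t count)
{
    size_t pos = 0;
    int n;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (i % 8 == 0) {             /* start a new line with its address */
            n = snprintf(out + pos, outlen - pos, "%s0x%08lx:", i ? "\n" : "",
                         (unsigned long)(uint32_t)(addr + 4 * i));
            if (n < 0 || (size_t)n >= outlen - pos)
                return;               /* truncated */
            pos += (size_t)n;
        }
        n = snprintf(out + pos, outlen - pos, " 0x%08lx", (unsigned long)words[i]);
        if (n < 0 || (size_t)n >= outlen - pos)
            return;                   /* truncated */
        pos += (size_t)n;
    }
}
```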

The second is a sort of crash analyser, that turns addresses into functions. Basically, it takes the output of `objdump -d', and a list of addresses, and then turns them into the function the address resides in, and optionally the assembly language of the function. I've just been using `objdump | less' to do the same thing manually for individual addresses, but once you get more than a couple it gets tedious.

#> m 0x80e3fa00 64
0x80e3fa00: 0x80e3fab8 0x00000000 0x00000000 0x8000c394 0x00000010 0x80022940 0x00000000 0x00000000
0x80e3fa20: 0x00000000 0x8000b89c 0x80e3fab8 0x00000200 0x00000000 0x8000ba68 0x80e3fab8 0x00000200
0x80e3fa40: 0xffffffff 0x00000000 0x00000000 0x8000d0ec 0x80e3fab8 0x00000200 0x00000000 0x80022d68
0x80e3fa60: 0x00000000 0x8000b89c 0x80e3fab8 0x00000200 0x00000000 0x8000ba68 0x80e3fab8 0x00000200
0x80e3fa80: 0x00000010 0x00000000 0x00000000 0x80013294 0x80e3fab8 0x00000200 0x00000003 0x00000000
0x80e3faa0: 0x80e3fcd8 0x81204148 0x00000001 0x80013620 0x80e3fab8 0x00000001 0x00000013 0x000001e1
0x80e3fac0: 0x00000002 0x00000200 0xffffffff 0x00000001 0x00e3fb02 0x80022752 0x8004730b 0x80e3fb00
0x80e3fae0: 0x00000032 0x80008a84 0x80e3fb00 0x80046258 0x00000000 0x00000001 0x812040a0 0x8000851c

Dump that to a file on my workstation, then process it:

$ cat > a
0x80e3fa00: 0x80e3fab8 0x00000000 0x00000000 0x8000c394 0x00000010 0x80022940 0x00000000 0x00000000
 ...
$  cat a | while read line ; do ./crashdump -3 haiku-dump.text $line ; done
8000c31c <_ZN10MemoryDisk6ReadAtEPvxS0_m>:
                ...
8000c388:       e0811004        add     r1, r1, r4
8000c38c:       e1a02006        mov     r2, r6
8000c390:       ebfffba6        bl      8000b230 <memcpy>
                ...
8000b86c <_ZN10Descriptor6ReadAtExPvm>:
                ...
8000b890:       e1a0000c        mov     r0, ip
8000b894:       e1a0e00f        mov     lr, pc
8000b898:       e594f010        ldr     pc, [r4, #16]
                ...
8000ba28 <read_pos>:
                ...
8000ba5c:       e1a02004        mov     r2, r4
8000ba60:       e1a03005        mov     r3, r5
8000ba64:       ebffff80        bl      8000b86c <_ZN10Descriptor6ReadAtExPvm>
                ...
8000d050 <_ZN4boot9Partition6ReadAtEPvxS1_m>:
                ...
8000d0e0:       e0922004        adds    r2, r2, r4
8000d0e4:       e0a33005        adc     r3, r3, r5
8000d0e8:       ebfffa4e        bl      8000ba28 <read_pos>
                ...
8000b86c <_ZN10Descriptor6ReadAtExPvm>:
                ...
8000b890:       e1a0000c        mov     r0, ip
8000b894:       e1a0e00f        mov     lr, pc
8000b898:       e594f010        ldr     pc, [r4, #16]
                ...
8000ba28 <read_pos>:
                ...
8000ba5c:       e1a02004        mov     r2, r4
8000ba60:       e1a03005        mov     r3, r5
8000ba64:       ebffff80        bl      8000b86c <_ZN10Descriptor6ReadAtExPvm>
                ...
80013218 <_ZN18PartitionMapParser19_ReadPartitionTableExP15partition_table>:
                ...
80013288:       e0962004        adds    r2, r6, r4
8001328c:       e0a73005        adc     r3, r7, r5
80013290:       ebffe1e4        bl      8000ba28 <read_pos>
                ...
800135c8 <_ZN18PartitionMapParser5ParseEPKhP12PartitionMap>:
                ...
80013614:       e3a02000        mov     r2, #0  ; 0x0
80013618:       e3a03000        mov     r3, #0  ; 0x0
8001361c:       ebfffefd        bl      80013218 <_ZN18PartitionMapParser19_ReadPartitionTableExP15partition_table>
                ...
80008a00 <serial_puts>:
                ...
80008a78:       ebffffda        bl      800089e8 <serial_putc>
80008a7c:       e1a00007        mov     r0, r7
80008a80:       ebffffd8        bl      800089e8 <serial_putc>
                ...
800084e0 <dprintf>:
                ...
80008510:       a1a01003        movge   r1, r3
80008514:       e1a0000d        mov     r0, sp
80008518:       eb000138        bl      80008a00 <serial_puts>
                ...

Which is a lot more meaningful than a list of addresses.
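The address-to-function step is essentially a sorted-table lookup. Here is a minimal sketch, assuming function headers in the `addr <name>:` form that `objdump -d` prints (as in the output above). It collects the headers, sorts them by address, then binary-searches for the last function that starts at or below each address. The addresses pulled off the stack are mostly return addresses, so the interesting call is usually the instruction just before them: 0x8000c394 points just past the `bl` at 8000c390. The names are hypothetical, and this is not the crashdump tool itself. Without symbol sizes, an address past the end of the last function is still attributed to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct sym {
    unsigned long addr;
    char name[128];
};

/* Parse a function header such as "8000c31c <_ZN10MemoryDisk6ReadAtEPvxS0_m>:".
   Instruction lines ("8000c388:  e0811004 ...") don't match. Returns 1 on success. */
static int parse_header(const char *line, struct sym *s)
{
    return sscanf(line, "%lx <%127[^>]>:", &s->addr, s->name) == 2;
}

static int cmp_sym(const void *a, const void *b)
{
    unsigned long x = ((const struct sym *)a)->addr;
    unsigned long y = ((const struct sym *)b)->addr;
    return (x > y) - (x < y);
}

static void sort_syms(struct sym *syms, size_t n)
{
    qsort(syms, n, sizeof *syms, cmp_sym);
}

/* Return the function containing addr (syms sorted by addr), or NULL. */
static const struct sym *lookup(const struct sym *syms, size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;
    assert(syms != NULL || n == 0);
    while (lo < hi) {                 /* find the first symbol starting above addr */
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo ? &syms[lo - 1] : NULL; /* the one before it contains addr */
}
```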

Update: both of these are in the Puppy Bits project.

Hmm, time to go in search of food I think.

Oh, turns out I was booting the wrong image with Haiku ... so that splash screen really was less than it seemed. However, once I changed to using the correct image, I get pretty much the same result - a pretty face but no brains.

Friday, 5 February 2010

Video killed the programming bloke ...

Well I was up till the wee hours working on some video code. Blah. Basically converting a register dump into code with #defined constants and other 'nice' stuff. Pity it doesn't do much more though.
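Converting a register dump into #defined constants usually comes down to a few helper macros. Here is a generic sketch: a volatile 32-bit register accessor plus bit-field set/get helpers. The macro names are hypothetical. The example size register (width-1 in bits 10:0, height-1 in bits 26:16) is a hypothetical layout chosen for illustration, not something quoted from the OMAP3 manual.

```c
#include <assert.h>
#include <stdint.h>

/* Access a memory-mapped 32-bit register at a fixed address. */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

/* Mask covering bits hi..lo inclusive (hi >= lo, both 0..31). */
#define FIELD_MASK(hi, lo) \
    ((uint32_t)(((1ull << ((hi) - (lo) + 1)) - 1) << (lo)))
/* Replace bits hi..lo of reg with val. */
#define FIELD_SET(reg, hi, lo, val) \
    (((reg) & ~FIELD_MASK(hi, lo)) | (((uint32_t)(val) << (lo)) & FIELD_MASK(hi, lo)))
/* Extract bits hi..lo of reg. */
#define FIELD_GET(reg, hi, lo) (((reg) & FIELD_MASK(hi, lo)) >> (lo))

/* Hypothetical size register: (width - 1) in bits 10:0, (height - 1) in 26:16. */
static uint32_t size_reg_value(uint32_t width, uint32_t height)
{
    assert(width >= 1 && width <= 2048 && height >= 1 && height <= 2048);
    uint32_t v = 0;
    v = FIELD_SET(v, 10, 0, width - 1);
    v = FIELD_SET(v, 26, 16, height - 1);
    return v;
}
```

On the target you would then write something like `REG32(base + offset) = size_reg_value(1280, 1024);`, taking the real base address, offset and field positions from the TRM.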

Then I spent pretty much all day today failing to get S-Video output working. I just can't get it to work. I get some sort of signal out, and it looks like it could be the test pattern, but there doesn't appear to be any sync signal, and it's a bit weak too. At this point I think it might be worth cutting my losses and leaving it. For all I know the video DAC isn't even powered on properly - but to play with its power you need to use I2C.

Actually that isn't all I did: as well as the video setup, I was `cleaning up' some other basic routines. Some clib-less debug stuff, and better exception handlers. I'm sick of rewriting bits of mess every time I try something new, and maybe this'll let me put it on the 'net at some point too.

I submitted some patches to Haiku too, one of which was applied within a few minutes.

Hmm, forgot to eat too, and now it's evening again. Mates are down the pub asking me along but I just don't feel like it today. Just finished a beer here and all I want to do is sleep now.

Thursday, 4 February 2010

Oops

Ok, so my MMU code was all broken. First I was just using the wrong number of bits for the L2 page table index - x86 uses 4K page tables with 1K entries, but ARM's coarse L2 tables are only 1K with 256 entries, and I can't add up simple 2-digit numbers ... But even that didn't help ... many iterations and hours later ... ahh, I forgot to map the serial port - I was only mapping 16MB of i/o and there's another 1MB to map. Grr. Added that to the Haiku code and suddenly turning on the MMU 'works'.
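To make the table-size difference concrete, here is a minimal sketch of the index arithmetic. It compares ARM short-descriptor translation with 4KB small pages against 32-bit non-PAE x86. It also identity-maps an I/O range with 1MB section descriptors that are read/write, execute-never and strongly ordered. It assumes the ARMv7 short-descriptor format, with domain 0 set to client or manager access in DACR. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ARM short-descriptor translation with 4KB small pages:
     L1 index = VA[31:20] -> 4096 entries, 1MB each
     L2 index = VA[19:12] -> 256 entries, i.e. a 1KB coarse table
   32-bit non-PAE x86 with 4KB pages:
     directory index = VA[31:22] -> 1024 entries
     table index     = VA[21:12] -> 1024 entries, i.e. a 4KB table */
static uint32_t arm_l1_index(uint32_t va) { return va >> 20; }
static uint32_t arm_l2_index(uint32_t va) { return (va >> 12) & 0xFFu; }
static uint32_t x86_pd_index(uint32_t va) { return va >> 22; }
static uint32_t x86_pt_index(uint32_t va) { return (va >> 12) & 0x3FFu; }

/* 1MB section descriptor bits (ARMv7 short-descriptor format) */
#define SECT_TYPE  0x2u        /* bits[1:0] = 0b10: section */
#define SECT_XN    (1u << 4)   /* execute-never */
#define SECT_AP_RW (3u << 10)  /* AP[1:0] = 0b11, AP[2] = 0: read/write */
/* TEX = 0, C = 0, B = 0 (all left clear): strongly-ordered; domain 0 */

/* Identity-map [base, base + size) as 1MB I/O sections in L1 table l1.
   base and size must be multiples of 1MB. */
static void map_io_sections(uint32_t *l1, uint32_t base, uint32_t size)
{
    assert((base & 0xFFFFFu) == 0 && (size & 0xFFFFFu) == 0);
    for (uint32_t off = 0; off < size; off += 1u << 20) {
        uint32_t pa = base + off;
        l1[arm_l1_index(pa)] = pa | SECT_AP_RW | SECT_XN | SECT_TYPE;
    }
}
```

For example, mapping 17MB from 0x48000000 fills sections 0x480 through 0x490. The last of those covers 0x49020000, which I'm assuming is the BeagleBoard's UART3 console. On the L2 side the ARM index is only 8 bits wide, against 10 for x86, which is exactly the 'wrong number of bits' trap when carrying x86 habits over.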

Well it wasn't all wasted effort, I have a better understanding of the various permission and cache bits now. Better than nothing at least.

So ... ta-da ...

Actually it's nothing to be too excited about - that's all it does, and it's been hacked in very messily.

Wednesday, 3 February 2010

Damn MMU

I didn't have much time today, but I had another go at the MMU, this time with some stand-alone code.

No dice. It just goes off into la-la land as soon as I turn it on, no exceptions or any indicator of what went wrong. I guess the page tables are bung.

I can see this is going to be fun.

Tuesday, 2 February 2010

Haiku @ Beagle

Another day piss farting about on the PC. Turns out I was up till 3 last night, so I was a bit tired and had a nap in the afternoon, and even managed to load up the bin with rubbish from the yard ... but I made some progress, if not much.

So with Haiku I got the booting stuff worked out, loaded it onto an SD card, and tried booting it. Not surprisingly it doesn't get too far. After a lot of playing around I managed to get it far enough that it crashes as soon as it turns on the MMU. The page tables seem to be OK, I think (although it looks to me like it's using old-format tables, and they're not implemented in the Cortex-A8 afaict - but I tried some variations to no avail), and the other initialisation looks about right ... but as with just about everything I've tried doing on this machine, looking right and being right aren't the same thing.

Might ask about the status on the mailing list, and in the meantime try some MMU code stand-alone - the load/test cycle is pretty slow going through the old SD card I've been using.

Monday, 1 February 2010

Muckin About

Another day hacking around inside - too lazy/warm to do the yard, and mum was down anyway, so was socialising during the morning. Hacking is more interesting anyway.

Kept poking around with the FORTH code, and managed to fix a couple more things. Silly mistakes like using the jonesforth versions of words rather than the (standard) ones I wrote. Got a rough POSTPONE implemented, but only for non-immediate words. And now, der, I've got a nice chicken-and-egg problem, since I need POSTPONE to implement IF (the way I'm doing it) and I need IF to implement POSTPONE properly. Ahh well; I'll just have to do it in ASM instead I guess, it's only trivial anyway. So I finally started filling out the higher level words, although not terribly quickly. Need to work out how to bootstrap it too ... ideally I would use a cross-compiler to generate everything into a memory image, but as the code compilation process itself extends the compiler, it makes things a bit more `interesting'. I'll think about whether that's worth worrying about (almost certainly not with this toy implementation), but to start with I guess I'll just compile it as part of the booting process on the target host.
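If you haven't seen how IF gets compiled, here is a toy C model of the usual approach, loosely in the spirit of jonesforth. IF compiles a conditional branch (0BRANCH) followed by a placeholder offset, and leaves the placeholder's position on the stack. THEN later patches that offset so the branch lands at HERE. The opcodes, struct and function names are all hypothetical. Offsets here are counted in cells rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy threaded-code machine. */
enum { OP_LIT = 1, OP_ZBRANCH = 2, OP_EXIT = 3 };

struct dict {
    intptr_t cells[256];
    size_t here;                     /* next free cell, like HERE */
};

static void comma(struct dict *d, intptr_t v)   /* like FORTH's , */
{
    assert(d->here < 256);
    d->cells[d->here++] = v;
}

/* IF: compile 0BRANCH plus a placeholder offset; return the hole for THEN. */
static size_t compile_if(struct dict *d)
{
    comma(d, OP_ZBRANCH);
    size_t hole = d->here;
    comma(d, 0);
    return hole;
}

/* THEN: patch the hole so the branch lands at HERE (offset from the hole). */
static void compile_then(struct dict *d, size_t hole)
{
    d->cells[hole] = (intptr_t)(d->here - hole);
}

/* Run from cell 0 with `flag` on the data stack; EXIT returns the top of stack. */
static intptr_t run(const struct dict *d, intptr_t flag)
{
    intptr_t stack[16];
    size_t sp = 0, ip = 0;

    stack[sp++] = flag;
    for (;;) {
        intptr_t op = d->cells[ip++];
        switch (op) {
        case OP_LIT:
            assert(sp < 16);
            stack[sp++] = d->cells[ip++];
            break;
        case OP_ZBRANCH:                 /* ip points at the offset cell */
            assert(sp > 0);
            if (stack[--sp] == 0)
                ip += (size_t)d->cells[ip];
            else
                ip++;                    /* skip the offset */
            break;
        case OP_EXIT:
            return sp ? stack[sp - 1] : 0;
        default:
            assert(0 && "bad opcode");
            return -1;
        }
    }
}
```

Compiling `IF 42 EXIT THEN 7 EXIT` this way leaves an offset of 4 in the hole, so a zero flag skips straight to the `7`.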

Had a go over the weekend at trying to get AI-OS (from the Touch Book) to work on the beagleboard, but unfortunately it looks like it no longer works on this hardware. Or I just did something wrong - I'm using a HDD since I don't have a big enough SD card, and there's definitely stuff that hardcodes the SD slot. I finally got it to start booting using an Angstrom kernel ... and then minicom crashed during the boot-up process, so I don't know why it hung - or even if it did at all, but after a few minutes I hit the reset button. And now when I try to boot, the kernel just crashes with a NULL pointer, so it must've upset something on the SD card in the process. Ahh well, it wasn't like I was expecting much ...

Also read a bit about Google's Go language. I can't remember if I'd heard about it before (maybe I have?), but I never looked too closely until now. Hmm, sounds fairly interesting ... might have to play with that next.

And today I also had another go looking at the ARM port of Haiku. Finally! The compiler builds! No more obscure error which makes no sense! Well, that's a pretty good start anyway. Managed to get it to build (with a small fix for a renamed type), but am still working out the disk image building and Das U-Boot stuff, so that's as far as I've gotten.

I've moved! So long and thanks for all the fish!