Sunday, 7 September 2008

Chrome, again.

I recently posted my last entry on b.g.o, and I said I wasn't going to rant about what is wrong with the desktop (well I did before I deleted it). But maybe I should have, as with fortuitous timing, my second to last entry about Chrome should have reminded me what Chrome is capable of. I will only say in my defence that I was only considering Chrome as a browser, and maybe as an `ms office' replacement, and dismissing views otherwise (well that is how I use a browser).

First, some background. I had been noticing the trend to move toward Python in GNOME in particular, and I haven't liked it. I know why developers like it (well why they claim to like it), but as a user it leaves a lot to be desired - slow, extremely heavy applications, that too-often bomb out with meaningless backtraces. I had some ideas that could make it palatable to users (well, beyond just debugging), but it relied on some features which Python lacks, so I gave up thinking about it. But Python isn't the only problem.

The GNU desktop is in an awful state - and that's even if you stick to just one flavour and its attendant applications (I don't know about KDE, but the following is true of both GNOME and Xfce). If you take a default install of your average `distribution', for example, Ubuntu, after installing a rather large number of packages you end up with a pretty login window, and a relatively pretty desktop, and quite a few applications, from basic to outstanding, from buggy to stable. But what is behind the actual desktop? A mish-mash of random programmes the packager/desktop team determined to be useful for themselves or some mythical `average luser'. Some work well, some don't, some are necessary for the basic operation of the machine (auto-mounting and network selection), others are pure fluff, most are in-between. Also - it barely runs ok if you have only 256MB of memory, for example that `older machine' that GNU/Linux can supposedly take advantage of, or embedded/special machines, like a Playstation 3, both of which actually affect me.

One problem is that the `in thing' these days seems to be to write (or re-write!) many of the applets/applications that provide core desktop functionality using Visual BAS... oh oops ... Python. Now Python is a `scripting language'. This means that every time you run a Python app, it must compile the source-code into byte-code or perhaps machine code (CPython does cache the byte-code of imported modules in .pyc files, but the main script is compiled afresh on every run). This takes time, and it takes memory, and to do it well it can take a lot of memory and time, and this is one reason traditionally that developers had much beefier machines than users - because they're the only ones who had to do this step, once. If it only compiles to byte-code, then every basic instruction is emulated using a state machine - a `virtual machine' (VM), which is at least an order of magnitude slower than the physical machine is. Any conversion to machine code and further optimisations which make the running speed faster, also generally cost in memory and cpu time during the compilation phase. For simple scripts and applications this is no big deal, but for more complex applications it can start to add up. Not only that, because many of the libraries themselves are written using the scripting language, every application which uses those libraries needs to load them afresh every time it runs (and recompile them whenever no cached byte-code is available) - and more importantly store its own copy of the byte/machine code. I will also mention in passing that many of these `libraries' are just `wrappers' - glue code which just calls some `C' library to do the actual work; but someone has to write those too, so either the script engine `vendor' or the library `vendor' must expend additional resources (which wouldn't otherwise be needed) for this work, so the cost isn't borne solely by the users.
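To make the compile-then-interpret step concrete, here is a minimal sketch using CPython's built-in compile() and the standard dis module. The one-line `applet' source and the variable names are made up for illustration, and the exact instruction names differ between CPython versions.

```python
import dis

# A made-up one-line "applet"; the names are purely illustrative.
source = "total = price * quantity"

# Step 1: source text is compiled to a code object holding byte-code.
code = compile(source, "<applet>", "exec")

# Step 2: the byte-code is a sequence of VM instructions, which the
# interpreter loop dispatches one at a time.
instructions = [ins.opname for ins in dis.get_instructions(code)]

# Step 3: run the byte-code on the VM, with globals supplied by us.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)

if __name__ == "__main__":
    print(len(code.co_code), "bytes of byte-code")
    print(instructions)
    print("total =", namespace["total"])
```

Each of those instructions costs a trip through the interpreter loop, which is where the slowdown relative to native code comes from.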

Scripting languages are just fine for short-lived applications: they run, do their job, and finish, releasing the memory they used - even if it is excessive it doesn't usually matter. And often they are `batch' processes anyway - non-interactive programmes which run by themselves, and so long as they run to completion they needn't be particularly speedy. But now with applets and other trivial applications that run for the entire time you're at the computer, or that require interactive response, they are a potential disaster. You now have a separate VM for every application loaded, with all the non-shareable data that entails. Often scripting VMs haven't even been designed with this in mind, and in that case they may be quite cavalier with their use of memory because it isn't an issue for the workloads for which they were designed. Most of these languages use garbage collection too - but garbage collectors are quite hard to write properly, so there are often bugs, but even when those are all fixed, to get performance they generally need more total memory than they're actually using (sometimes by a lot, but often about twice). And again, all of this overhead needs to be duplicated for each VM running. Contrast that with, say, a C application. When an application is compiled in the normal way, all of the code, and all of the code of the libraries can be shared in memory. Far more time and memory can be spared during the compilation phase, since it is only done once. And explicit memory management at least forces you to think about it, even if you don't take advantage of that opportunity for thought (even if explicit memory management has spare/overheads for efficiency, it's a trade-off you can control). And finally, often the reason programmers use scripting languages in the first place is because they are easier - or to translate (in some cases) - they don't know any better. Although they may have the enthusiasm and the ideas, they may just not have the skills to pull it off properly.

Another problem affects all languages - that is the startup time/non-shared data overhead. Things such as font metric tables (sigh, and font glyph tables/glyph cache, now the font server has been basically dropped - remote X sucks shit now, even though networks are much faster), display information, other global state tables, and other data which is loaded at run-time, and could otherwise be shared among applications. This only gets worse when you have many versions of the same library present, and/or completely different libraries which do the same thing. Sure you can run a KDE application on a GNOME desktop, but it isn't at zero cost, as even basic things like displaying a string of text involve an extraordinary amount of logic and data, little of which will be shared.

Having so many libraries to choose from, and indeed a continually changing set of libraries to choose from, is also a particular problem with GNU desktops (and Windows at least). Add to that - people keep coming up with their own `framework' which will `solve all the problems' in a specific domain, but all it really does is add yet another set of libraries (and versions over time) that we all have to put up with if we want to run a particular application that uses them (or worse, the poor developer is burdened with having to develop and maintain yet-another backend when they could be doing real - and more importantly, interesting - work). Even if the one library is the one everyone uses, new versions seem to come out every year or so.

So the result is that in 2008 we have a desktop with barely more features than one in 2000, yet consuming far more resources. Tiny little applets which could just as easily have been written in any language are dragging in millions of lines of code and megabytes of memory by virtue of being written in a scripting one. Lots of libraries - many of which do the same thing, even just different versions of the same one - often end up being installed as well.

There are at least a couple of ways to get around the scripting problem, and they also cover the shared state and libraries breeding like fundie children as well. If you're not using scripting they don't help - but shared state could be addressed using traditional IPC mechanisms (i.e. use a server), but because of the complexity this is often not done. Fixing the breeding library problem in general is tricky - each library needs to be far more disciplined in its design, and make use of ld features for backward/forward compatibility if required. Some duplication is still necessary - competition is generally good - although perhaps application developers should avoid using every new library that comes out just because it is new and promises to abolish world hunger.

First possibility, you have a separate process that compiles and executes all scripts - a script `application server', in today's language. For a stand-alone script, a small client uploads/tells the server which script to execute, and the server sends the results back to the client using queues and/or rpc. Because the scripts are executed in the same address space, they can share libraries, the garbage collector, and other resources. You also have the benefit that if you want to extend your application with scripting facilities, any application can use the same mechanism to run their own scripts. This could also provide a powerful system whereby you can write meta-applications, talking between applications as well, if you design the system properly. Threading is an issue - but it's an issue that only has to be solved once, by people who probably have an idea, rather than clueless application programmers.
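A rough sketch of that first possibility, assuming Python as the scripting language and the standard multiprocessing.connection module as the rpc layer. Everything named here (serve, run_script, start_server, stop_server, SHARED_LIBRARIES, the `result' convention and the None shutdown message) is invented for illustration. A real server would be its own long-lived process; it runs in a thread here only to keep the sketch in one file.

```python
import math
import threading
from multiprocessing.connection import Client, Listener

# Libraries loaded once by the server and handed to every script.
SHARED_LIBRARIES = {"math": math}


def serve(listener):
    """Accept one script per connection, run it, send back its result."""
    while True:
        with listener.accept() as conn:
            source = conn.recv()
            if source is None:  # hypothetical shutdown request
                return
            namespace = dict(SHARED_LIBRARIES)  # fresh globals, shared modules
            try:
                exec(source, namespace)  # no sandboxing: trusted scripts only
                conn.send(("ok", namespace.get("result")))
            except Exception as exc:
                conn.send(("error", repr(exc)))


def run_script(address, source):
    """The small client: upload the script text and wait for the reply."""
    with Client(address) as conn:
        conn.send(source)
        return conn.recv()


def start_server():
    """Listen on a free local port and serve from a background thread."""
    listener = Listener(("localhost", 0))
    thread = threading.Thread(target=serve, args=(listener,), daemon=True)
    thread.start()
    return listener, thread


def stop_server(listener, thread):
    """Ask the server loop to exit, then release the socket."""
    with Client(listener.address) as conn:
        conn.send(None)
    thread.join()
    listener.close()


if __name__ == "__main__":
    listener, thread = start_server()
    print(run_script(listener.address, "result = math.factorial(5)"))
    stop_server(listener, thread)
```

Every script gets its own globals but the same already-imported modules, which is the sharing being argued for. This toy handles one request at a time, so it sidesteps the threading issue rather than solving it.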

The other way is to move your applications to the (one) server. All applications simply run in the same VM/address space, and again all code and much data can easily be shared among applications. Where you need additional non-scripted facilities you either build them in/use plugins, or use IPC mechanisms. And you only have to do it once too. Although meta-application programming is certainly possible, it would have to be an additional layer or protocol that needn't be there by design. And you can't really write an application that has a scripting `extension mechanism' either - since the app is the script.
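As a toy illustration of this second way, here are two made-up applets living in one interpreter, sketched with Python's asyncio. The applet names and the log list are invented for the example, and json merely stands in for any library that every applet would otherwise load separately.

```python
import asyncio
import json  # stands in for a shared library; imported once for all applets


async def clock_applet(log, ticks):
    """Hypothetical clock applet: records a tick, then yields to the others."""
    for n in range(ticks):
        log.append(("clock", n))
        await asyncio.sleep(0)


async def notes_applet(log, notes):
    """Hypothetical post-it applet: serialises each note with the shared library."""
    for note in notes:
        log.append(("notes", json.dumps(note)))
        await asyncio.sleep(0)


async def host(applets):
    """The one `server': every applet runs in the same VM and address space."""
    await asyncio.gather(*applets)


def run_desktop():
    log = []
    applets = [clock_applet(log, 2), notes_applet(log, ["buy milk"])]
    asyncio.run(host(applets))
    return log


if __name__ == "__main__":
    for entry in run_desktop():
        print(entry)
```

There is one interpreter, one garbage collector and one copy of each module however many applets are loaded. The price is that a misbehaving applet can stall or take down all the others.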

The first way is sort of how AREXX worked. It can be quite simple, yet very powerful. Nobody wrote applications in AREXX, but they did write meta-applications which literally let completely unrelated applications `talk' to one another. The second way, if taken to the extreme, is something like JavaOS or that M$ thingy that does the same thing.

Hmmm. So I guess one potential realisation of the second idea is Chrome. It isn't a browser, it's an application framework, or rather, an os-independent application execution environment, a meta-operating system if you will. The sort of thing Java was capable of, but didn't work so well because it was too fine grained/no central server. The sort of thing Flash is basically doing now, although it's too buggy and also no central server. Probably the closest is the sort of thing GNOME was originally envisioned to be (as I fuzzily remember it - the NOM in GNOME) before being down-graded to basically a Gtk theme - although the glandular-fever infected among them are still thinking along those lines, I think. The sort of thing Firefox always claimed to be, but you couldn't take seriously because we all know what a bloaty pig's bum it was, and still is, even though they've made great strides in the swine's bun-tone. Well, at least the process model in Chrome makes sense now.

So watch out GNOME and KDE and Xfce. All of those little crapplets that deal with no or small amounts of data - they can all be re-written as trivial JavaScript applications, and probably with network transparency built in (I haven't mentioned `google gadgets', because it should be obvious this is one and the same thing). e.g. post-it notes, a desktop clock/calendar which links into your planner, rss aggregators, umm, whatever it is people run on their desktop; file browsers aren't much different from an internet browser either. So maybe the `start menu' (for native apps) can't be written - well, yet - because of the OS integration, so that is safe for now. Still, who knows, they've got the sandboxing, so there will perhaps be a mechanism for privilege escalation as well, and it can be made as secure as yum or apt-get (i.e. not very). If they implement a VIDEO tag, and SVG properly, with any luck Flash and M$' flash knock-off can get the bullets in the head they deserve as an added bonus. Good riddance to bad rubbish there.

Ok, so perhaps I was wrong in my second to last post on b.g.o. Chrome isn't just another featureless webkit browser after all (although it is still too featureless for me). But it isn't just Firefox that has something to fear from another browser, it is not just desktop applications that have something to fear from another browser, it is the desktop as we have come to know it - and thank fuck for that too.

Ahh well, maybe that isn't the idea `they' had. It has the potential though, if the VM and GC are as good as the claims on the box. And if Google doesn't do it, someone else can - because it's free software.
