Monday, 6 July 2015

the future is micro?

Although i haven't been terribly active on it i've still been regularly mulling over a few ideas about the future of the stuff i did on google code, and this blog.

My plan some time ago was to set up a personal server locally - it wouldn't handle much traffic but I never got terribly much - and this is still the plan. The devil is of course in the details. If it turns out to be inadequate I can always change to something else later but given the site history I find this unlikely.

This choice is also intentionally something of a political one. Centralised control of information is becoming a significant societal problem, and the cheap availability of high speed internet, computing power, and storage provides a means to tackle it head on via decentralisation.

Micro-Server

So after a few small experiments and mostly in-head iterations i've settled on implementing a stand-alone micro-server with an embedded db. I was going to play with JAX-RS for it but the setup required turned me off it. I think the tech is great and the setup is necessary but I just don't need it here. I have the knowledge and skills to do almost everything myself but at least initially i'm going to use the JavaSE bundled http server with berkeley db je as the transactional indexing and storage layer.
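As a rough illustration of how little setup the JavaSE bundled http server needs, here is a minimal sketch using com.sun.net.httpserver. The class name MicroServer, the render() and escape() helpers and port 8080 are assumptions made up for the example; a real handler would look the node up in the db rather than echo the request path.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class MicroServer {
    // Hypothetical renderer: stands in for a db lookup plus page construction.
    public static String render(String path) {
        return "<html><body><h1>" + escape(path) + "</h1></body></html>";
    }

    // Escape the characters that matter inside HTML text content.
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Answer every request with a small HTML page built from the request path.
    static void handle(HttpExchange ex) throws IOException {
        byte[] body = render(ex.getRequestURI().getPath()).getBytes(StandardCharsets.UTF_8);
        ex.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
        ex.sendResponseHeaders(200, body.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(body);
        }
    }

    // Bind to the given port (0 picks a free one) and start serving.
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", MicroServer::handle);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // assumed port for the example
    }
}
```

No container, deployment descriptor or annotation scanning is involved, which is the trade being made against JAX-RS here.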

After many iterations I have designed an almost trivial schema of 3 small core tables which sits atop JE and allows me to implement a complex revision history including branches and renames. Think more of a `fixed' cvs rather than subversion; copies aren't the basis of everything and therefore aren't `cheap', but branching and especially tagging are (revisions are global like svn). Earlier prototypes supported both cheap copies and branching but i felt they led to unworkable cognitive complexity and I realised that since I think the subversion approach just isn't a good solution at all I should not even try to support it. The work I did on DEZ-1 was for this history database and revisions are stored using reverse deltas. Although this is not the aim or purpose it should be possible to import a full cvs or subversion revision tree and retrieve it correctly and accurately; actually I will likely implement some of this functionality as a basis of testing as this is the easiest way to obtain effectively unlimited test data.
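To make the reverse delta idea concrete, here is a small sketch: the newest revision is kept in full and each older revision is stored only as a delta that rebuilds it from the next newer one. The delta used here is a deliberately naive common-prefix/common-suffix replacement, not DEZ-1, and the class and method names are assumptions for the example only.

```java
import java.util.ArrayList;
import java.util.List;

public class ReverseDeltaStore {
    // Naive delta: keep `prefix` chars from the start and `suffix` chars from
    // the end of the source text, and put `middle` between them.
    static final class Delta {
        final int prefix, suffix;
        final String middle;

        Delta(int prefix, int suffix, String middle) {
            this.prefix = prefix;
            this.suffix = suffix;
            this.middle = middle;
        }

        String apply(String source) {
            return source.substring(0, prefix) + middle
                    + source.substring(source.length() - suffix);
        }
    }

    // Build a delta which turns `from` into `to`.
    static Delta diff(String from, String to) {
        int max = Math.min(from.length(), to.length());
        int p = 0;
        while (p < max && from.charAt(p) == to.charAt(p)) p++;
        int s = 0;
        while (s < max - p
                && from.charAt(from.length() - 1 - s) == to.charAt(to.length() - 1 - s)) s++;
        return new Delta(p, s, to.substring(p, to.length() - s));
    }

    private String head;                                 // newest revision, stored in full
    private final List<Delta> back = new ArrayList<>();  // back.get(i) turns revision i+1 into revision i

    // Store a new revision and return its 0-based revision number.
    public int commit(String text) {
        if (head != null) back.add(diff(text, head));
        head = text;
        return back.size();
    }

    // Rebuild a revision by walking backwards from the head.
    public String get(int rev) {
        if (head == null || rev < 0 || rev > back.size())
            throw new IllegalArgumentException("no such revision: " + rev);
        String text = head;
        for (int i = back.size() - 1; i >= rev; i--) text = back.get(i).apply(text);
        return text;
    }
}
```

Reading the head costs nothing extra and older revisions get progressively more expensive, which suits a site where the current version is what is almost always served.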

Atop this will sit a wiki-like system where nodes are referenced by symbolic name and/or branch/revision. Having a branch-able revision tree may allow for some interesting things to be done: or it may just collapse in an unscalable heap. Binary data will be indexed by the db but storage may be external and/or non-delta where appropriate.

From very long ago I was keen on using texinfo as the wiki syntax; i'm still aiming for this although it will mean a good deal of work converting the blog and posts over even if automated. The syntax can be a bit verbose and unforgiving though so i'll have to see how it works in practice. There are some other reasons i'm going this route although it is not yet clear whether they will prove useful; some potential examples include pdf export, response optimisation, and literate programming. It's likely i'll end up with pluggable syntax anyway.

The frontend will mostly be html+css and perhaps small amounts of javascript; but it's not going to be anything too fancy initially because I want to focus on the backend systems. Authoring is likely to be through external command line and/or desktop tools because I find the browser UX of even the most sophisticated applications completely shithouse and the effort i can afford them would render any I made even more pathetic.

The project itself will also be a personal project: it will be Free Software (AGPL3) and maybe someone else will find it interesting but providing a reference product for others isn't a goal.

Living prototype

This project actually started years ago as everything from a C based bdb prototype to a JavaEE learning exercise. In the distant past I have ummed and ahhed over whether it should be absolute bare-bones C or full-blown JavaEE. I think it may well never get much beyond these experiments but unless I start it definitely will not. So I thought it's about time to put a stake in the ground and get moving beyond experimentation.

So my latest current plan is to begin with implementing my internode software pages. A read-only version covers the basic response construction, namespace and paths, and file and image serving mechanisms. Then moving on to authoring touches on revision and branch management. Adding a news system will allow this blog to be moved across. Comments would make sense at this stage but aren't trivial if moderated, as I would desire. This is most of the meat and would also allow some version of the google code stuff to make it across. Then I could think about what next ...

The idea would be to go live as soon as I get anything working and just continue working on it 'live'; availability not guaranteed. A system in constant pre-alpha, beta, production.

I'm pretty sure i've got the base of the revision systems working quite well. Object names (& other metadata) and object data history are tracked separately which allows for renames and version specific meta-data. It's actually so simple i'm not quite sure it will support everything I need but every use-case i've tried to test so far has been solvable once I determined the correct query. I've still to get a few basic things like delete implemented but these are quite simple and the hardest part now is just making application level decisions like namespaces and path conventions. Other application level functionality like merging is something for later consideration and doesn't need implementing at the db layer. I still need to learn some JE details too.
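A sketch of why tracking names and data separately makes renames fall out naturally. This is not the actual 3-table schema: the in-memory TreeMaps stand in for JE tables, the linear scan stands in for an index, and every name in it is an assumption for the example. Each object has a stable id, and its name history and data history are each keyed by the global revision at which they changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class RevisionIndex {
    private int revision = 0; // global revision counter, svn style
    private int nextId = 0;   // stable object ids, independent of names

    // object id -> (revision -> value in effect from that revision onwards)
    private final Map<Integer, TreeMap<Integer, String>> names = new HashMap<>();
    private final Map<Integer, TreeMap<Integer, String>> data = new HashMap<>();

    // Create a new object; returns its id.
    public int create(String name, String content) {
        int id = nextId++;
        int rev = ++revision;
        names.computeIfAbsent(id, k -> new TreeMap<>()).put(rev, name);
        data.computeIfAbsent(id, k -> new TreeMap<>()).put(rev, content);
        return id;
    }

    // A rename only touches the name history; returns the new revision.
    public int rename(int id, String newName) {
        names.get(id).put(++revision, newName);
        return revision;
    }

    // An update only touches the data history; returns the new revision.
    public int update(int id, String content) {
        data.get(id).put(++revision, content);
        return revision;
    }

    // Value in effect at `rev`: the latest entry at or before it.
    private static String at(TreeMap<Integer, String> history, int rev) {
        Map.Entry<Integer, String> e = history.floorEntry(rev);
        return e == null ? null : e.getValue();
    }

    // Resolve a name as of a revision, then read that object's data as of the
    // same revision. Returns null if nothing had that name at that revision.
    public String read(String name, int rev) {
        for (Map.Entry<Integer, TreeMap<Integer, String>> e : names.entrySet()) {
            if (name.equals(at(e.getValue(), rev))) {
                return at(data.get(e.getKey()), rev);
            }
        }
        return null;
    }
}
```

For example, after create("intro", "v1"), an update to "v2" and a rename to "overview" (revisions 1, 2 and 3), read("intro", 2) gives "v2", read("overview", 3) also gives "v2", and read("intro", 3) gives null because the old name no longer resolves.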

Initially the architecture will be somewhat naive but once I see how things start to fall out I want to move to a more advanced split-tier architecture based on messaging middleware. This is a long term plan though. I will aim for scalability and performance but am not aiming for "mega"-scalability as that is simply out of scope. Things like searching (lucene) and comments can be tacked on later. Being a personal server, authentication/authorisation and other identity related security systems aren't an initial focus.

I've done the texinfo parsing a few times and my current effort is still some way from completion but i will probably just start with the basics and grow it organically as I need more features and only worry about completeness or exporting later on. I will start with processing everything live but resort to static snapshots if it proves too burdensome for the server. Actually the revision tree provides the perfect mechanism for efficiently implementing incremental snapshots so it will probably just fall out of testing stuff anyway.

The why of the what

I was prompted to think about this again by the only request about jjmpeg source i've had and i'm also in the middle of a 2-week break. I've spent a couple of those break days poking around but so far it hasn't really gotten its teeth into me so it will continue to be a slow burn (and i really do just want a short break).

Apart from setting up the hardware and making some `simple' decisions i'm quite close to having something running.

2 comments:

Unknown said...

Are you aware of fossil? ( www.fossil-scm.org )

It's a distributed vcs that sounds exactly like your description, and also maps nicely to your later points about centralisation, one of the key reasons I'm using it.

NotZed said...

No i wasn't aware of it - but i guess it's somewhat gratifying to know that others have come to similar conclusions (even if they're quite obvious).

I'm just interested in passing time for a hobby so I will keep poking at this in isolation (until I get bored with it?). I'm not doing it for money or training or self-education or self-promotion or even the end-goal of a web-site; I don't particularly like most web technology but see great utility in its core principles, and thankfully don't have to fight with it daily in my job.

This 'thing' has actually been in gestation for a very long time - i have an unpublished blog post from early 2009 and even that is talking about a project that i've been "dabbling with for a while". I originally had a revdb and info translator in C which predates that by some years, and my mental picture of the end goal has only changed at the periphery since then.

I guess if i think back i've been playing with bits of the underlying tech since the second half of the 90s whilst picking up some good ideas along the way (like wikis). For example I wrote a msql (not mysql!) knock-off before php came along and knocked it off and out of history altogether. msql was basically an extended version of ssi which also hooked into a database. I don't know which came first or who influenced whom, but that's the order i encountered them.