Monday 25 December 2017

jjmpeg 3.0 released

Put enough together to push out a release of jjmpeg.

It ended up 1700 lines of Java, 2000 lines of C, and 300 lines of Perl.

Apart from supporting the latest version of FFmpeg (at least when I started a couple of weeks ago), it's smaller, cleaner, and more complete than any previous version. Having said that, this is essentially just a beta release.

This one is licensed GNU General Public License Version 3 (or later).

I've kinda had enough for the moment so it's a pretty bare home page, but it's there.

Merry XMAS!

Friday 22 December 2017

damned enums

Been a long week but i'm finally done with work for another year. Although it's mostly a long week because of the late nights working on jjmpeg ...

One of the things I did was fill out/sync up the important enums - AVCodecID, AVPixelFormat, AVSampleFormat, and so on. Previously the pixel format and sample formats were also Java enums - which can be convenient at times and provides some more (albeit much much overvalued) 'type safety'.

This was fairly easy because the PixelFormat was a simple densely ordered C enum so i could map between the two with a simple +-1. Unfortunately someone decided to add a big hole in the middle of it sometime between 0.10 and 3.4, ... so I gave up and just converted it to a class holding static final ints, and to make it consistent I did that with the other enumerations as well. It doesn't really make the classes any harder to use and improves the class size and memory footprint. I just added some methods to access libav*'s metadata information so I can still map between string representations and so on.
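
As a rough sketch of what "a class holding static final ints" looks like in practice: the class name `PixelFormats` and the Java-side name table below are made up for illustration (jjmpeg asks libav* for the names natively instead), and only the first few AVPixelFormat values are shown. A real binding should take the numbers from the headers it is built against.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: plain int constants standing in for a Java enum. */
final class PixelFormats {
    // First few AVPixelFormat values; illustrative only, a real build
    // should generate these from the FFmpeg headers in use.
    static final int AV_PIX_FMT_NONE = -1;
    static final int AV_PIX_FMT_YUV420P = 0;
    static final int AV_PIX_FMT_YUYV422 = 1;
    static final int AV_PIX_FMT_RGB24 = 2;

    // Hypothetical pure-Java name table; stands in for a native lookup.
    private static final Map<Integer, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put(AV_PIX_FMT_YUV420P, "yuv420p");
        NAMES.put(AV_PIX_FMT_YUYV422, "yuyv422");
        NAMES.put(AV_PIX_FMT_RGB24, "rgb24");
    }

    private PixelFormats() {}

    /** Returns the format name, or null if the value is unknown. */
    static String nameOf(int fmt) {
        return NAMES.get(fmt);
    }

    /** Returns the format value, or AV_PIX_FMT_NONE if the name is unknown. */
    static int valueOf(String name) {
        for (Map.Entry<Integer, String> e : NAMES.entrySet()) {
            if (e.getValue().equals(name)) {
                return e.getKey();
            }
        }
        return AV_PIX_FMT_NONE;
    }
}
```

Since the values are just ints, a hole in the C enum costs nothing on the Java side.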

I had to add a small compilation stage which extracts the enums from the header files and converts them to a C file which when compiled and run produces the Java source ... this seemed the absolute shortest path to ensuring I got accurate numbers based on the ffmpeg build configuration.

So after about a week's worth of solid work it's grown somewhat (about 2KLOC Java, and 2KLOC C, counting lines with ";{}") and the TODO list is getting pretty short.

I would like to clean up the exception design a bit - unfortunately i'm just not very good at that (who is?) but i'd like to get better. The build system is clean and simple but could be improved and needs to include the aforementioned enum stuff, a dist target and versioning. Logging would be nice (both redirect ffmpeg to java.util.logging and some for jjmpeg itself). JJMediaWriter? Fix the license headers, add at least a README.

Not today though, today I drink.

Tuesday 19 December 2017

jjmpeg, jni, javafx

So I guess the mood took me, I somehow ended up poking away until the very late morning hours (4am) the last couple of nights hacking on jjmpeg. Just one more small problem to solve ... that never ended. Today I should've been working but i've given up and will write it off, it's nearly xmas break anyway so there's no rush, and i'm ahead of the curve anyway.

JJMediaReader

I got this ported over and playing video fairly easily, and then went through on a cleanup spree. I removed all the BufferedImage, multi-buffering, and scaling stuff and a few other experiments which never worked. Some api changes allowed me to consolidate more code into a base class, and some changes to AVStream necessitated a different approach to initialising the AVCodecContext (using AVCodecParameters). I made a few other little tweaks on the way.

The reason I removed the BufferedImage code is because I didn't want to pollute it with "platform specific" code. i.e. swing, javafx, etc. I've moved that functionality into a separate namespace (module?).

My first cut just took the BufferedImage code and put it into another class which provides the functionality by taking the current AVFrame from the JJMediaReader video stream. This'll probably do but when working on similar functionality for JavaFX I took a completely different approach - implementing a native PixelReader() so that the native code can decide the best way to write to the buffer. This is perhaps a little more work but is a lot cleaner to use.

swscale

jjmpeg1 lets you scale images 'directly' to/from primitive arrays or direct ByteBuffers in addition to AVFrame. Since they have no structure description (size, format), this either has to be passed in to the functions (messy) or stored in the object (also messy). jjmpeg1 used the latter option and for now I simply haven't implemented them.

The PixelReader mentioned above does implement it internally but for code re-use it might make sense to implement them with the structure information as explicit parameters, and use higher level objects such as PixelReader/Writer to track such information. On the other hand the native code has access to more information so it also makes sense to leave it there.

I went a bit further and created a re-usable super-class that does most of the work and toolkit specific routines only have to tweak the invocation. This approach hides libswscale behind another api. The slice conversions don't work properly but they're not necessary.

jni

So far I had public constructors and `finalisers' because otherwise the reflection code failed. That's a bit too ugly (and `dangerous') so I made them private. The reflection code just had to look up the methods and set them Accessible.

    // Look up the private (long pointer) constructor and make it callable.
    Constructor<?> cc = jtype.getDeclaredConstructor(Long.TYPE);
    cc.setAccessible(true);
    return cc.newInstance(p);
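
For anyone who wants to try the trick in isolation, here is a minimal self-contained sketch. `Handle` and `Reflect.wrap` are hypothetical names, not jjmpeg classes, and checked reflection exceptions are wrapped in an unchecked one just to keep the example short.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for a binding class wrapping a native pointer. */
class Handle {
    final long p;

    // Private so that user code cannot fabricate objects from raw pointers.
    private Handle(long p) {
        this.p = p;
    }
}

final class Reflect {
    private Reflect() {}

    /** Instantiates jtype via its private (long) constructor. */
    static <T> T wrap(Class<T> jtype, long p) {
        try {
            // long.class is the same object as Long.TYPE.
            Constructor<T> cc = jtype.getDeclaredConstructor(long.class);
            cc.setAccessible(true);
            return cc.newInstance(p);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable (long) constructor on " + jtype, e);
        }
    }
}
```

Calling `Reflect.wrap(Handle.class, 42L)` returns a `Handle` even though `new Handle(42L)` would not compile outside the class.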

Whilst working on JJMediaReader I hit a snag with the issue of ownership. In most cases objects are either created anew and released (or gc'd) by the Java code, or are simply references to data managed elsewhere. I was addressing the latter problem by simply having an empty release() method for the instance, but that isn't flexible enough because some objects are either created or merely referenced, and the context determines which.

So I expanded the Java-side object tracking to include a `refer' method in addition to the 'resolve' method. `resolve' either creates a new instance or returns an existing one with a weak-reference object which will invoke the static release method when it gets finalised. `refer' on the other hand does the same thing but uses a different weak-reference object which does nothing.

I then noticed (the rather obvious) that if an object is created, it can't possibly 'go away' from the object tracking if it is still alive; therefore the `resolve' method was doing redundant work. So I created another `create' method which assumes the object is always a new one and simply adds it to the table. It can also do some checking but i'm pretty sure it can't fail ...

If on the other hand the underlying data was reference counted then the `resolve' method would be useful since it would be possible to lookup an existing object despite it being `released'. So i'll keep it in CObject.
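
A rough Java-only sketch of the three entry points, under a pile of assumptions: `Tracker`, `Tracked`, and `Handle` are invented names, the native release is simulated by a counter, and one flag on the weak reference replaces the two different weak-reference classes described above. It is not the actual CObject code.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical Java-side object wrapping a native pointer. */
class Tracked {
    final long p;

    Tracked(long p) {
        this.p = p;
    }
}

final class Tracker {
    /** Weak reference that remembers the pointer and whether Java owns it. */
    static final class Handle extends WeakReference<Tracked> {
        final long p;
        final boolean owned;

        Handle(Tracked o, ReferenceQueue<Tracked> q, boolean owned) {
            super(o, q);
            this.p = o.p;
            this.owned = owned;
        }
    }

    private final Map<Long, Handle> table = new HashMap<>();
    private final ReferenceQueue<Tracked> queue = new ReferenceQueue<>();
    int released; // counts simulated native frees

    /** New native object: no lookup needed, just add it to the table. */
    synchronized Tracked create(long p) {
        Tracked o = new Tracked(p);
        table.put(p, new Handle(o, queue, true));
        return o;
    }

    /** Existing or new object whose native data Java will release. */
    synchronized Tracked resolve(long p) {
        return lookup(p, true);
    }

    /** Existing or new object whose native data is managed elsewhere. */
    synchronized Tracked refer(long p) {
        return lookup(p, false);
    }

    private Tracked lookup(long p, boolean owned) {
        Handle h = table.get(p);
        Tracked o = (h != null) ? h.get() : null;
        if (o == null) {
            o = new Tracked(p);
            table.put(p, new Handle(o, queue, owned));
        }
        return o;
    }

    /** Drains collected handles; only owned ones 'free' native data. */
    synchronized void cleanup() {
        Handle h;
        while ((h = (Handle) queue.poll()) != null) {
            table.remove(h.p, h); // leaves any newer handle for p in place
            if (h.owned) {
                released++;
            }
        }
    }
}
```

While an object is strongly reachable, `resolve` and `refer` on its pointer return that same instance, which is the uniqueness property the table is there to provide.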

As part of this change I also improved CObject in other ways.

I was storing the weak reference to the object itself inside the object so I could implement explicit release and to avoid copying the pointer. I removed that reference and only store the pointer now. The WeakReference is already tracked in a hash table so I just look it up if I need it. This lets me change the jni code to use a field lookup rather than a function call to retrieve it (I doubt it makes much perf difference but I will profile it at some point).

I also had some pretty messy "cross-layer" use of static variables and messy synchronisation code. I moved all map references to outside of the weak reference routine and use a synchronised map for the pointer to object table.

For explicit release I simply call .clear() and .enqueue() on the WeakReference - which seems to do the right thing, and simplifies the release code (at least conceptually) since it always runs on the same thread.
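
The clear-then-enqueue trick is easy to check in isolation. This small sketch (the class name `ExplicitRelease` is made up) polls the queue directly, whereas real code would have a cleanup thread draining it.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

final class ExplicitRelease {
    private ExplicitRelease() {}

    /** Returns true if an explicitly released reference shows up on the queue. */
    static boolean demo() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object o = new Object();
        WeakReference<Object> ref = new WeakReference<>(o, queue);

        ref.clear();   // drop the referent so ref.get() returns null
        ref.enqueue(); // hand the reference to the queue ourselves

        // The same code that handles gc'd references now sees this one too.
        Reference<?> r = queue.poll();
        return r == ref && ref.get() == null;
    }
}
```

Because an explicit release goes through the same queue as a collected object, there is only one release path to get right.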

Monday 18 December 2017

`parallel' streams

I had a task which I thought naturally fitted the Java streams stuff so tried it out. Turns out it isn't so hot for this case.

The task is to load a set of data from files, process the data, and collate the results. It's quite cpu intensive so is a good fit for parallelisation on modern cpus. Queuing theory would suggest the most efficient processing pipeline would be to run each processing task on its own thread rather than trying to break the tasks up internally.

I tried a couple of different approaches:

  • Files.find().forEach() (serial to compare)
  • Files.find().parallel().collect(custom concurrent collector)
  • Files.find().parallel().flatMap().collect(toList())

The result was a bit pants. At best they utilised 2 whole cores and the total execution times were 1.0x, 0.77x, and 0.76x respectively of the serial case. The machine is some intel laptop with 4 HT cores (i.e. 8x threads).
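
For reference, the third variant looks roughly like this when written out in full. The `process` step here is a placeholder that just returns line lengths (the real one was the cpu-heavy part), and `StreamScan` is an invented name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamScan {
    private StreamScan() {}

    /** Placeholder 'load + process' step: the line lengths of one file. */
    static Stream<Integer> process(Path file) {
        try {
            return Files.readAllLines(file).stream().map(String::length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Files.find().parallel().flatMap().collect(toList()) over a directory tree. */
    static List<Integer> scan(Path root) {
        try (Stream<Path> files =
                 Files.find(root, Integer.MAX_VALUE, (p, attrs) -> attrs.isRegularFile())) {
            return files.parallel()
                        .flatMap(StreamScan::process)
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The directory walk order is not specified, so the order of the results shouldn't be relied on.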

I thought maybe it just wasn't throwing enough threads at it and stalling on the i/o, so I tried a separate flatMap() stage to just load the data.

  • Files.find().parallel().flatMap(load).flatMap(process).collect(toList())

But that made no difference and basically ran the same as the custom collector implementation.

So I hand-rolled a trivial multi-thread processing graph:

  • I/O x 1: Files.find().forEach(load | queue)
  • Processing x 9: queue | process | outqueue
  • Collator x 1: outqueue | List.add()

With a few sentinel messages to handle finishing off and cleanup.

Result was all 8x "cores" fully utilised and a running time 0.30x of the serial case.
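
A stripped-down sketch of that kind of graph, not the code I actually ran: `Pipeline.run` is an invented name, the queues are unbounded `LinkedBlockingQueue`s, the collator runs on the calling thread, and a single `DONE` object serves as the sentinel. Exceptions thrown by `process` just end that worker and are otherwise lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class Pipeline {
    /** Sentinel marking end of stream; compared by identity. */
    private static final Object DONE = new Object();

    private Pipeline() {}

    static <I, O> List<O> run(Iterable<I> inputs, Function<I, O> process, int workers) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        BlockingQueue<Object> outqueue = new LinkedBlockingQueue<>();

        // Processing x N: queue | process | outqueue
        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                try {
                    Object msg;
                    while ((msg = queue.take()) != DONE) {
                        @SuppressWarnings("unchecked")
                        I item = (I) msg;
                        outqueue.add(process.apply(item));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    outqueue.add(DONE); // always report completion to the collator
                }
            }).start();
        }

        // I/O x 1: feed the work queue, then one sentinel per worker.
        new Thread(() -> {
            try {
                for (I item : inputs) {
                    queue.add(item);
                }
            } finally {
                for (int n = 0; n < workers; n++) {
                    queue.add(DONE);
                }
            }
        }).start();

        // Collator x 1: outqueue | List.add(), until every worker has finished.
        List<O> results = new ArrayList<>();
        int finished = 0;
        try {
            while (finished < workers) {
                Object msg = outqueue.take();
                if (msg == DONE) {
                    finished++;
                } else {
                    @SuppressWarnings("unchecked")
                    O result = (O) msg;
                    results.add(result);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

Results arrive in completion order, so sort or key them if the order matters.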

I didn't record the numbers but I also had a different implementation that parallelised parts of the numerical calculation instead. Also using streams via IntStream.range().parallel() (depending on the problem size). Surprisingly this had much better CPU utilisation (5x cores?) and improved runtime. It's surprising because that is a much finer-grained concurrency with higher overheads and not applied to the full calculation.
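
The "depending on the problem size" part is just a threshold test before going parallel. Something along these lines, where the sum of squares and the threshold value are placeholders rather than the real calculation:

```java
import java.util.stream.IntStream;

final class ParallelSum {
    // Hypothetical cut-over point; the right value depends on the work per element.
    static final int PARALLEL_THRESHOLD = 10_000;

    private ParallelSum() {}

    /** Sums x[i]^2, switching to a parallel stream for large inputs. */
    static double sumOfSquares(double[] x) {
        IntStream idx = IntStream.range(0, x.length);
        if (x.length >= PARALLEL_THRESHOLD) {
            idx = idx.parallel();
        }
        return idx.mapToDouble(i -> x[i] * x[i]).sum();
    }
}
```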

I've delved into the stream implementation a bit trying to understand how to implement my own Spliterators and whatnot, and it's an extraordinarily large amount of code for these rather middling results.

Not that it isn't a difficult problem to solve in a general way; the stream "executor" doesn't know that I have tasks and i/o which are slow and with latency compared to many small cpu-bound tasks which it seems to be tuned for.

Still a bit disappointing.

Sunday 17 December 2017

jjmpeg & stuff

Well for whatever reason I got stuck into redoing jjmpeg and seem to have written most of the code (90%?) after a couple of weekends. It was mostly mandraulic and a bit tedious but somehow surprisingly relaxing and engaging; a short stint of unchallenging work can be a nice change. A couple of features are still missing but the main core is done.

Unfortunately my hope that the ffmpeg api was more bindable didn't really pan out but it isn't really any worse either. Some of the nastiest stuff doesn't really need to be dealt with fortunately.

I transformed most of the getters and setters into a small number of simple macros, and thus that part is only about as much work as the previous implementation despite not needing a separate compilation stage. I split most of the objects into separate files to make them simpler to maintain and added some table-based initialisation helpers to reduce the source lines and code footprint.

It's pretty small - counting `;' there's only 750 lines of C and 471 lines of Java sources. The 0.x version has 800 lines of C and 900 lines of Java, a big portion of which is generated from an 800 line (rather unmaintainable) Perl script. And the biggest reduction is the compiled size, the jar shrank from 274KB to 73KB, with only a modest increase from 55KB to 71KB in the (stripped) shared library size (although the latter doesn't include the dvb or utility classes).

There's still a lot of work to do though, I still need to test that anything actually works and port over the i/o classes and enum tables at the least, and a few more things probably. This is the boring stuff so it'll depend on my mood.

Fuck PCs

In other news I finally killed my PC - I tried one more time to play with the BIOS and after a few updates it got so unstable it just crashed during an update and bricked the motherboard. Blah. I discovered I could order a new BIOS rom so i've done that and i'll see if i can recover it, otherwise I might get another mobo if I can still get AM2+ boards here, or just get another machine. I'll probably look into the latter anyway as it's always been a bit of a hassle (despite working flawlessly when it does, and it's a very nice small machine).

Friday 8 December 2017

jjmpeg?

Well i've had reason to visit jjmpeg again for something and although it's still doing the job, it's a very very long way behind in version support (0.10.x?). I've added a couple of things here and there (recently AVFormatContext.open_input so I could open compressed webcam streams) but i'm not particularly interested in dropping another release.

But ... along the way I started looking into writing a new version that will be up to date with current ffmpeg. It's a pretty slow burner and i'm going to be pretty busy with something (relatively interesting, moderately related) for the next couple of months.

But regardless here are a few what-if's should I continue with the effort.

  • The old generator + garbage collection support required 4 classes per object.

    1. Abstract autogenerated native WeakReference based accessors with native-oriented methods (passing `pointers').
    2. Manually written native accessors as above and the glue to make it all work.
    3. Abstract autogenerated public accessors with public-oriented methods (passing objects).
    4. Manually written public accessors as above and the glue to make it all work.

    Whilst most of it is autogenerated the generator sucks to maintain and it's a bit of a mess. I've also since learnt that cutting down the number of classes is desirable.

    So instead i'll use the "CObject" mechanism with the WeakReference being a simple native pointer holder object which also knows how to free it. In this case at most 2 custom classes are required - one for autogenerated code (if that happens) and any helper/custom code.

    A few things require reflection going this route but the overheads should be acceptable.

  • Native memory was wrapped in native ByteBuffer objects.

    Originally the goal was to have java just access fields directly but this turned out not to be practical as the structures change depending on the compile options so you end up with both the C and Java code being system specific, and the Java code requires a compiler to implement it (C being handled by gcc). A side-goal was to make the Java library bit-size independent without resorting to long - although that's all ByteBuffer uses.

    Because the objects are just wrapped on the pointer there is the possibility that multiple objects can be created to reference the same underlying C object (e.g. getStreams().get(0) repeated). Whilst this isn't as bad as it sounds one has to ensure the objects aren't holding any of their own state java-side for example. It also turns out that a direct ByteBuffer isn't terribly fast either from the Java side or looking up from the C side (not sure why on the latter).

    CObject just uses a long directly, which also precludes the likelihood of poking around C internals by accident or otherwise. It also ensures unique objects reference unique pointers - this requires some overhead but it isn't onerous.

  • Using two concrete classes per object allowed the internal details of passing pointers (ByteBuffer) around to be hidden from the luser.

    • But it requires a lot of scaffolding! The same method written at least 2 times!
    • Although the C call gets a ByteBuffer directly, looking up the host pointer still requires a JNIEnv callback.

    CObject likewise uses an accessor to retrieve the native pointer, but because it's the super-class of all objects the objects can simply be passed in directly. That is native methods just look like java methods and so there is no need for any trampolines between method interfaces. It does require a bit more support on the JNI side when returning objects, but it's trivial code.

    An alternative would be to use a long and pass that around but then you still need the public/native separation and all the hassle that entails.

  • A lot of the current binding is autogenerated.

    Once the generator was written this was fairly easy to maintain but getting the generator complete enough was a lot of work. The biggest issue is that the api just isn't very consistent, and some parts just don't map very nicely to a Java api. Things such as out parameters - or worse, absolute snot like AVDictionary that should never have existed (onya libav!).

    Each case required special-case code in the generator, often extra support code, and sometimes a fall-back to manually writing the whole lot.

    Working with zcl - and in that case the OpenCL apis are much cleaner and consistent - I discovered it was somewhat less work just to do it manually and not really any harder to maintain afterwards. At least in the case of the original ffmpeg the inconsistency was simply because it wasn't originally intended as a public library, and I suspect the newer versions might be a bit better.

    I'm still undecided about simple data accessors as a good case can be made for saving the typing if there are many to write. So perhaps they could still be autogenerated (to an abstract super class as now), or they could be parameterised like they are in zcl (i.e. internally getInt(field) with public wrappers). Another half and half option would be to use the C preprocessor to do most of the ugly work and still write the Java headers by hand. Probably the last one.

Does anyone else care?

Friday 1 December 2017

RAM saga

Powered down last night because of an approaching thunderstorm ... half my ram gone again.

disable_mtrr_cleanup did nothing. disable_mtrr_trim would hang the boot.

I noticed the RAM speed was wrong in the BIOS again so i reset it, and lo and behold it all showed up, but only until the next reboot. Back to the BIOS and just changed the ram from one speed to another and back again - RAM returns!

Erg.