Monday, 5 March 2012


So, I kept watching some more of the AMD fusion summit videos yesterday evening - and now I remember one reason I haven't watched them before: they take a long time to get through!

The GCN one was interesting (from 'Mike and Mike'), although the first Mike was a bit nervous, and that together with his Texan accent and a tendency to rush made him a bit hard to follow.

But the Aparapi one gave me some new respect for the project.

Although one thing I didn't think was correct was the way he kept going on about how Java programmers are completely braindead and don't know how to code for performance or deal with other issues (it's not like they're Python coders FFS):

Java coders have never seen a pointer eh? Object references behave the same because they are the same, and although you can't increment/index them, most coders stick to direct pointer dereferencing and array indexing anyway: i.e. exactly the same way they're used in Java. And they both have null pointer dereferences, it's just that Java's are non-fatal. So one could say Java programmers have never seen a fatal pointer access, but just because they're called object references or arrays, doesn't mean they're not pointers ...
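To make the comparison concrete, here's a minimal sketch (class and method names are made up for illustration). It chases a reference the same way one would write node->next->value in C, and shows that the null case is an exception one can catch rather than a crash.

```java
// Illustration only: the class and method names here are made up.
final class NullRefDemo {
    static final class Node {
        int value;
        Node next;   // an object reference: a pointer you can't do arithmetic on
    }

    // Follows node.next, much like node->next->value in C.
    static String nextValue(Node node) {
        try {
            return "next value is " + node.next.value;
        } catch (NullPointerException e) {
            // In C this would typically be a segfault; here it is a catchable exception.
            return "null dereference caught";
        }
    }

    public static void main(String[] args) {
        Node tail = new Node();
        tail.value = 42;
        Node head = new Node();
        head.next = tail;
        System.out.println(nextValue(head));   // follows a valid reference
        System.out.println(nextValue(tail));   // tail.next is null
    }
}
```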

Other languages?
Java programmers seem to love XML for some reason: they can cope with other languages. Look at ant: it's an XML form of bourne-shell! Not to mention Scala and so on.

Actually any of them already using Java for engineering/science know how to get performance: use objects of (single-dimensional) arrays, not arrays of objects. Write really ugly maths to cope with it. There's just no other practical choice without taking unacceptable performance hits.
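A rough sketch of what I mean by objects of arrays versus arrays of objects, plus the hand-rolled index maths that goes with flat arrays. All names are invented for the example and it says nothing about how fast either form is on a given JVM; it only shows the shape of the code.

```java
// Hypothetical names; a sketch of the layout idea, not code from any real project.
final class LayoutDemo {
    // Array of objects: every element is a separate heap object behind a reference.
    static final class Point {
        float x, y;
        Point(float x, float y) { this.x = x; this.y = y; }
    }

    // Object of arrays: one object holding flat primitive arrays.
    static final class Points {
        final float[] x, y;
        Points(int n) { x = new float[n]; y = new float[n]; }
    }

    static float sumSquares(Point[] pts) {
        float sum = 0;
        for (Point p : pts) sum += p.x * p.x + p.y * p.y;
        return sum;
    }

    static float sumSquares(Points pts) {
        float sum = 0;
        for (int i = 0; i < pts.x.length; i++) {
            sum += pts.x[i] * pts.x[i] + pts.y[i] * pts.y[i];
        }
        return sum;
    }

    // The "ugly maths": a 2-D grid stored row by row in one flat array.
    static float at(float[] grid, int width, int col, int row) {
        return grid[row * width + col];
    }

    public static void main(String[] args) {
        Points soa = new Points(2);
        soa.x[0] = 3; soa.y[0] = 4;
        soa.x[1] = 1; soa.y[1] = 2;
        Point[] aos = { new Point(3, 4), new Point(1, 2) };
        System.out.println(sumSquares(soa) + " " + sumSquares(aos));
        System.out.println(at(new float[] {0, 1, 2, 3, 4, 5}, 3, 1, 1));
    }
}
```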

And this goes for more fundamental understanding of the underlying architecture too. I know everyone likes to make out that abstraction means you don't have to know about registers and cache and how language constructs are executed by the hardware (and I know some schools of computer science try to hide such details), but that's bunkum. It affects every language because at the end of the day they're all executed as machine instructions.

Threading, barriers, etc.
Again for performance you can't avoid these. Java barriers are also the same semantically as OpenCL barriers and are a very simple concept if you're able to cope with the concept of concurrency at all.
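As a rough illustration of the similarity, here's a sketch using java.util.concurrent.CyclicBarrier in the way a work-group would use local memory plus barrier(CLK_LOCAL_MEM_FENCE): every 'work item' writes its own slot, waits, then reads a neighbour's slot. The class and method names are made up, it assumes a non-empty input, and one thread per element is obviously only sensible for a toy.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical example: one Java thread plays the role of one work item.
final class BarrierDemo {
    // Returns the input rotated left by one, computed co-operatively.
    static int[] rotateLeft(final int[] input) {
        final int n = input.length;          // assumed to be at least 1
        final int[] local = new int[n];      // stands in for OpenCL local memory
        final int[] out = new int[n];
        final CyclicBarrier barrier = new CyclicBarrier(n);
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;                // like get_local_id(0)
            workers[i] = new Thread(() -> {
                local[id] = input[id];       // phase 1: write own slot
                try {
                    barrier.await();         // nobody continues until all have written
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                out[id] = local[(id + 1) % n];   // phase 2: read a neighbour's slot
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rotateLeft(new int[] {1, 2, 3, 4})));
    }
}
```

Without the await() call a thread could read a slot its neighbour hasn't written yet, which is exactly the bug a missing barrier causes in a CL kernel.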

Dynamic memory
Absolutely agree here, this is one thing Aparapi should hide.

So if a coder can't cope with these concepts already they are not going to be a potential user of Aparapi, so it seems strange to only target them as a potential adopter of the technology.

As with these concepts in Java, these 'braindead' coders will just use third party libraries and (ugh) frameworks to do this fiddly stuff for them, because even the simplest concurrency model that Aparapi presents (which is more like an OpenGL shader than a CL kernel) will be beyond them.

It's the actual concurrency which is the hard part: and for the most part OpenCL's concepts make the concurrency easier to deal with (or even, possible to deal with). So hiding the tools to deal with the concurrency whilst exposing the concurrency does seem a little counter-productive.

So I don't think hiding all the details of OpenCL is necessarily a good idea: sometimes one does need to know about data flow, local memory, 3-d workgroups, and barriers. 'system' and 'framework' programmers already need to know about threads and concurrency and so these related concepts will not be foreign to them. Although I noticed that Aparapi now has support for local memory blocks and so there's hope yet.

I'm not sure why Aparapi uses such an explicit memory transfer model either, when it could have managed the buffer memory in a simpler way. e.g. Java has accessors, why not use them to determine when buffer memory needs to be returned to the host? The data flow should be quite explicit from the kernel invocation order: no need to analyse the host code for this (and this is problematic: I don't see how the host code can invoke multiple kernels in a sequence from the example, since each kernel has its own single 'host' method - imho 'host' should be an attribute on any method rather than a single one, but maybe the api has moved on since then).
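To show the sort of thing I mean by using accessors, here's a toy sketch in plain Java. It is not Aparapi's API: ManagedBuffer and its methods are invented, and the 'device' is just a second array. The point is only that a get/set pair plus two dirty flags is enough to decide when a copy is needed, so back-to-back kernels need no transfer between them.

```java
// Hypothetical sketch: simulates host/device copies with two arrays and dirty flags.
final class ManagedBuffer {
    private final float[] host;
    private final float[] device;        // stands in for device memory
    private boolean hostDirty = true;    // host holds data the device hasn't seen
    private boolean deviceDirty = false; // device holds data the host hasn't seen
    int transfers = 0;                   // counts simulated copies, for illustration

    ManagedBuffer(int n) {
        host = new float[n];
        device = new float[n];
    }

    // Host-side accessors: the only way host code touches the data.
    float get(int i) {
        syncToHost();
        return host[i];
    }

    void set(int i, float v) {
        syncToHost();
        host[i] = v;
        hostDirty = true;
    }

    // Called when a (simulated) kernel is about to use the buffer.
    float[] acquireForKernel() {
        if (hostDirty) {
            System.arraycopy(host, 0, device, 0, host.length);
            transfers++;
            hostDirty = false;
        }
        deviceDirty = true;              // assume the kernel may write to it
        return device;
    }

    private void syncToHost() {
        if (deviceDirty) {
            System.arraycopy(device, 0, host, 0, host.length);
            transfers++;
            deviceDirty = false;
        }
    }

    public static void main(String[] args) {
        ManagedBuffer buf = new ManagedBuffer(4);
        buf.set(0, 2f);
        buf.acquireForKernel()[0] *= 3;  // first "kernel": copies host to device once
        buf.acquireForKernel()[0] += 1;  // second "kernel": no copy needed
        System.out.println(buf.get(0) + " after " + buf.transfers + " transfers");
    }
}
```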

However, taking into account the future plans of the AMD/ARM platforms ... Aparapi has the potential to be much much more useful as the programming model could map quite well to the future target design. i.e. once one has zero-copy unified memory and a low-overhead job queue mechanism the cost of an Aparapi kernel call will become very low.

Although reviewing the plans, I suspect they're being a bit optimistic: I only recently discovered, for example, that async memory copies between CPU and GPU are only a 'preview' feature in the current SDK ... which was quite a 'WTF' moment - how seriously IBM-PC-XT is that ... OTOH moving the ringbuffer processing to the hardware removes most of the OS-specific code required to talk to the hardware and so will reduce the resource requirements for driver development (i.e. we won't have to wait for the driver devs, it'll already be done by the hardware guys).

I think the biggest problem for Java however is in the language itself, and in particular its lack of mathematical expressivity[sic]. It doesn't support vector types for starters - which are handy in this case more for their ability to concisely realise mathematical expressions than the ability to map efficiently to SIMD processors. And even if one does wrap these in objects, using them is clumsy, error prone, and even messier if you're aiming for efficiency. e.g. it's simply cleaner writing C or OpenCL C code to perform most maths than it is doing the same in Java (in its purest object form, let alone optimised for practicality).
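A small sketch of the clumsiness, using a made-up Float3 wrapper (not from any library). In OpenCL C, with a, b as float3 and t as float, a linear interpolation is just a + (b - a) * t; the Java object form of the same expression is below.

```java
// Hypothetical immutable 3-vector; every operation allocates a new object.
final class Float3 {
    final float x, y, z;

    Float3(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    Float3 add(Float3 o) { return new Float3(x + o.x, y + o.y, z + o.z); }
    Float3 sub(Float3 o) { return new Float3(x - o.x, y - o.y, z - o.z); }
    Float3 scale(float k) { return new Float3(x * k, y * k, z * k); }

    // OpenCL C equivalent: a + (b - a) * t
    static Float3 lerp(Float3 a, Float3 b, float t) {
        return a.add(b.sub(a).scale(t));
    }

    public static void main(String[] args) {
        Float3 r = lerp(new Float3(0, 0, 0), new Float3(2, 4, 6), 0.5f);
        System.out.println(r.x + " " + r.y + " " + r.z);
    }
}
```

And that's the tidy version: avoiding the temporaries means mutable vectors or unpacking everything into separate floats, which is where it gets properly ugly.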

So although a lot of problems will probably map fine enough, occasionally one will be better off with lower-level access. If one could override the kernel code with one's own string, but let Aparapi handle the buffer management and let them all interoperate, it would be very cool. Even if, for example, it compiled the OpenCL code into Java bytecode as the fallback (ok, this is a rather much bigger problem to solve ...).


Gary Frost said...

Whilst I generally agree with your comments regarding Java developers, I think that I (I am the Aparapi presenter/developer) was a little hard on the average Java developer. I might dispute that Java developers familiar with Object references understand pointers, but your point regarding XML was well taken, and generally Java folk are pragmatic and will use other languages if beneficial. So I will tone it down in future :)

I also agree that Aparapi is in a weird place, we have enhancement requests to provide lower level control (to allow folks that understand OpenCL to contribute the code and allow Aparapi to coordinate buffers) as well as requests to hide more of what is going on (to allow the developer to hide OpenCL completely) and at present are maybe caught between two stools.

I would be interested in your feedback regarding the proposed extension mechanism, where we offer some reflection proxy trickery to bind OpenCL code to a Java interface.

A trivial example is here.

I personally think this will allow folks who want more control to dive down using a succinct API.

This code is *almost* working (in a branch) and I would be interested in your feedback.

Also maybe take a look at the general proposal that I am steering towards.


Gary Frost said...

This is very interesting feedback. I am the Aparapi developer (and presenter) and in retrospect I do think I may have been a little harsh on Java developer capabilities. So apologies there.

I linked to this from the Aparapi project page. I hope that is O.K.

Also I would appreciate your feedback on the proposed extension mechanism to allow developers to offer their own OpenCL code.

This should offer a flavor.

NotZed said...

Hi Gary,

If I thought people were going to read it I might've been a bit less ranty :) But it's published, so no worries linking.

Although I don't hesitate to make sweeping generalisations without having used the product ;-); specific observations would benefit from having tried it. I think perhaps I will try to convert some of my socles code across to both aparapi and the new api (as it develops) to see how it goes. Although it might take me a while to get around to that (and I use images a lot ...) so some initial impressions ...

I think the 'simplified' version is too simple (whilst also being fairly difficult to actually use) and your initial version looked much more comfortable. After about 5 lines of code, you can't be poking around in Java strings to edit the code (who does this for XML, HTML, or anything else?): the source needs to be loadable from a resource, and supporting run-time generated code is a must because that's a core feature of OpenCL and quite a useful one at that (custom build arguments with defines would also be nice ... just a simple way to get custom code for different problem sizes without needing a code generator). And it needs to be able to call other functions in the same file (i.e. don't generate the function header). The use of @common looks incredibly clumsy and will have no tool support, whereas a .cl file you can at least edit in a decent C editor, run through an offline compiler, or even just use to match the compiler error line numbers with the source file. (This is one of the biggest hassles with auto-generated/stringified code, which also forces you to add newlines - which you missed in your initial fft example - and it just makes the editing process all the more clumsy and downright painful.)

Although you could support both - either by separate methods or e.g. by querying an interface on the target object: an 'interface CLSourceFactory { String getCL(); }' type thing - I'm not sure I see the point: you already have aparapi for simple kernels, and as soon as you start doing anything complex this attribute stuff is going to get very messy and very hard to both write and debug. I don't see how trying to edit a bunch of dumb strings spread across a class file is better than a single .cl file in a C editor.
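For what it's worth, the interface-query idea could look something like the sketch below. None of this is Aparapi API: CLSourceFactory is the interface named above, while ScaleKernel and SourceLookup are invented for the example. It also shows the run-time generated source with a problem-size define that I mentioned earlier.

```java
// Hypothetical sketch: none of these types exist in Aparapi.
final class SourceLookup {
    // A binding layer could ask the target object for its source like this.
    static String sourceFor(Object target) {
        if (target instanceof CLSourceFactory) {
            return ((CLSourceFactory) target).getCL();
        }
        return null;   // caller falls back to annotations, a resource file, etc.
    }

    public static void main(String[] args) {
        System.out.print(sourceFor(new ScaleKernel(64)));
    }
}

interface CLSourceFactory {
    String getCL();
}

// Generates source at run time, specialised for one problem size.
final class ScaleKernel implements CLSourceFactory {
    private final int width;

    ScaleKernel(int width) {
        this.width = width;
    }

    @Override
    public String getCL() {
        return "#define WIDTH " + width + "\n"
             + "__kernel void scale(__global float *data, float k) {\n"
             + "    int i = get_global_id(0);\n"
             + "    if (i < WIDTH) data[i] *= k;\n"
             + "}\n";
    }
}
```

Note the explicit newlines on every line of the string, which is the very hassle being complained about.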

BTW I understand that people are excited about the possibilities here - one reason I wrote this post myself - and each have their own ways they want to push the project. Obviously(!) the answer is both! We want to be able to use the simple interface, whilst also being able to use custom code, and the most important bit is to have them all work together. Which seems the gist of the approach you're taking. But I don't think over-simplifying the 'custom' interface is gaining much particularly since you already have an even simpler frontend.

Of course, we all want more than that too. e.g. how about the possibility of using only JavaFX as the OpenGL/DX integration mechanism so one avoids the opengl stuff entirely ... now that would be nice ... (obviously javafx currently has no way to access the texture maps/etc, but it's in the bug system).

PS i've joined the mailing list, as you will no doubt see shortly :)

PPS yeah I know pointers aren't completely the same as references, but I still think of all of them as "move.l field(a0), d0" in my head (680x0 assembly language) so the differences are minor ...

Gary Frost said...

I added a couple of annotations to support some concerns.

First @OpenCL.Source on the interface to provide source as one complete String.

Also @OpenCL.Resource on the interface to reference a file/url containing the source for the whole interface.

Maybe this will clean it up a little. Take a look and let me know...

NotZed said...

Ahh yeah that looks better. The more I thought about it the more I thought tool support is the real biggy. Also that adding some extra annotations wasn't really going to bust the api in any ugly ways either. I still think it'd be useful to have the same annotations on individual kernels as well, and when doing so remove the auto-generated prototypes - sometimes you have code that's too big to manage in a single file, and a single kernel might be separated, but still have multiple functions.

The dynamic source idea seems to break encapsulation though: wouldn't either a well-known method signature, or an attribute specifying a factory class/method, help keep it contained? (obviously a public factory method could just wrap it, but the interface is on Device).

NotZed said...

I posted this to the list but it either only went to you or got swallowed by the googlenets: but basically, how would the following code map to aparapi+extension, and would it (or could it?) be any easier and cleaner to use than JOCL in a case like this:
(note how little of the actual application code is dealing with jocl, even with image stuff, albeit with a pretty messy helper function which hides the details).