I'd already skimmed some of it and had a general understanding but I picked up a few more details.
The queuing mechanism looks pretty nice - very simple yet able to do everything one needs in a multi-core system. I had been under the impression that the queue system had / needed some hardware support on the CPU too, but looking at it, that doesn't look necessary, so it was just a misunderstanding. Or ... maybe it wasn't, since any work signalling mechanism would ideally avoid kernel interactions and/or busy waiting - either or both of which would be required for a purely software implementation. But maybe it's simpler than that.
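To make the busy-waiting point concrete, here is a minimal sketch of a purely software work queue in C11. It is not the HSA queue API: the names (packet_t, queue_t, queue_push, queue_pop) and the two-field packet are made up for illustration, and it only handles one producer and one consumer. The idea is just a power-of-two ring buffer with a write index and a read index.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 8u
static_assert((QUEUE_SIZE & (QUEUE_SIZE - 1)) == 0, "size must be a power of two");

/* Hypothetical packet layout, purely for illustration. */
typedef struct {
    uint32_t kernel_id;
    uint32_t grid_size;
} packet_t;

typedef struct {
    packet_t ring[QUEUE_SIZE];
    atomic_uint_fast64_t write_index; /* next slot the producer fills */
    atomic_uint_fast64_t read_index;  /* next slot the consumer drains */
} queue_t;

static void queue_init(queue_t *q)
{
    atomic_init(&q->write_index, 0);
    atomic_init(&q->read_index, 0);
}

/* Producer side: returns false when the ring is full. */
static bool queue_push(queue_t *q, packet_t p)
{
    uint_fast64_t w = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    uint_fast64_t r = atomic_load_explicit(&q->read_index, memory_order_acquire);
    if (w - r == QUEUE_SIZE)
        return false;
    q->ring[w & (QUEUE_SIZE - 1)] = p;
    /* Release: the packet contents become visible before the new index. */
    atomic_store_explicit(&q->write_index, w + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when there is nothing to do. */
static bool queue_pop(queue_t *q, packet_t *out)
{
    uint_fast64_t r = atomic_load_explicit(&q->read_index, memory_order_relaxed);
    uint_fast64_t w = atomic_load_explicit(&q->write_index, memory_order_acquire);
    if (r == w)
        return false;
    *out = q->ring[r & (QUEUE_SIZE - 1)];
    atomic_store_explicit(&q->read_index, r + 1, memory_order_release);
    return true;
}
```

With nothing but the indices to look at, a consumer that gets false back from queue_pop can only spin on it or ask the kernel to put it to sleep, which is exactly the trade-off a purely software implementation runs into.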
To be honest I would have preferred that hsail was a proper assembly language rather than wrapping the meta-data in a pseudo-C++ syntax. And brig seems a bit unnecessarily on the bulky side for what is essentially a machine code encoding. At the end of the day neither is a deal breaker.
The language itself is kind of interesting. Again I thought it was a slightly higher level virtual-processor than it is, something like llvm's intermediate representation or PTX. But it has a fixed maximum number of registers and the register assignment and optimisation occurs at the compiler stage and not in the finaliser - it looks a lot more like say DEX than IR or Java bytecode. This makes a lot of sense unless you have a wildly different programming model. Seems a pretty reasonable and pragmatic approach to a universal machine code for modern processors.
The programming and queuing model looks like something that should fit into Epiphany reasonably well. And something that can be used to implement OpenCL with little work (beyond the compiler, but there's a few of those already).
GCC
I managed to get gcc checked out and built. The hsa tools page just points to the subversion branch with no context at all ... but after literally 8 hours trying to check it out and only being part-way through the fucking C++ standard library test suite, I gave up (I detest git but I'm no fan of subversion by any stretch of the imagination). I had to resort to the git mirror. Unfortunately gcc takes a lot longer to build than the last time I had to build it, despite having faster hardware, but that's 'progress' for you (no it's not).
I'm not sure how useful it is to me as it just generates brig directly (actually a mash up of elf with amd64 + brig) and there's no binutils to play with hsail that I can tell. But I'll document the steps I used here.
    git clone --depth 1 -b hsa git://gcc.gnu.org/git/gcc.git
    mkdir build
    cd build
    ../gcc/configure --disable-bootstrap --enable-languages=c,c++ --disable-multilib
    make
Slackware 64 is only 64-bit so I had to disable multilib support.
The example from gcc/README.hsa can then be compiled using:
    cd ..
    mkdir demo
    cd demo
    cat > hsakernel.c
    extern void square (int *ip, int *rp) __attribute__((hsa, noinline));

    void __attribute__((hsa, noinline)) square (int *in, int *out)
    {
      int i = *in;
      *out = i * i;
    }
    CTRL-D
    ../build/gcc/xgcc -m32 -B../build/gcc -c hsakernel.c -save-temps -fdump-tree-hsagen
Using -fdump-tree-hsagen outputs a dump of the raw HSAIL instructions generated.
    [...]
    ------- After register allocation: -------

    HSAIL IL for square
    BB 0:
      ld_kernarg_u32 $s0, [%ip]
      ld_kernarg_u32 $s1, [%rp]
      Fall-through to BB 1

    BB 1:
      ld_s32 $s2, [$s0]
      mul_s32 $s3, $s2, $s2
      st_s32 $s3, [$s1]
      ret_none
    [...]
Went through the gcc source and found a couple of useful bits. To get the global work-id I found you can use __builtin_omp_get_thread_num(), which compiles into workitemabsid_u32 ret,0. And __builtin_omp_get_num_threads() compiles into gridsize_u32 ret,0. Both only work on dimension 0. And that seems to be about it for work-group functions.
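As a rough picture of how those two values would be used, here is a plain C model that runs on the host. It is not code for the hsa branch: model_thread_num and model_num_threads are hypothetical stand-ins for what __builtin_omp_get_thread_num() and __builtin_omp_get_num_threads() would return inside a kernel, and dispatch_1d is a made-up serial loop standing in for a real 1-D dispatch.

```c
/* Hypothetical stand-ins for the two builtin results
   (workitemabsid_u32 and gridsize_u32 on dimension 0). */
static unsigned model_thread_num;
static unsigned model_num_threads;

/* Kernel body: each work-item squares the one element selected by
   its global id, guarded against ids beyond the grid size. */
static void square_item(const int *in, int *out)
{
    unsigned i = model_thread_num;
    if (i < model_num_threads)
        out[i] = in[i] * in[i];
}

/* Serial stand-in for a 1-D dispatch: runs the kernel body once per
   work-item with the id set accordingly. */
static void dispatch_1d(unsigned grid_size, const int *in, int *out)
{
    model_num_threads = grid_size;
    for (unsigned id = 0; id < grid_size; id++) {
        model_thread_num = id;
        square_item(in, out);
    }
}
```

On the branch itself the kernel body would presumably call the builtins directly instead of reading the two variables, and the dispatch would happen through the queue rather than a loop; I haven't run that variant.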
I'm not really sure how useful it is, and unless the git mirror is out of sync there hasn't been a commit for a few months, so it's hard to know its future - but it's there anyway.
My understanding is that a reference implementation of a finaliser will be released at some point which will make BRIG a bit more interesting (writing one myself, e.g. for epiphany, is a bigger task than I'm interested in right now). I'm probably going to have more of a look at aparapi and the other java stuff for the time being but eventually get the llvm based tools built as well. But ugh ... CMake.