But then I got a bit side-tracked ...
First I thought I'd see if I could replace crt0 with my own. Apart from an initialisation issue with bss and data there isn't really anything wrong with the bundled one ... apart from it dragging in a bunch of C support stuff which isn't necessary when writing small kernel code that doesn't need the full libc.
So I took the crt0.s and stripped out a bunch of stuff. The trampoline to a potentially off-core start routine (it's going to be on-core). The atexit init (not necessary). The constructors init (hopefully not necessary). The .bss clearing (it's problematic when you have possibly more than one block of bss, such as shared, and given that .data isn't re-initialised anyway the C language behaviour is already broken). And the argument setting (three zeros isn't useful for anything and isn't even correct).
I toyed with the idea of passing arguments to kernel but decided to just have a void main instead.
I just pass -nostartfiles e-crt0minimal.o to the link line to replace the standard start-up code:
e-gcc -Wl,-r -o e-test-reloc.elf -nostartfiles e-crt0minimal.o e-test-reloc.o -le-lib
Worth it? Probably ...
Minimal crt0:
$ e-size e-test-workgroup-a.elf
   text    data     bss     dec     hex filename
    548      64     120     732     2dc e-test-workgroup-a.elf

Standard crt0:
$ e-size e-test-workgroup-a.elf
   text    data     bss     dec     hex filename
   1418    1196     128    2742     ab6 e-test-workgroup-a.elf
When you've only got 32K to play with saving 2K isn't to be sniffed at.
Matching addresses (aka fake hsa)
Next I looked at just using mmap to map the shared memory so pointers can be shared between the epiphany and the host arm directly. I tried to use e_get_platform_info() to get the list of memory blocks but for some odd reason that zeros out the memory array pointer? Odd. So ... I just access the struct directly via an extern instead.
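A minimal sketch of the mapping idea, assuming a Linux host: map a chunk of some file descriptor at the same virtual address the device uses for that memory, so a pointer value means the same thing on both sides. The helper name map_at_device_address is made up for illustration and is not part of e-hal. Which device node to open, and where the host-side offset and device-side address come from (presumably the memory segment entries in the platform info), are assumptions left to the caller.

```c
#define _POSIX_C_SOURCE 200809L  /* expose mmap/sysconf under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of e-hal): map `size` bytes of `fd`,
 * starting at file offset `host_off`, at the host virtual address
 * `dev_addr`, i.e. the address the device uses for the same memory.
 * Returns the mapping (equal to dev_addr) or NULL on failure.
 */
void *map_at_device_address(int fd, off_t host_off,
                            uintptr_t dev_addr, size_t size)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);

    /* MAP_FIXED needs a page-aligned target address. */
    assert(size > 0 && dev_addr % page == 0);

    /* MAP_FIXED silently replaces anything already mapped in this range,
     * so the caller must know the range is free (or owns it already). */
    void *p = mmap((void *)dev_addr, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, host_off);
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return p;
}
```

The catch with MAP_FIXED is that it will happily clobber an existing mapping, so this only works if nothing else in the host process already lives at the device's address range.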
This is just an implementation of stuff from a previous post but using the platform_info to find the addresses.
I have no idea whether this will work on a multi-epiphany setup, but since I don't have one it's not something I'll lose sleep over :)
About this point I noticed that some symbols weren't being allocated any location in the output file and thus could not be resolved during the loader-linker execution. These were symbols marked with the section id of COMMON. I had hit this before but had forgotten all about it. Because the code is linked with -r, the linker leaves common symbols unallocated by default. Last time I solved it using a linker script, but I found I can just pass '-d' to the linker to achieve the same result: it assigns space to any such symbols, which puts them into bss.
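For context, a small sketch of where COMMON symbols come from on the C side and how to avoid them at the source level instead. The variable names are made up for illustration. It assumes a GCC-compatible compiler: tentative definitions become COMMON under -fcommon, which was the GCC default before version 10.

```c
/* Tentative definition (no initialiser): with -fcommon this is emitted
 * as a COMMON symbol, which has no section until something allocates it. */
int mailbox_tentative;

/* GCC's nocommon variable attribute asks for space to be allocated
 * directly, so a zero-initialised variable like this lands in .bss. */
int mailbox_bss __attribute__((nocommon));

/* A variable with an explicit initialiser is never COMMON; it goes
 * to .data. */
int mailbox_data = 1;
```

Compiling with -fno-common should have the same effect as the attribute for every variable in the file, at the cost of turning duplicate tentative definitions across files into link errors.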
Automatic remote-core on-chip symbol resolution
Then I had a look into implementing fully automatic resolution of remote but on-chip symbols. The options are limited but the desired target core can be indicated by the symbol name.
program a:
extern int buffer;

program b:
// current cell relative +0, +1
extern int buffer_0_1;
// group relative 0,1
extern int buffer$0$1;
Or some variation thereof. This is easy enough to parse and implement, and not too ugly to use.
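A rough sketch of how the parsing might look, using the two naming schemes from the example above. Everything here (the struct, the enum, the function names, the 0..63 coordinate limit) is invented for illustration and is not what the loader actually does. Note the obvious weakness of the underscore form: an ordinary symbol that happens to end in _N_M would be misread as a remote reference.

```c
#include <stddef.h>
#include <string.h>

enum core_ref { REF_LOCAL, REF_CELL_RELATIVE, REF_GROUP_RELATIVE };

struct remote_sym {
    enum core_ref kind;
    char base[64];   /* symbol name with the core suffix removed */
    int row, col;
};

/* Parse decimal digits spanning exactly [s, end); 0 on success. */
int parse_coord(const char *s, const char *end, int *value)
{
    int v = 0;
    if (s == end)
        return -1;
    for (; s < end; s++) {
        if (*s < '0' || *s > '9')
            return -1;
        v = v * 10 + (*s - '0');
        if (v > 63)              /* assumed limit: 6-bit coordinates */
            return -1;
    }
    *value = v;
    return 0;
}

/* Split "name$R$C" (group relative) or "name_R_C" (cell relative).
 * Anything else is treated as a plain local symbol. Returns 0 on
 * success, -1 if the base name does not fit in out->base. */
int parse_remote_symbol(const char *name, struct remote_sym *out)
{
    size_t len = strlen(name), base_len = len;
    const char *end = name + len;
    char sep = strchr(name, '$') ? '$' : '_';
    const char *s2 = strrchr(name, sep);   /* last separator */
    const char *s1 = NULL;                 /* the one before it */
    int row, col;

    if (s2 && s2 > name)
        for (const char *p = s2 - 1; p > name; p--)
            if (*p == sep) { s1 = p; break; }

    out->kind = REF_LOCAL;
    out->row = out->col = 0;
    if (s1 && parse_coord(s1 + 1, s2, &row) == 0
           && parse_coord(s2 + 1, end, &col) == 0) {
        out->kind = (sep == '$') ? REF_GROUP_RELATIVE : REF_CELL_RELATIVE;
        out->row = row;
        out->col = col;
        base_len = (size_t)(s1 - name);
    }
    if (base_len >= sizeof out->base)
        return -1;
    memcpy(out->base, name, base_len);
    out->base[base_len] = '\0';
    return 0;
}
```

With this, "buffer$0$1" comes back as group relative (0,1) with base name "buffer", and the loader would then look up "buffer" in whatever program is loaded on that core.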
But it would mean that the binary for every core would need to be linked individually and thus it wouldn't be possible to just copy the same code across to multiple cores when they share the implementation. For this reason I've dropped the idea for now. Having to use e_get_global_address() on a weak symbol isn't too difficult.
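For reference, a sketch of the address arithmetic involved, runnable on the host. It assumes the Epiphany layout as I understand it: the top 12 bits of a 32-bit address are the core id (6 bits of row, 6 bits of column) and the low 20 bits are the core-local offset. The function name is made up, and unlike e_get_global_address(), which takes coordinates relative to the workgroup, this takes absolute mesh coordinates.

```c
#include <stdint.h>

/* Assumed layout: address = coreid:12 | offset:20, coreid = row:6 | col:6.
 * row and col are absolute mesh coordinates here. */
uint32_t sketch_global_address(unsigned row, unsigned col, uint32_t addr)
{
    /* An address that already carries a core id is global; leave it alone. */
    if (addr & 0xfff00000u)
        return addr;

    uint32_t coreid = ((uint32_t)(row & 0x3fu) << 6) | (col & 0x3fu);
    return (coreid << 20) | addr;
}
```

Because the same local offset is valid on every core running the same binary, one linked image can be copied to all of them and each core works out its neighbour's address at run time.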