data
To get my bearings I created a full memory map of the entire epiphany 16-core device. Below is an excerpt from the lowest addresses on each core as I had it initially. Yeah ... didn't really feel like watching tv.
 +----------+---- local ram start for each core
 |     0000 | sync
 |     0004 | swe
 |     0008 | memfault
 |     000c | timer0
 |     0010 | timer1
 |     0014 | message
 |     0018 | dma0
 |     001c | dma1
 |     0020 | wand
 |     0024 | swi
 +----------+------
 |     0028 | extmem
 |     002c | group_id
 |     0030 | group_rows
 |     0034 | group_cols
 |     0038 | core_row
 |     003c | core_col
 |     0040 | group_size
 |     0044 | core_index
 +----------+------
 |     0048 |
 |     004c |
 |     0050 |
 |     0054 |
 +----------+------
 |     0058 | .text initial code
 |          |
The base load address matched the current sdk because I originally kept it compatible with e-lib.
I tried a couple of ideas but settled on leaving room for 8 words (easy indexing) and found a good use for every slot.
 |          |
 +----------+------
 |     0048 | argv0
 |     004c | argv1
 |     0050 | argv2
 |     0054 | argv3
 |     0058 | entry
 |     005c | sp
 |     0060 | imask
 |     0064 | exit code
 +----------+------
 |     0068 | .text.init
 |          |
Being able to load r0-r3 and set the stack pointer from the host allows any argument list to be created - i.e. the 'main' routine takes native types as arguments rather than nothing at all. This can help simplify some startup issues because you don't need to synchronise with anything to get the data on startup and for simple cases might be all you need.
Being able to set the entry point allows multiple 'kernels' to be stored in the same binary and be invoked without having to reload the code. It also 'fixes' the problem of having to trampoline to a 32-bit address via the startup routine if you want an off-core main (which may have limited uses).
And the imask just allows some initial system state to be configured. There may need to be more.
And finally the exit code allows the kernel to return some value. Actually I'm not sure how useful it is but there was an empty slot and it was easy to add.
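As a rough sketch, the 8-word block could be described to host-side C code with a struct like the one below. The names `struct tcb`, `TCB_OFF` and `tcb_field_addr` are made up for illustration; only the offsets come from the map above. The alignment check assumes that double-word accesses need 8-byte alignment.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical C view of the 8 words at 0x48..0x67 in each core's local ram.
 * Only the offsets are taken from the memory map; the names are invented. */
#define TCB_OFF 0x48u

struct tcb {
    uint32_t argv[4];    /* 0x48..0x54: values destined for r0-r3 */
    uint32_t entry;      /* 0x58: kernel entry point */
    uint32_t sp;         /* 0x5c: initial stack pointer */
    uint32_t imask;      /* 0x60: initial interrupt mask */
    uint32_t exit_code;  /* 0x64: value returned by the kernel */
};

/* Core-local address of a field, given its offset inside the struct. */
static inline uint32_t tcb_field_addr(size_t field_offset)
{
    return TCB_OFF + (uint32_t)field_offset;
}

/* Compile-time checks that the struct matches the map. */
static_assert(sizeof(struct tcb) == 8 * sizeof(uint32_t), "tcb is 8 words");
static_assert(TCB_OFF + offsetof(struct tcb, argv) == 0x48, "argv0 slot");
static_assert(TCB_OFF + offsetof(struct tcb, entry) == 0x58, "entry slot");
static_assert(TCB_OFF + offsetof(struct tcb, sp) == 0x5c, "sp slot");
static_assert(TCB_OFF + offsetof(struct tcb, imask) == 0x60, "imask slot");
static_assert(TCB_OFF + offsetof(struct tcb, exit_code) == 0x64, "exit slot");

/* Each adjacent pair (argv0/1, argv2/3, entry/sp) starts on an 8-byte
 * boundary, which is what a double-word load would want (assumption). */
static_assert(TCB_OFF % 8 == 0, "block starts double-word aligned");
static_assert((TCB_OFF + offsetof(struct tcb, entry)) % 8 == 0, "entry/sp pair");
```

With something like this the host could fill in a `struct tcb` and copy it to offset `TCB_OFF` of a core in one write, rather than poking individual words.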
code
So to the reset interrupt handler. This is just a rough sketch of where my thoughts are at the moment and I haven't tried it on the machine so it may contain some silly mistakes or other thinkos.
        .section .text.ivt0
_ivt_sync:
        b.l     _isr_sync               ; sync
Currently in ez-loader the section name .text.ivt0 sets the interrupt vector, although because the loader can create the ivt entries on the fly this may be something that can be removed. Or it could work as an override, in which case none of the following code is ever used. The compiler outputs various sections for interrupt handlers too and I intend to remain compatible with those, even though interrupt handlers in c can be a bit nasty due to the size of the register file and the lack of multi-register push/pop instructions.
        .section .text.init
_isr_sync:
        ;; load initial state
        mov     r7,#tcb_off
        ldrd    r0,[r7,#tcb_argv0_1]    ; argv0, argv1
        ldrd    r2,[r7,#tcb_argv2_3]    ; argv2, argv3
        ldrd    r12,[r7,#tcb_entry_sp]  ; entry, sp
        ldr     r4,[r7,#tcb_imask]      ; imask
Then we come to the meat and potatoes. First the state is loaded from the 'task control block'. Double-word loads are used where possible and most values are loaded directly into their final desired location.
        ;; set frame pointer
        mov     fp,sp
        ;; set imask
        movts   imask,r4
A couple of values then need to be set by hand. There may need to be other init work done here such as clearing ipend.
        ;; set entry point
        movts   iret,r12
Rather than drop down to user-mode to run setup routines as required by a full c runtime this just jumps straight to the kernel entry point via iret. This saves the need to resolve the pre-main 'start' trampoline routine too.
        ;; link to exit routine
        mov     lr,%low(_exit)
Similarly, because main isn't being called via a user-mode stub using bl or jalr, the link register needs to be set manually. This can also go directly to _exit without passing go. Because this is intended to be on-core it only needs to load the bottom 16 bits of the address.
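To make the 16-bit point concrete, here is a small C sketch of the arithmetic. The helpers `addr_low`, `addr_high` and `fits_single_mov` are hypothetical names, and the sketch assumes that a 16-bit immediate move leaves the upper half of the register zero. The off-core address used in the usage note is only an example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bottom 16 bits of a 32-bit address, in the spirit of %low(). */
static inline uint16_t addr_low(uint32_t addr)
{
    return (uint16_t)(addr & 0xffffu);
}

/* Top 16 bits of a 32-bit address. */
static inline uint16_t addr_high(uint32_t addr)
{
    return (uint16_t)(addr >> 16);
}

/* True when the low half alone reproduces the whole address, i.e. one
 * 16-bit immediate move is enough (assuming the upper half ends up zero). */
static inline bool fits_single_mov(uint32_t addr)
{
    return addr_high(addr) == 0;
}
```

An on-core address such as 0x0068 passes `fits_single_mov`, while an example 32-bit global address such as 0x80800068 does not and would need its upper half set separately.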
        ;; launch main directly
        rti
And after that ... there's nothing more to do. rti will clear the interrupt state, enable interrupts and jump to the chosen kernel entry point. r0-r3 and the stack will contain any args required, and when it finishes it will return from the function ...
_exit:
        mov     r7,#tcb_off
        str     r0,[r7,#tcb_exit_code]
1:
        idle
        b.s     1b
        .size   _exit, .-_exit
... and end up directly at exit. This can save out the return code from the function and then 'shut down' via an idle-and-repeat loop. At this point new code can (relatively) safely be loaded, or just the entry and arguments changed to relaunch the code with a sync signal. It may need another field to indicate the core state, such as running or exited, and that might be more useful than having an exit code.
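The whole launch, return and relaunch cycle can be modelled in plain C on any machine, which might help when reasoning about it. This is only a model under stated assumptions: `struct tcb_model`, `model_sync`, `relaunch` and the two toy kernels are invented for illustration, a function pointer stands in for the 32-bit entry address, and a function call stands in for the sync interrupt.

```c
#include <assert.h>
#include <stdint.h>

/* A kernel takes up to four native arguments (r0-r3) and returns a value. */
typedef uint32_t (*kernel_fn)(uint32_t, uint32_t, uint32_t, uint32_t);

/* Host-only model of the slots the startup code reads and writes. */
struct tcb_model {
    uint32_t  argv[4];    /* argv0..argv3 */
    kernel_fn entry;      /* on the device this would be an address */
    uint32_t  exit_code;  /* filled in when the kernel returns */
};

/* Stand-in for one sync: load the state, run the entry point, then do
 * what _exit does and save the return value. */
static void model_sync(struct tcb_model *t)
{
    t->exit_code = t->entry(t->argv[0], t->argv[1], t->argv[2], t->argv[3]);
}

/* Two toy kernels living in the same 'binary'. */
static uint32_t kernel_sum(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return a + b + c + d;
}

static uint32_t kernel_max(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    uint32_t m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

/* Relaunch without reloading code: change only entry and args, then sync. */
static uint32_t relaunch(struct tcb_model *t, kernel_fn entry,
                         uint32_t a0, uint32_t a1, uint32_t a2, uint32_t a3)
{
    t->entry = entry;
    t->argv[0] = a0;
    t->argv[1] = a1;
    t->argv[2] = a2;
    t->argv[3] = a3;
    model_sync(t);
    return t->exit_code;
}
```

Calling `relaunch` first with `kernel_sum` and then with `kernel_max` on the same `struct tcb_model` mirrors changing the entry and arguments between two sync signals.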
The nice thing about the dynamic loader I have is that I can separate this startup mechanism from the code completely - with a bit more work it doesn't even need to be linked to it. Because code is relocated at runtime I can change the startup code or tcb structure without forcing a recompile and only the workgroup config stuff needs to remain fixed.
For example, another option may be to have a job dispatch loop replace most of the above and avoid the need to add it externally. It could even potentially be loading in the code itself from asynchronously enqueued jobs like hsa does. Possibly even using the hsa dispatch packet or some subset of it. Hmm, thoughts.
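A very rough idea of what such a dispatch loop might look like, again as a host-side C model. Everything here is hypothetical: `struct job`, `struct job_queue`, `queue_push` and `dispatch_pending` are invented names, the packet layout is not the hsa one, and the model is single-threaded so it ignores the memory ordering a real host-to-core queue would need.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*job_fn)(uint32_t, uint32_t, uint32_t, uint32_t);

/* One queued job: an entry point, four arguments and a result slot. */
struct job {
    job_fn   entry;
    uint32_t argv[4];
    uint32_t result;
};

#define QUEUE_SLOTS 4u  /* power of two so the counters can wrap freely */

struct job_queue {
    struct job slots[QUEUE_SLOTS];
    uint32_t head;  /* next slot the producer writes */
    uint32_t tail;  /* next slot the dispatcher runs */
};

/* Producer side: returns 0 on success, -1 if the queue is full. */
static int queue_push(struct job_queue *q, job_fn entry, const uint32_t argv[4])
{
    if (q->head - q->tail == QUEUE_SLOTS)
        return -1;
    struct job *j = &q->slots[q->head % QUEUE_SLOTS];
    j->entry = entry;
    for (int i = 0; i < 4; i++)
        j->argv[i] = argv[i];
    j->result = 0;
    q->head++;
    return 0;
}

/* Dispatcher side: run every pending job and return how many ran.
 * A core-resident loop would idle and wait here instead of returning. */
static unsigned dispatch_pending(struct job_queue *q)
{
    unsigned ran = 0;
    while (q->tail != q->head) {
        struct job *j = &q->slots[q->tail % QUEUE_SLOTS];
        j->result = j->entry(j->argv[0], j->argv[1], j->argv[2], j->argv[3]);
        q->tail++;
        ran++;
    }
    return ran;
}

/* Toy job used only to exercise the queue. */
static uint32_t job_mul_add(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    (void)d;
    return a * b + c;
}
```

The result is stored back in the job slot, which plays the same role as the exit code slot in the single-launch scheme.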