Monday, 1 March 2010

Context switching

After working on it in bits and pieces over the last few days I managed to get context switching working (late) last night.

Again, apart from one little error it might've happened a lot sooner. I read the APSR description about the 'big endian bit', and I think my mind just assumed 'not x86 ==> big endian', so I decided I needed to turn it on. Oops. The result was very odd: tasks were apparently being scheduled just fine, but they didn't seem to execute at all, nor cause any crashes. Oh, and this is the first time I've had user-mode code executing as well.

My initial idea was to save only the minimal state possible on interrupt entry, and only bother with a full save/restore in the event of an actual context switch. But thinking about the microkernel design I want to implement later on, this seems a bit pointless: generally the `low' part of the interrupt handler will always run straight away, so an interrupt will probably lead to a context switch anyway. Saving everything up front simplifies the context switcher a little and adds a bit more flexibility internally too. For system calls I'm not so sure yet - quite a few system calls should lead to a context switch too, but a lot won't, so I'm still thinking about whether they will work the same way. And I have to decide whether system calls can be interrupted - if they remain very tiny then they won't need to be.

The context switch code basically ends up the same as the one out of the manual, but hooked into the normal IRQ handler. The IRQ stack pointer is used as a pointer to ThisTask.tcb.regs[0] all the time, so nothing needs to be loaded at interrupt handler time. I have a trivial ASM function which lets the system code set the next task to run:
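        @ r0 = pointer to r0 in tcb
        .global irq_new_task
irq_new_task:
        mrs     r1,cpsr         @ remember the caller's mode
        cps     #MODE_IRQ       @ switch to IRQ mode to reach its banked sp
        mov     sp,r0           @ sp_irq = &tcb.regs[0] of the next task
        msr     cpsr,r1         @ back to the caller's mode
        bx      lr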

The function below then becomes the new IRQ entry point. The lines highlighted in bold indicate the changes from the previous interrupt handler - there aren't too many, and one is just a cleanup (the mov lr,pc). The main difference is that all of the context state is stored on the `irq stack' rather than the supervisor stack. But it isn't really a stack, it's just a pointer to the TCB. This does have one consequence - it is impossible to implement re-entrant interrupts (at least without further code). If more state is required it may make sense to shift all interrupts to use fast interrupts, since it could then use the other FIQ banked registers to store per-processor state - although it could equally just be a pointer from the TCB to a per-processor or per-process struct.
        .global irq_entry
irq_entry:
        sub     lr,#4
        stm     sp,{ r0-r14 }^          @ save all user regs
        srsdb   #MODE_IRQ               @ save spsr and return pc

        cps     #MODE_SUPERVISOR
        push    { r12, lr }             @ save supervisor lr and r12 to supervisor stack

        ldr     r5,=INTCPS_BASE         @ find active interrupt from INTCPS
        ldr     r0,[r5,#0x40]

        ldr     r2,=irq_vectors         @ execute vectored handler
        and     r0,r0,#0x7f
        mov     lr,pc
        ldr     pc, [r2, r0, lsl #2]

        mov     r1,#1                   @ tell INTCPS we've handled int
        str     r1,[r5,#INTCPS_CONTROL]
        dsb

        pop     { r12, lr }             @ last of state on supervisor stack

        cps     #MODE_IRQ

        ldm     sp,{r0-r14}^
        rfedb   sp                      @ back to new or old task
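
The dispatch step in the middle of the handler (read the active interrupt number, mask it to 7 bits, index the vector table, then acknowledge) can be modelled in plain C. This is only a host-side sketch under assumptions: `fake_intcps`, `dispatch_irq`, `irq_vectors_init` and the two handlers are hypothetical names, and the register block is an ordinary struct rather than the real memory-mapped INTCPS.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRQS 128    /* "and r0,r0,#0x7f" keeps 7 bits, so 0..127 */

typedef void (*irq_handler_t)(void);

/* Hypothetical stand-in for the two INTCPS registers the handler touches. */
struct fake_intcps {
    uint32_t sir_irq;   /* active interrupt number (read at offset 0x40 in the asm) */
    uint32_t control;   /* written with 1 once the interrupt has been handled */
};

irq_handler_t irq_vectors[NUM_IRQS];
int timer_ticks;

void spurious_irq(void) { }
void timer_irq(void) { timer_ticks++; }

/* Point every vector at a do-nothing handler so no slot is ever NULL. */
void irq_vectors_init(void)
{
    for (int i = 0; i < NUM_IRQS; i++)
        irq_vectors[i] = spurious_irq;
}

/* C model of the ldr / and / ldr pc sequence plus the acknowledge write. */
void dispatch_irq(struct fake_intcps *intc)
{
    uint32_t irq = intc->sir_irq & 0x7f;

    assert(irq < NUM_IRQS);
    irq_vectors[irq]();
    intc->control = 1;
}
```

Masking before indexing means any extra bits in the register value can never push the lookup outside the 128-entry table.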

Now the IRQ sp is just used as a pointer to the current TCB - so the code doesn't perform any write-back to the sp when writing or restoring values. The user registers are stored/restored above it, and the pc and spsr registers are stored below it. The push { r12, lr } doesn't actually need to store r12 since it has already been saved, but including it keeps the stack aligned at this point. That will need to change to something which enforces the alignment explicitly, and once that is done r12 won't need to be saved.
        struct tcb {
                uint32_t pc;
                uint32_t spsr;
                // <- sp_irq points here always
                uint32_t regs[15];
        };

The C structure that maps to this is shown above, indicating where the sp_irq actually points. I have a simple linked-list of tasks to hold this state, and whatever other state the kernel might need.
        struct task {
                struct Node Node;
                int id;

                struct tcb tcb;
        };
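
Since the assembly depends on pc and spsr sitting directly below regs[0], the layout can be checked at compile time. The following is a sketch under assumptions: the fields of `struct Node` are a guess (the post never shows them), and `task_from_irq_sp` is a hypothetical helper, not something from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed list node layout; the real struct Node is not shown in the post. */
struct Node {
    struct Node *Succ;
    struct Node *Pred;
};

struct tcb {
    uint32_t pc;        /* srsdb stores the return pc at sp_irq - 8 */
    uint32_t spsr;      /* srsdb stores the spsr at sp_irq - 4 */
    uint32_t regs[15];  /* stm stores r0-r14 starting at sp_irq */
};

struct task {
    struct Node Node;
    int id;
    struct tcb tcb;
};

/* Fail the build if padding or reordering ever breaks what the asm expects. */
_Static_assert(offsetof(struct tcb, regs) - offsetof(struct tcb, pc) == 8,
               "pc must sit 8 bytes below regs[0]");
_Static_assert(offsetof(struct tcb, regs) - offsetof(struct tcb, spsr) == 4,
               "spsr must sit 4 bytes below regs[0]");

/* Hypothetical helper: recover the owning task from the value held in sp_irq. */
struct task *task_from_irq_sp(uint32_t *irq_sp)
{
    size_t off = offsetof(struct task, tcb) + offsetof(struct tcb, regs);

    assert(irq_sp != NULL);
    return (struct task *)((char *)irq_sp - off);
}
```

Going from sp_irq back to the task this way would let C code find the interrupted task without keeping a separate global pointer.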

Within an interrupt routine, if I wish to schedule a new task all I need to do is call the aforementioned irq_new_task function with the address of regs[0] inside that task's tcb, and that task will run once the interrupt is finished.

Although not particularly practical in the real world, a round-robin scheduler of all tasks in the run-queue is as simple as:
        Remove(&thistask->Node);
        AddTail(&tasks, &thistask->Node);

        thistask = (struct task *)tasks.Head;
        irq_new_task(&thistask->tcb.regs[0]);
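
To try the rotation on a host machine, the snippet can be wrapped in a small self-contained model. Everything around the four scheduler lines is an assumption: the list here is a simple sentinel-based doubly linked list (the real list code is not shown in the post), `NewList` and `schedule_round_robin` are hypothetical names, and `irq_new_task` is a C stub that records the pointer instead of loading the banked IRQ sp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct Node { struct Node *Succ, *Pred; };

/* List header doubling as a sentinel node: Head/Tail alias Succ/Pred. */
struct List {
    union {
        struct Node Sentinel;
        struct { struct Node *Head, *Tail; };
    };
};

struct tcb { uint32_t pc, spsr, regs[15]; };
struct task { struct Node Node; int id; struct tcb tcb; };

struct List tasks;
struct task *thistask;
uint32_t *irq_sp;               /* stands in for the banked IRQ stack pointer */

void NewList(struct List *l)
{
    l->Sentinel.Succ = l->Sentinel.Pred = &l->Sentinel;
}

void AddTail(struct List *l, struct Node *n)
{
    n->Succ = &l->Sentinel;
    n->Pred = l->Sentinel.Pred;
    l->Sentinel.Pred->Succ = n;
    l->Sentinel.Pred = n;
}

void Remove(struct Node *n)
{
    n->Pred->Succ = n->Succ;
    n->Succ->Pred = n->Pred;
}

/* Stub for the assembly routine: just remember where sp_irq would point. */
void irq_new_task(uint32_t *regs) { irq_sp = regs; }

/* Move the current task to the back of the queue and run the new head. */
void schedule_round_robin(void)
{
    assert(thistask != NULL);   /* the run-queue must not be empty */
    Remove(&thistask->Node);
    AddTail(&tasks, &thistask->Node);

    thistask = (struct task *)tasks.Head;
    irq_new_task(&thistask->tcb.regs[0]);
}
```

With three queued tasks, repeated calls cycle through them in order and return to the first after three switches. The cast from the list head to `struct task *` only works because `Node` is the first member of `struct task`.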

There are (at least?) two other pieces of context that also need changing in a complete system, the MMU tables and the floating point registers.

Floating point registers don't need saving/restoring here because the kernel will never use floating point itself. And instead of wasting time saving/restoring the full FPU state at every interrupt, the code will just find out when a floating point instruction is used and swap the state then. The irq_new_task call can probably just disable the floating point unit, and when an undefined instruction exception occurs on the floating point co-processor the state can be changed then if it needs to be. And/or some combination thereof, e.g. irq_new_task could check which task `owns' the FPU and enable/disable it appropriately.
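
The lazy FPU idea can be sketched as a small state machine in C. This is a host-side model under assumptions, not the kernel's code: `fpu_on_task_switch`, `fpu_on_undefined_trap`, `fpu_owner` and the register layout in `struct fpu_state` are all hypothetical, and the "hardware" register file is just a global variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative register file only; the real saved state may differ. */
struct fpu_state { uint64_t d[32]; uint32_t fpscr; };

struct task { int id; struct fpu_state fpu; };

struct task *fpu_owner;         /* task whose values are live in the FPU */
bool fpu_enabled;               /* models the FPU enable bit */
struct fpu_state fpu_hw;        /* stands in for the hardware registers */
int fpu_swaps;                  /* counts full state swaps, for illustration */

/* Called on every context switch: cheap, never copies any FPU state. */
void fpu_on_task_switch(struct task *next)
{
    fpu_enabled = (next == fpu_owner);
}

/* Called when a task traps on a floating point instruction with the FPU off. */
void fpu_on_undefined_trap(struct task *current)
{
    assert(!fpu_enabled);
    if (fpu_owner != current) {
        if (fpu_owner != NULL)
            fpu_owner->fpu = fpu_hw;    /* save the previous owner's state */
        fpu_hw = current->fpu;          /* load the trapping task's state */
        fpu_owner = current;
        fpu_swaps++;
    }
    fpu_enabled = true;
}
```

In this model a task that never touches floating point never costs a state copy, and switching away from and back to the owner costs nothing beyond toggling the enable flag.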

The MMU is a little trickier: it will need the page tables changed at every context switch. Since I will map the system memory globally across all tasks, I will probably also be able to put all of that logic into irq_new_task as well. Not sure how I'll deal with system calls that cause a context switch yet. I am looking at a process+task model too, so each task will be associated with a parent process which will encompass the memory map. A multi-tasked process, for example, won't need an MMU switch if it is only switching between its own tasks.
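
The process+task split boils down to one comparison at switch time. The sketch below models it on a host under assumptions: `struct process`, `switch_address_space`, `mmu_set_table` and the `ttb` field are hypothetical, and writing the translation table base is replaced by storing to a plain variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct process {
    int id;
    uint32_t ttb;               /* hypothetical page table base address */
};

struct task {
    int id;
    struct process *proc;       /* parent process owning the memory map */
};

struct process *current_proc;
uint32_t fake_ttbr0;            /* stands in for the MMU table base register */
int mmu_switches;               /* counts table changes, for illustration */

/* Stub for the real MMU update (which would also need TLB/cache handling). */
void mmu_set_table(uint32_t ttb)
{
    fake_ttbr0 = ttb;
    mmu_switches++;
}

/* Change page tables only when the next task belongs to another process. */
void switch_address_space(struct task *next)
{
    assert(next != NULL && next->proc != NULL);
    if (next->proc != current_proc) {
        mmu_set_table(next->proc->ttb);
        current_proc = next->proc;
    }
}
```

Switching between two tasks of the same process leaves the tables alone, so only cross-process switches pay for the MMU update.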

And finally the last piece of the puzzle is boot-strapping the task system. There is only one CPU and it can only execute one bit of code at a time, so basically the initial booting process just 'falls through' to the 'current task' and automagically starts executing as part of its context. There is actually very little that needs to be set up - simply changing to user mode and then initialising its stack pointer, and that's it. The code then jumps to the entry point of the current task (as defined by the context switching mechanism above).
        // <- in supervisor state here, with boot-strap stack, etc
        asm volatile("cps #0x10");
        // <- user mode, undefined stack pointer
        asm volatile("ldr sp,=0x88000000 - 32768");
        // <- now sp is set to the task's stack
        task(0);

The code above is just a demo, but the final thing will just jump to the idle task, so it can still be hard-coded in a similar way. Or it can be even simpler - the idle task does not need any stack, e.g. the following would suffice.
idle_task:
        wfi
        b       idle_task

Code not committed yet.

Hmm, not sure what to look at next, maybe MMU context switching, or perhaps something simpler like system calls. Really exhausted tonight - spent the whole day staring at code not written by a programmer, and I need a good meal too.

Maiden Century

This is my first web diary that's made it to 100 posts, that being this post. Yay.

3 comments:

Sankar said...

Congrats on the 100th post. I wish you reach more milestones and write on more topics apart from board-hacking (somehow it doesn't feel like I've read a 100 posts here)

NotZed said...

I guess i write on what i'm doing at the time, but what sort of things do you mean?

Sankar said...

I like reading all your posts - gardening, construction, cycling, atheism, science etc.

Since I am not so interested in the cell/beagleboard-coding part, I just found the posts on them to be too high in number :-) Maybe just a flawed perception.

After all, it is your blog and you write what you like, but as an audience I just wish you would post more on the rest of the topics as well - like more on gardening, cycling or some FOSS projects (pulseaudio, camel, etc.) that I can relate to. Have Fun :-)