KSE/ia64: a quick update

Wed Aug 6 11:55:32 PDT 2003

Gang,

Given the panics that Daniel is having on pluto1, it's probably a
good idea to fill people in on the status of KSE/ia64:

The groundwork is finished. In practical terms this means that
I/O-bound threads work to their fullest extent. There's just one
tiny, annoying and complicated thing: thread_userret(), and
consequently thread_export_context(), can be called for interrupts,
traps and faults as well. Since syscalls are not implemented as
traps, we have two distinct paths into (and out of) the kernel.

One (the syscall path) is synchronous WRT program execution and the
other (interrupts) is asynchronous. Synchronous contexts don't have
scratch registers in them; asynchronous contexts need to have them.
This is not the hard problem: just add some flags to indicate which
parts of the context are valid, and thus should be restored, and
we're OK.
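The flag idea can be sketched roughly as follows. This is a toy
model, not the real ia64 trapframe: the struct frame layout, the
CTX_PRESERVED/CTX_SCRATCH flags and the frame_* helpers are all
hypothetical, and the small arrays stand in for the actual register
sets. The syscall path records only the preserved state, the
interrupt path also records the scratch registers, and the restore
code consults the flags to decide what to put back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits: which parts of a saved context are valid. */
#define CTX_PRESERVED 0x1u  /* callee-saved state: present in every frame */
#define CTX_SCRATCH   0x2u  /* scratch registers: async entries only */

/* Hypothetical, heavily simplified saved context. */
struct frame {
    uint32_t flags;
    uint64_t preserved[4];  /* stand-in for the preserved registers */
    uint64_t scratch[4];    /* stand-in for the scratch registers */
};

/* Syscall entry (synchronous): scratch registers are dead by
   convention, so only the preserved state is saved. */
static void frame_save_sync(struct frame *f, const uint64_t *cpu_pres)
{
    f->flags = CTX_PRESERVED;
    memcpy(f->preserved, cpu_pres, sizeof f->preserved);
}

/* Interrupt/trap/fault entry (asynchronous): any register may be
   live, so the scratch registers are saved as well. */
static void frame_save_async(struct frame *f, const uint64_t *cpu_pres,
                             const uint64_t *cpu_scr)
{
    f->flags = CTX_PRESERVED | CTX_SCRATCH;
    memcpy(f->preserved, cpu_pres, sizeof f->preserved);
    memcpy(f->scratch, cpu_scr, sizeof f->scratch);
}

/* Restore only the parts the flags mark as valid. */
static void frame_restore(const struct frame *f, uint64_t *cpu_pres,
                          uint64_t *cpu_scr)
{
    assert(f->flags & CTX_PRESERVED);  /* every frame carries this */
    memcpy(cpu_pres, f->preserved, sizeof f->preserved);
    if (f->flags & CTX_SCRATCH)
        memcpy(cpu_scr, f->scratch, sizeof f->scratch);
}
```

Restoring a frame saved by frame_save_sync() leaves the scratch
registers untouched, while a frame from frame_save_async() brings
them back as well.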

The problem arises when we preempt an interrupted thread, export its
context to the UTS and do an upcall: we end up with an async context
in userland. I'm not sure at this time what we should do with it. We
have the following options:

o  Extend _ia64_restore_context() so that libkse can restore async
   contexts. The downside is that this will very likely cause a
   disabled high FP trap, which results in the process having the
   high FP registers enabled. That is a performance hit (see also
   below).
o  Have _ia64_restore_context() call setcontext() for async contexts
   and do the work in the kernel. Restoring the high FP registers
   will not enable them, because we can restore them to the PCB.
   They will be loaded into the CPU when there's a need for them
   (which may be never).
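A toy model of the trade-off between the two options, assuming a
context can be reduced to two booleans (whether it is async and
whether it carries high FP values). All names here are hypothetical;
this is not libkse or kernel code. What it illustrates is the side
effect: the first option leaves the high FP set enabled for the
process, while the second routes async contexts through the kernel
and leaves the set disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a context exported to the UTS. */
struct uctx {
    bool async;        /* exported from an interrupt/trap, not a syscall */
    bool has_high_fp;  /* carries f32-f127 values that must be restored */
};

/* Hypothetical per-process high FP state. */
struct proc_fp {
    bool high_fp_enabled;  /* high FP set enabled for this process */
};

enum restore_path { PATH_USERLAND, PATH_KERNEL };

/* Option 1: libkse restores everything itself. Writing f32-f127 from
   user mode while the set is disabled takes a disabled high FP trap,
   after which the kernel leaves the set enabled for the process. */
static enum restore_path restore_in_userland(const struct uctx *uc,
                                             struct proc_fp *p)
{
    assert(uc != NULL && p != NULL);
    if (uc->async && uc->has_high_fp)
        p->high_fp_enabled = true;  /* side effect of the trap */
    return PATH_USERLAND;
}

/* Option 2: async contexts go through setcontext(); the kernel puts
   the high FP values in the PCB and leaves the set disabled until the
   process actually touches a high FP register. */
static enum restore_path restore_via_setcontext(const struct uctx *uc,
                                                struct proc_fp *p)
{
    assert(uc != NULL && p != NULL);
    (void)p;  /* high FP stays disabled; loaded lazily, if ever */
    return uc->async ? PATH_KERNEL : PATH_USERLAND;
}
```

In both variants synchronous contexts keep the existing userland
path; they differ only in how async contexts are handled.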

Both cases have the problem that we're using a synchronous method
(the call/ret sequence) to restore an async context. I'm not sure
how ugly it would get to change the return path so that it mimics
an interrupt return.

In short: The KSE framework works, as long as we don't preempt
threads. I'm not sure how to solve that exactly...

About the high FP:

On ia64 the FP registers are split into two sets: low and high.
Both sets can be enabled and disabled independently of each other,
and each set has a modified bit to keep track of usage. The low FP
registers are f0-f31 and are always enabled. The high FP registers
are f32-f127 and are disabled by default. We use lazy context
switching to save and restore these on a need-to-have basis. When
a process uses a high FP register and the set is disabled, we take
a trap, save the high FP registers currently on the CPU and load
the high FP registers of the process that trapped. We then enable
the high FP registers (for that process) and let it continue. As
long as only one process uses the high FP registers, there's no
performance penalty when they are enabled.
Note that compilers avoid using the high FP registers as much as
possible. AFAICT, none of the code in our source tree uses the high
FP registers. Hence, they should only be enabled when the process
is highly FP-intensive.
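The lazy switching scheme can be simulated in a few lines of C. This
is a sketch under simplifying assumptions, not the kernel's code: the
struct cpu/struct proc layouts, the fph_owner field and the
switch_to(), high_fp_trap() and use_high_fp() functions are
hypothetical, and the 82-bit FP registers are modelled as doubles.
It shows one way a single high FP user pays for the trap only once:
its values stay on the CPU across context switches, and the set is
simply re-enabled when that process runs again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NHIGH 96  /* f32-f127 */

/* Hypothetical per-process state: the PCB copy of the high FP set. */
struct proc {
    double pcb_high[NHIGH];  /* doubles stand in for 82-bit registers */
};

/* Hypothetical per-CPU state. */
struct cpu {
    struct proc *fph_owner;  /* process whose values are in high[] */
    double high[NHIGH];      /* the physical f32-f127 */
    bool dfh;                /* high FP set disabled for running process */
    unsigned traps;          /* disabled high FP traps taken so far */
};

/* Context switch: leave the high FP values on the CPU and disable
   the set only if those values belong to another process. */
static void switch_to(struct cpu *c, struct proc *p)
{
    c->dfh = (c->fph_owner != p);
}

/* Disabled high FP trap: save the current owner's values to its PCB,
   load the trapping process's values, then enable the set. */
static void high_fp_trap(struct cpu *c, struct proc *p)
{
    assert(c->dfh);
    c->traps++;
    if (c->fph_owner != p) {
        if (c->fph_owner != NULL)
            memcpy(c->fph_owner->pcb_high, c->high, sizeof c->high);
        memcpy(c->high, p->pcb_high, sizeof c->high);
        c->fph_owner = p;
    }
    c->dfh = false;
}

/* Running process p writes v into high FP register f<reg>. */
static void use_high_fp(struct cpu *c, struct proc *p, int reg, double v)
{
    assert(reg >= 32 && reg <= 127);
    if (c->dfh)
        high_fp_trap(c, p);
    c->high[reg - 32] = v;
}
```

With two processes where only A uses f32-f127, A takes one trap on
first use and none afterwards, however often the CPU switches
between A and B. The save/load traps only start once B uses the
high set as well.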

FYI,

-- 
 Marcel Moolenaar	  USPA: A-39004		 marcel at xcllnt.net