Syscalls and RSE

Fri Jun 29 04:43:11 UTC 2007

On Jun 27, 2007, at 5:17 AM, Christian Kandeler wrote:

*snip*
> The problem, I think, is not the RNAT on the way out of the syscall, but a
> collection of undefined NaT bits saved in the kernel and loaded later in user
> space.
*snip*
> I've tried to verify my theory, and I believe I have succeeded: If the code in
> epc_syscall is correct, then it should tolerate any value in RNAT after the
> backing store switch, since the value of this register is undefined anyway.
*snip*
> However, when I manually set RNAT to -1 and boot the resulting kernel, the
> system crashes right after entering user space (Illegal Instruction in the sh
> process). I assume this is due to one of the many NaT bits the process
> receives after making a system call.

Ok, I think I see what you mean -- and it's a valid point.

The current epc_syscall path saves ar.rnat and switches the backing
store. It assumes that ar.rnat is not clobbered because we keep
ar.bspstore aligned with respect to NaT collections. As such, it is
assumed that it's safe to flush the dirty stacked registers onto the
kernel stack, whether or not we cross NaT collection points.
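
To make the alignment assumption concrete, here is a rough sketch in C
of the slot arithmetic involved. It relies on the architectural rule
that every 64th 8-byte slot in the backing store (address bits 8:3 all
ones) holds an RNAT collection. All function names are made up for
illustration, and kernel_bspstore_for() is only an assumption about
how the alignment could be kept; it is not the actual epc_syscall
code.

```c
#include <assert.h>
#include <stdint.h>

/* Backing store slots are 8 bytes; address bits 8:3 select one of 64 slots. */
#define RSE_SLOT_SHIFT 3
#define RSE_SLOT_MASK  0x3fULL

/* Index (0..63) of the slot at addr within its 64-slot group. */
static unsigned rse_slot_num(uint64_t addr)
{
    return (unsigned)((addr >> RSE_SLOT_SHIFT) & RSE_SLOT_MASK);
}

/* Slot 63 of each group holds an RNAT collection, not a register. */
static int rse_is_rnat_slot(uint64_t addr)
{
    return rse_slot_num(addr) == 63;
}

/* Address of the RNAT collection slot that covers the slot at addr. */
static uint64_t rse_rnat_addr(uint64_t addr)
{
    return addr | (RSE_SLOT_MASK << RSE_SLOT_SHIFT);
}

/*
 * Hypothetical helper: choose a kernel bspstore with the same slot
 * index as the user bspstore, so that both backing stores hit their
 * collection points after the same number of registers.
 * Assumes kbase is 512-byte aligned.
 */
static uint64_t kernel_bspstore_for(uint64_t kbase, uint64_t user_bspstore)
{
    assert((kbase & 0x1ff) == 0);
    return kbase | (user_bspstore & 0x1ff);
}
```

With this layout, a user bspstore of 0x1128 and a kernel base of
0xa000 give a kernel bspstore of 0xa128, and both addresses sit at
the same slot index relative to the next collection point.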

What you're saying is that ar.rnat is not preserved and in fact is
undefined. This is what the SDM actually says. So, you argue that a
NaT collection that was written on the kernel stack, and that
includes the NaT bits of the process's registers, is undefined once
we return to the process and the RSE is unwound to beyond said
collection point. You demonstrated this by putting an arbitrary
value (i.e. -1) in ar.rnat and observing that the process crashed.

The problem is in the assumption that ar.rnat is still valid after
writing to ar.bspstore. The SDM states that ar.rnat is undefined at
that point. I haven't seen any indication that ar.rnat is anything
other than what it was before, but that's probably because without
speculation it's always 0 anyway :-)
The fact that a bogus ar.rnat value affects a process simply means
that we do in fact propagate ar.rnat back to the process after a
flush. This is a good thing!

However, the question is: should we write the saved ar.rnat value
back to ar.rnat after the backing store switch? Doing so would make
the assumption that ar.rnat is preserved across a backing store
switch hold by construction, rather than by luck.
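
As a toy illustration of that question (plain C, not IA-64 code, and
not the epc_syscall implementation), the model below treats a write
to ar.bspstore as leaving ar.rnat undefined by scrambling it to all
ones, much like the -1 experiment. The struct and function names are
invented for this sketch.

```c
#include <stdint.h>

/* Toy model of the two RSE application registers discussed here. */
struct rse_model {
    uint64_t bspstore;
    uint64_t rnat;
};

/*
 * Models a write to ar.bspstore. Afterwards ar.rnat must be treated
 * as undefined; the model uses all ones as a stand-in for "undefined".
 */
static void model_write_bspstore(struct rse_model *rse, uint64_t new_bspstore)
{
    rse->bspstore = new_bspstore;
    rse->rnat = ~(uint64_t)0;
}

/* Switch that relies on ar.rnat surviving the bspstore write. */
static void switch_no_restore(struct rse_model *rse, uint64_t kernel_bspstore)
{
    model_write_bspstore(rse, kernel_bspstore);
}

/* Switch that saves ar.rnat first and writes it back afterwards. */
static void switch_with_restore(struct rse_model *rse, uint64_t kernel_bspstore)
{
    uint64_t saved_rnat = rse->rnat;

    model_write_bspstore(rse, kernel_bspstore);
    rse->rnat = saved_rnat;
}
```

In this model only switch_with_restore() leaves the pending NaT bits
intact, no matter what the bspstore write does to ar.rnat.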

Would this address the problem you describe or did I misunderstand?

-- 
Marcel Moolenaar
xcllnt at mac.com