svn commit: r216634 - in head/sys/amd64: amd64 ia32 include linux32

Wed Dec 22 18:02:28 UTC 2010

On Wed, 22 Dec 2010, Jung-uk Kim wrote:

> On Wednesday 22 December 2010 08:03 am, John Baldwin wrote:
>> On Tuesday, December 21, 2010 7:18:42 pm Jung-uk Kim wrote:
>>> Log:
>>>   Improve PCB flags handling and make it more robust.  Add two
>>> new functions for manipulating pcb_flags.  These inline functions
>>> are very similar to atomic_set_char(9) and atomic_clear_char(9)
>>> but without unnecessary LOCK prefix for SMP.  Add comments about
>>> the rationale[1].  Use these functions wherever possible.
>>> Although there are some places where it is not strictly necessary
>>> (e.g., a PCB is copied to create a new PCB), it is done across
>>> the board for sake of consistency.  Turn pcb_full_iret into a PCB
>>> flag as it is safe now.  Move rarely used fields before pcb_flags
>>> and reduce size of pcb_flags to one byte.  Fix some style(9) nits
>>> in pcb.h while I am in the neighborhood.
>>>
>>>   Reviewed by:	kib
>>>   Submitted by:	kib[1]
>>>   MFC after:	2 months
>>
>> Is there really a need to have the flags field be a char instead of
>> an int or long?  It seems to me that flags fields in general should
>> be an int unless there is a strong need otherwise (e.g.
>> hardware-defined flag).  'orl' will work just as well as 'orb'.
>
> Actually, there was little disagreement between kib and me and
> committed version was fifth reincarnation of original patch.  I

I would have a big disagreement :-).  I don't like using macros or
functions to obfuscate setting and clearing of boolean flags.  Here
the function may be needed for locking, but...  I already asked kib
not to use a function for (partially) managing 2 FPU flags (there
should be only one of these and then the management becomes simple
boolean again).

Putting pcb_full_iret in with the other flags gives a tiny pessimization
-- it now takes a load-modify-store to set it, where it only took a store.
I think this and any other flags that need special management should be
in a separate struct member (possibly in a separate cache line) like
pcb_full_iret was.  This corresponds to td_flags being separate from
td_pflags.  Accesses to these td flags are not obfuscated.

Many  of the accesses still aren't atomic unless there is higher-level
locking, since they are of the form { set one flag; clear another; }.
It's unclear if there is any higher-level locking for the FPU flags,
whose management has become quite complicated:
- fpudna(), fpudrop(): everything is locked by a critical section.  This
   was needed only for the fpucurthread switch.  It is unclear if it is
   now also needed for the flags management.  Here the flags must be managed
   inside the critical section due to the ordering.
- fpugetregs(): does all flags managament outside of its critical sections
- fpuuserinited(): in critical section iff caller provided one (same for
   fpudrop(), but it asserts that the caller provided one)
- fpusetregs(): carefully exits from its critical section before managing
   flags.  Now the ordering permits this.
- fpu_kern_enter(): no critical sections in sight (but when it abuses
   fpuexit() to enter, fpuexit() will use one.
- fpuexit(): doesn't manage flags.  Probably not needed, especially since
   it is never used to exit (thread exit abuses fpudrop() instead), and
   when it is used to enter as above, its caller manages the flags.
- fpu_kern_leave(): carefully exits from its critical section before managing
   flags.  This is the main fpu place that has to set one flag and clear
   another.
- fpu_kern_thread(): no critical section in sight for flags management.
- is_fpu_kern_thread(): only reads flags.  Since there is no locking in
   sight, the flags are presumably non-volatile.  The one in td_pflags
   certainly is, as is made clear by documented locking for td_pflags.
   The one in pcb_flags is presumably the same, so it very naturally
   belongs in a pcb analog of td_pflags.

> originally used 'u_int pcb_flags' but used byte byte opcodes for
> assembly to make it consistent with compiler generated code.  kib

Unfortuneatly its easier for the compiler to maintain the optimization
of doing byte accesses if that is good.

> disliked the inconsistencies and told me to reconsider.  After
> several versions, I proposed we use char and move forward for now,
> then we increase the size if it ever grows more than 8 bits.  So, we
> settled.  I have no problem with int version if there is no
> measurable performance change.  I'll test it soon.

I've never seen any performance loss from using 32-bit accesses, except
with the original i386 since it had no cache and a worse instruction
fetcher so that even the extra length of 32-bit immediate operands 
slowed it down.  With not-so-old CPUs it is the 8-bit accesses that
are more likely to be slower, since they may be misaligned and/or
cause store-to-load mismatch penalties.

There are a couple of (read-only) accesses to td_flags and td_pflags in
asm code very need the pcb_full_iret ones.  The former use simple testl's.

Bruce