svn commit: r212177 - in head/sys: amd64/include i386/include

Bruce Evans brde at optusnet.com.au
Tue Sep 7 07:50:02 UTC 2010


On Fri, 3 Sep 2010, Roman Divacky wrote:

> Log:
>  Change the parameter passed to the inline assembly to u_short
>  as we are dealing with 16bit segment registers. Change mov
>  to movw.

u_ints were used intentionally to avoid operand size prefixes here
and extra code for promotions in callers.  Segment registers are
always stored as 32 bits in trap frames, pcbs and regs structs, to
avoid the operand size prefixes in pure assembler code for at least
the trap frames and to avoid packing problems.

Why break this?  This was last broken in 2003, in rev.1.131 for i386,
and last backed out soon after.  Then the reason for breaking it was
stated in the log message -- it was to avoid returning garbage in
the top bits.  The garbage, if any, is due to not wasting time to
remove it.  Instead, we always used 32-bit accesses to segment registers
and depended on the CPU supplying harmless garbage.  This depended on
undocumented features and on avoidance of gas bugs in this area when it
was first used.  Some of the gas bugs seem to be still there:

(1) the original i486 manual only documents mov of a segreg to/from memory
     as being 16 bits.  Gas bugs required writing 32-bit mov's to get
     16-bit ones.  Now, the operand size can be left for gas to determine,
     but gas still produces a spurious operand size prefixs for movw
     in 1 case:

%%%
GAS LISTING z.s 			page 1


    1 0000 8C1D0000 	mov	%ds,mem
    1      0000
    2 0006 8C1D0000 	movl	%ds,mem
    2      0000
    3 000c 668C1D00 	movw	%ds,mem        <--- spurious 66
    3      000000
    4 0013 8E1D0000 	mov	mem,%ds
    4      0000
    5 0019 8E1D0000 	movl	mem,%ds
    5      0000
    6 001f 8E1D0000 	movw	mem,%ds
    6      0000
%%%

For loads there is no problem now -- the operand size prefix is never
generated now, and the contents of the upper 16 bits is irrelevant.

There is a problem with stores -- the above spurious 66 has no effect,
at least on an Athlon64, and the upper 16 bits of `mem' are not changed
by the store.  Thus the optimization is broken for the "m" case of rgs()
etc.  I think there is no problem in practice since the "m" case is
never used, except possibly with -O0 (the "r" case is always preferred).

(2) mov of a segreg to a general register is quite different (mov of a
     segreg _from_ a general register is unproblematic, as above, so
     its details are not given here).  32-bit movs change the top bits,
     but the details of this are not documented in the original i486
     manual.  Later manuals give the details, and they are CPU-dependent
     (IIRC, in general newer CPUs (Athlon64 at least) zero the top bits,
     while older CPUs set them to garbage).  Gas bugs in this area seem
     to be fixed:

%%%
GAS LISTING z.s 			page 1


    1 0000 668CD8   	mov	%ds,%ax
    2 0003 668CD8   	movw	%ds,%ax
    3 0006 8CD8     	mov	%ds,%eax
    4 0008 8CD8     	movl	%ds,%eax
%%%

Now the operand size prefix is not spurious, and it is generated as
necessary without an explicit `w' in the opcode -- it forces the CPU
to do extra decoding and Icache work (for the prefix) in order for the
instruction to do less (not touch the top bits of the target registers).
This is the main pessimization in this commit -- we do extra work here
so that we can do even more extra work in callers, to promote the
result to the 32-bit one that is needed for storing in the pcb or
similar.

(3) push of a segreg is like mov of a segreg to a general register.  Again,
     it is important to write sufficiently secure garbage to the top bits.
     FreeBSD has always used pushl for segment registers in the trap frame
     and dependent on this to write something harmless, since the registers
     in the trap frame are copied without cleaning by ptrace() etc.  You
     can also see the "garbage" in the segment registers in the trap frame
     using debuggers, and doing extra work to clean it in rgs() etc. won't
     even affect it there.

Gas didn't have any problems with push of a segreg AFAIR, but my assembler
does.  My assembler doesn't even know that 32-bit movs of segregs exist,
so it mis-assembles "push %ds" as a 16-bit push in both 16-bit and 32-bit
mode.  I used this in the first 386 protected mode version of Minix to
give bad packing for segment registers in the trap frame.

> Modified: head/sys/amd64/include/cpufunc.h
> ==============================================================================
> --- head/sys/amd64/include/cpufunc.h	Fri Sep  3 13:54:02 2010	(r212176)
> +++ head/sys/amd64/include/cpufunc.h	Fri Sep  3 14:25:17 2010	(r212177)
> @@ -421,40 +421,40 @@ invlpg(u_long addr)
> 	__asm __volatile("invlpg %0" : : "m" (*(char *)addr) : "memory");
> }
>
> -static __inline u_int
> +static __inline u_short
> rfs(void)
> {
> -	u_int sel;
> -	__asm __volatile("mov %%fs,%0" : "=rm" (sel));
> +	u_short sel;
> +	__asm __volatile("movw %%fs,%0" : "=rm" (sel));
> 	return (sel);
> }

The change from u_int to u_short is a pessimization, except in fixes the
"m" case -- see above.  A better fix for the "m" case is to remove it.

The change from mov to movw has no effect except to step on the gas bug
to get an extra operand size prefix that has no effect in the "m" case.

> ...
> static __inline void
> -load_ds(u_int sel)
> +load_ds(u_short sel)
> {
> -	__asm __volatile("mov %0,%%ds" : : "rm" (sel));
> +	__asm __volatile("movw %0,%%ds" : : "rm" (sel));
> }

The pessimizations for the load operations are hopefully optimized away
in all cases.  Normally we start with an int in the trap frame.  [Oops,
on amd64 most things in the trap frame are actually long (register_t ==
__int64_t).  This includes tf_cs and tf_ss which are pushed by hardware.
But tf_[defg]s are uint16_t.]  Then we downcast to u_short for calling
load_*s().  [But on amd64 it is only a downcast for ss.]  On i386,
everything in the trap frame is an int and we used to sidecast to u_int;
now we downcast to u_short.  But the downcast doesn't take any instructions,
and the change from mov to movw also has no uffect.

Bruce


More information about the svn-src-head mailing list