svn commit: r344118 - head/sys/i386/include

Fri Feb 15 13:27:30 UTC 2019

On Fri, 15 Feb 2019, Konstantin Belousov wrote:

> On Fri, Feb 15, 2019 at 07:16:04AM +0000, Alexey Dokuchaev wrote:
>> On Thu, Feb 14, 2019 at 01:53:11PM +0000, Konstantin Belousov wrote:
>>> New Revision: 344118
>>> URL: https://svnweb.freebsd.org/changeset/base/344118
>>>
>>> Log:
>>>   Provide userspace versions of do_cpuid() and cpuid_count() on i386.
>>>
>>>   Some older compilers, when generating PIC code, cannot handle inline
>>>   asm that clobbers %ebx (because %ebx is used as the GOT offset
>>>   register).  Userspace versions avoid clobbering %ebx by saving it to
>>>   stack before executing the CPUID instruction.
>>>
>>> ...
>>> +static __inline void
>>> +do_cpuid(u_int ax, u_int *p)
>>> +{
>>> +	__asm __volatile(
>>> +	    "pushl\t%%ebx\n\t"
>>> +	    "cpuid\n\t"
>>> +	    "movl\t%%ebx,%1\n\t"
>>> +	    "popl\t%%ebx"
>>
>> Is there a reason to prefer pushl+movl+popl instead of movl+xchgl?
>>
>>     "movl %%ebx, %1\n\t"
>>     "cpuid\n\t"
>>     "xchgl %%ebx, %1"
>
> xchgl seems to be slower even in registers format (where no implicit
> lock is used).  If you can demonstrate that your fragment is better in
> some microbenchmark, I can change it.  But also note that its use is not
> on the critical path.

They should have the same speed on modern x86.  xchgl %reg1,%reg2 is
not slow, but it changes 2 visible registers and needs somewhere to
hold one of the registers while changing it, so on 14 year old AthlonXP
where I know the times in cycles better, register xchgl was twice as slow
as register move (2 cycles latency instead of 1, and throughput ==
latency (?)).  On 2015 Haswell, register movl in a loop is in parallel
with the loop overhead (1 cycle), while xchgl and pushl/popl take 0.5
cycles longer on average.  Latency might be a problem for pushl/popl
in critical paths.  There aren't many of those.

There is no reason to use the style with strings made unreadable using
soft tabs and newlines.  gcc supported hard newlines 20-30 years ago,
but broke this because C90 or C99 made hard newlines in strings invalid.
This broke lots of my asms.  I now use hard tabs and backslash-hard_newlines
after soft newlines:

 	__asm __volatile("	\n\
 	pushl	%%ebx		\n\
 	cpuid			\n\
 	movl	%%ebx,%1	\n\
 	popl	%%ebx		\n\
 	");

The Standard C lossage forces the use of \n\ before each hard newline, and
readability forces a hard-to-edit variable number of hard tabs before \n\, but otherwise
the code looks the same as before (opcodes are outdented to column 8 in
large asms, and labels are outdented to column 0, so that the code looks
the same as non-inline asm too).
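
To make the difference between the two spellings concrete, here is a minimal
sketch in plain C.  No asm is executed, so it should build on any machine.  The
names asm_concat, asm_contin and same_tokens are made up for this illustration
and are not part of the commit.  It only compares the two template strings
after the preprocessor has spliced the backslash-newlines away:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Style from the commit: adjacent literals with soft \n\t escapes. */
static const char asm_concat[] =
    "pushl\t%%ebx\n\t"
    "cpuid\n\t"
    "movl\t%%ebx,%1\n\t"
    "popl\t%%ebx";

/*
 * Style described above: one literal with hard tabs, each source line
 * ended by \n\ (a soft newline, then a backslash-newline splice).
 */
static const char asm_contin[] = "	\n\
	pushl	%%ebx		\n\
	cpuid			\n\
	movl	%%ebx,%1	\n\
	popl	%%ebx		\n\
	";

/*
 * Hypothetical helper: return 1 if a and b contain the same sequence of
 * whitespace-separated tokens, else 0.
 */
static int
same_tokens(const char *a, const char *b)
{
	for (;;) {
		/* Skip the whitespace in front of the next token. */
		while (isspace((unsigned char)*a))
			a++;
		while (isspace((unsigned char)*b))
			b++;
		if (*a == '\0' || *b == '\0')
			return (*a == *b);
		/* Compare one token character by character. */
		while (*a != '\0' && !isspace((unsigned char)*a)) {
			if (*a != *b)
				return (0);
			a++;
			b++;
		}
		/* The token in b must end where the token in a ended. */
		if (*b != '\0' && !isspace((unsigned char)*b))
			return (0);
	}
}

/* Self-check: the two styles hold the same opcodes and operands. */
static void
check_styles(void)
{
	assert(same_tokens(asm_concat, asm_contin));
	/* The splices are gone; no backslash survives in the string. */
	assert(strchr(asm_contin, '\\') == NULL);
}
```

same_tokens(asm_concat, asm_contin) should return 1: the assembler is handed
the same opcodes and operands either way, and only the whitespace differs.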

Bruce