fpudna: fpcurthread == curthread 2 times

Mon Apr 11 14:39:49 PDT 2005

On Sun, 10 Apr 2005, Kris Kennaway wrote:

> Got this on an SMP package machine running 5.4-RC2:
>
> fpudna: fpcurthread == curthread 1 times
> fpudna: fpcurthread == curthread 2 times
>
> int
> fpudna()
> {
>        struct pcb *pcb;
>        register_t s;
>
>        if (PCPU_GET(fpcurthread) == curthread) {
>                printf("fpudna: fpcurthread == curthread %d times\n",
>                    ++err_count);
>                stop_emulating();
>                return (1);
>        }
>
> Debugging message accidentally left in, or bug?

Bug.  (PCPU_GET(fpcurthread) == curthread) "can't happen".  If it happens,
then it should cause a panic.  It used to cause a panic and apparently
never happened until a bug was introduced somewhere in 2002.  Rev.1.131
of sys/i386/isa/npx.c broke the panic and replaced it by the above printf
and wrong error handling.  sys/amd64/amd64/fpu.c inherits this mistake.

I can't see how this can happen on i386's, and thought that the bug
was just in uncommitted versions of code associated with rev.1.131 of
npx.c, but on amd64's there is a bug in savectx() that would cause it
if savectx() is actually called.

savectx() is not called enough.  It is necessary to call it for core
dumps to force the FP (and possibly other) state to memory, but this
has never been done in FreeBSD.  savectx() used to be used mainly for
forks, but that use went away except on ia64's.  ia64's also use
savectx() in cpu_switch() and interrupt().  This leaves only 1 use in
the !SMP case for most arches -- for shutdown.  In the !SMP case,
savectx() is also used for cpustop IPIs, on amd64's and i386's only.
I think cpustop IPIs don't happen in normal operation, but they certainly
happen if kdb is used, and this sometimes breaks the
(PCPU_GET(fpcurthread) != curthread) invariant.

Here is the bug in amd64's savectx():

% ENTRY(savectx)
% 	pushfq
% 	cli
% 	movq	PCPU(FPCURTHREAD),%rax
% 	testq	%rax,%rax
% 	je	1f
%

We get here when PCPU_GET(fpcurthread) != NULL (then it must be curthread,
since we don't implement lazy context switching).

% 	movq	TD_PCB(%rax),%rdi
% 	leaq	PCB_SAVEFPU(%rdi),%rdi
% 	clts

The previous instruction turns off the DNA trap even if it was already
off.  In fact, it must already be off, since the invariant says that a
process owns the FPU iff the trap is off.

% 	fxsave	(%rdi)
% 	smsw	%ax
% 	orb	$CR0_TS,%al
% 	lmsw	%ax

The previous 3 instructions turn on the DNA trap unconditionally.  This
is unnecessary, but if it is done then to preserve the invariant,
ownership of the FPU must be given up by clearing PCPU(FPCURTHREAD)...

% 
% 	movq	$PCB_SAVEFPU_SIZE,%rdx	/* arg 3 */
% 	leaq	PCB_SAVEFPU(%rcx),%rsi	/* arg 2 */
% 	/* arg 1 (%rdi) already loaded */
% 	call	bcopy
% 1:
% 	popfq
% 
% 	ret

... but this is not done anywhere.

The corresponding code for i386's calls npxsave() because the error
handling for the fsave instruction used to far too complicated to
do in asm.  npxsave() doesn't neglect to clear PCPU*(fpucurthread):

% void
% npxsave(addr)
% 	union savefpu *addr;
% {
% 
% 	stop_emulating();
% 	fpusave(addr);
% 	start_emulating();
% 	PCPU_SET(fpcurthread, NULL);
% }

Fix: it should work to just remove the foot-shooting turning on of the
DNA trap and the null turning off of the DNA trap.  Otherwise, clear
PCPU(FPCURTHREAD).

The same simplifications don't quite work for npxsave():
- stop_emulating() is not needed in this context but is in other contexts
- fpusave() isn't a single instruction, so doing it in asm is not so easy
- PCPU*(fpcurthread) must be dropped in the !fxsr case, and then we need
   start_emulating() and PCPU_SET(fpcurthread, NULL).

Here are my old fixes for rev.1.131 of npx.c (back out the bad debugging
bits and also:
- move the intr_disable() to before the invariants test.  We are testing
   and reporting a variable that is only volatile in the "can't happen"
   case, but since we are testing and reporting that case we we had better
   make the variable non-volatile.
- fix nearby style and strict C conformance bugs (use of printf+panic
   instead of just panic, and non-KNF indentation, and %p arg not of type
   void *).

%%%
Index: npx.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v
retrieving revision 1.152
diff -u -2 -r1.152 npx.c
--- npx.c	19 Jun 2004 22:24:16 -0000	1.152
+++ npx.c	15 Oct 2004 15:38:14 -0000
***************
*** 764,770 ****
    * access foreign pcb's.
    */
- 
- static int err_count = 0;
-
   int
   npxdna()
--- 764,767 ----
***************
*** 776,793 ****
   	if (!npx_exists)
   		return (0);
- 	if (PCPU_GET(fpcurthread) == curthread) {
- 		printf("npxdna: fpcurthread == curthread %d times\n",
- 		    ++err_count);
- 		stop_emulating();
- 		return (1);
- 	}
- 	if (PCPU_GET(fpcurthread) != NULL) {
- 		printf("npxdna: fpcurthread = %p (%d), curthread = %p (%d)\n",
- 		       PCPU_GET(fpcurthread),
- 		       PCPU_GET(fpcurthread)->td_proc->p_pid,
- 		       curthread, curthread->td_proc->p_pid);
- 		panic("npxdna");
- 	}
   	s = intr_disable();
   	stop_emulating();
   	/*
--- 773,780 ----
   	if (!npx_exists)
   		return (0);
   	s = intr_disable();
+ 	if (PCPU_GET(fpcurthread) != NULL)
+ 		panic("npxdna: fpcurthread = %p, curthread = %p",
+ 		   (void *)PCPU_GET(fpcurthread), (void *)curthread);
   	stop_emulating();
   	/*
%%%

Other bugs found while investigating this: in npxtrap():

% 	/*
% 	 * Interrupt handling (for another interrupt) may have pushed the
% 	 * state to memory.  Fetch the relevant parts of the state from
% 	 * wherever they are.
% 	 */
% 	if (PCPU_GET(fpcurthread) != curthread) {
% 		control = GET_FPU_CW(curthread);
% 		status = GET_FPU_SW(curthread);
% 	} else {
% 		fnstcw(&control);
% 		fnstsw(&status);
% 	}
% 
% 	if (PCPU_GET(fpcurthread) == curthread)
% 		fnclex();

The fnclex() shouldn't be done (*), but if it is done it should be done
consistently -- exceptions must be cleared in memory if the state has
already been pushed to memory.

(*) The fnclex() is wrong because it discards the exception flags before
anything has had a chance to look at them.  The fnclex() is a hack to
permit certain undefined things like returning from a SIGFPE handler for
a floating point exception (but not for an integer divide exception) to
sort of work.  npxtrap() used to save the exception flags and gdb used
to report them, but both of these have been broken.  Things can be handled
better now that the FP state is saved across signal handlers and signal
handlers get a clean FP state, but this complicates debugging and the
complications are not handled.

Unconditionally clearing exceptions is especially bad on Athlon CPUs
(if not in the fxsr case generally) since fxsave/fxrstor are brok^optimized
to not save the FPU instruction pointers unless there is an unmasked
exception.  This causes the FPU instruction pointers to be garbage in
most cases, with their context leaking across context switches, and
clearing exception pointers in the above causes the FPU instruction
pointers to be garbage even when the optimization wasn't intended to
apply.

Bruce