slight problem with r254457 (kthread_add fix)

Konstantin Belousov kostikbel at gmail.com
Thu Mar 27 12:07:31 UTC 2014


On Thu, Mar 27, 2014 at 03:02:49AM -0600, Chris Torek wrote:
> This fix:
> 
>  http://lists.freebsd.org/pipermail/svn-src-head/2013-August/050550.html
> 
> has introduced a bit of a problem with kernel threads that enable
> the FPU, on amd64.
> 
> Before the fix, a new thread added to proc0 (when kthread_add() is
> called with p == NULL) was essentially "forked from" thread0:
> 
> 	if (p == NULL) {
> 		p = &proc0;
> 		oldtd = &thread0;
> 	}
> 
> Now, each such new thread "forks from" the *newest* thread in proc0:
> 
> 	if (p == NULL)
> 		p = &proc0;
> 	...
> 	PROC_LOCK(p);
> 	oldtd = FIRST_THREAD_IN_PROC(p);
> 
> The problem is that "first thread in proc" is really the *last*
> (i.e., most recent) thread in the proc, as thread_link() does a
> TAILQ_INSERT_HEAD() on p->p_threads, and FIRST_THREAD_IN_PROC
> takes the TAILQ_FIRST element.
> 
> What this means is that if something enables the FPU (e.g., a
> device creates a taskq thread and enables the FPU in that thread,
> so that the device can use XMM registers), every subsequent taskq
> also has the FPU enabled.  (We found this by unconditionally
> enabling the extensions in various threads, assuming that new ones
> did not have FPU enabled for kernel use yet.  That worked until
> we picked up this particular change.)
> 
> The fix itself is fine but "forking from" the most-recent kernel
> thread, rather than thread0, for this case seems wrong.  I see
> three obvious ways to fix *that*:
> 
>  - Restore the old behavior:
> 
>  	if (p == NULL) {
> 		p = &proc0;
> 		oldtd = &thread0;
> 	} else
> 		oldtd = NULL;
> 	...
> 	PROC_LOCK(p);
> 	if (oldtd == NULL)
> 		oldtd = FIRST_THREAD_IN_PROC(p);
> 
>    Conservative, will definitely work, but still seems a bit odd
>    behavior-wise: a new "not in a proc" kthread clones from thread0,
>    but a new "is in a proc" kthread clones from most-recent-thread
>    in that kernel proc.  (But that's what this did before the fix,
>    modulo the bug where it cloned from a dead thread...)
> 
>  - Pick the *last* (i.e., earliest still-alive) thread in the proc:
> 
> 	PROC_LOCK(p);
>  	oldtd = LAST_THREAD_IN_PROC(p);
> 
>    where LAST_THREAD_IN_PROC uses the tail entry on the tailq
>    (pretty straightforward).
> 
>  - Get rid of FIRST_THREAD_IN_PROC entirely, as it's not clear
>    what "first" means (at least it was not to me: I assumed at
>    first that it meant the *oldest*, i.e., first-created, one):
>    add new macro accessors NEWEST_THREAD_IN_PROC,
>    OLDEST_THREAD_IN_PROC, and (for when you don't care about
>    order) SOME_THREAD_IN_PROC [%], and start using those where
>    appropriate.  The thread_link() routine then puts "newest" and
>    "oldest" at whichever end of the tailq is best for kernel code
>    space/speed, and only it and the macros need to agree.
> 
>    [% Needs better name.  Perhaps just THREAD_IN_PROC?]
> 
> I actually like the last option best, but it is the largest
> overall change and it messes with all kinds of other kernel code
> (there are 62 occurrences of FIRST_THREAD_IN_PROC in a count I
> just ran).  However, it becomes clearer which thread is really
> wanted.
> 
> (I'm going to do the first fix, for now at least, in our kernel.)

I agree with the problem statement, but disagree with the proposed
'fix'. I.e., I agree that on the creation of a new kernel thread, the
current thread FPU grab state must not leak into the created thread.

In fact, cpu_set_upcall() already almost handles this correctly, by
resetting the pcb_save pointer back to the user save area.  What is
not done is the clearing of PCB_KERNFPU flag, which seems to be the
issue you met.  Am I right ?

If yes, then the following patch should handle the problem, hopefully.

diff --git a/sys/amd64/amd64/vm_machdep.c b/sys/amd64/amd64/vm_machdep.c
index 335a9c9..bc7b311 100644
--- a/sys/amd64/amd64/vm_machdep.c
+++ b/sys/amd64/amd64/vm_machdep.c
@@ -446,7 +446,8 @@ cpu_set_upcall(struct thread *td, struct thread *td0)
 	 * values here.
 	 */
 	bcopy(td0->td_pcb, pcb2, sizeof(*pcb2));
-	clear_pcb_flags(pcb2, PCB_FPUINITDONE | PCB_USERFPUINITDONE);
+	clear_pcb_flags(pcb2, PCB_FPUINITDONE | PCB_USERFPUINITDONE |
+	    PCB_KERNFPU);
 	pcb2->pcb_save = get_pcb_user_save_pcb(pcb2);
 	bcopy(get_pcb_user_save_td(td0), pcb2->pcb_save,
 	    cpu_max_ext_state_size);
diff --git a/sys/i386/i386/vm_machdep.c b/sys/i386/i386/vm_machdep.c
index ba61bdf..3562954 100644
--- a/sys/i386/i386/vm_machdep.c
+++ b/sys/i386/i386/vm_machdep.c
@@ -457,7 +457,8 @@ cpu_set_upcall(struct thread *td, struct thread *td0)
 	 * values here.
 	 */
 	bcopy(td0->td_pcb, pcb2, sizeof(*pcb2));
-	pcb2->pcb_flags &= ~(PCB_NPXINITDONE | PCB_NPXUSERINITDONE);
+	pcb2->pcb_flags &= ~(PCB_NPXINITDONE | PCB_NPXUSERINITDONE |
+	    PCB_KERNNPX);
 	pcb2->pcb_save = &pcb2->pcb_user_save;
 
 	/*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20140327/86b9881a/attachment.sig>


More information about the freebsd-hackers mailing list