Questions about mutex implementation in kern/kern_mutex.c

John Baldwin jhb at freebsd.org
Fri Sep 17 21:11:23 UTC 2010


On Friday, September 17, 2010 1:42:44 pm Andrey Simonenko wrote:
> On Thu, Sep 16, 2010 at 02:16:05PM -0400, John Baldwin wrote:
> > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote:
> > > The mtx_owned(9) macro uses this property: mtx_owned() does nothing
> > > special to compare the value of m->mtx_lock (volatile) with the
> > > current thread pointer, while all other functions that update the
> > > m->mtx_lock of an unowned mutex use a compare-and-set instruction.
> > > Also, I cannot find anything special in the generated assembly code
> > > for volatile variables (except on ia64, where acquire loads and
> > > release stores are used).
> > 
> > No, mtx_owned() is just not harmed by the races it loses.  You can certainly 
> > read a stale value of mtx_lock in mtx_owned() if some other thread owns the 
> > lock or has just released the lock.  However, we don't care, because in both 
> > of those cases, mtx_owned() returns false.  What does matter is that 
> > mtx_owned() can only return true if we currently hold the mutex.  This works 
> > because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the 
> > same time, and 2) even CPUs that hold writes in store buffers will snoop their 
> > store buffer for local reads on that CPU.  That is, a given CPU will never 
> > read a stale value of a memory word that is "older" than a write it has 
> > performed to that word.
> 
> I think I now understand why mtx_owned() works correctly whether or not
> mtx_lock is present in the CPU cache.  The mtx_lock value can definitely
> tell whether the lock is held by the current thread, but it cannot
> reliably tell whether the lock is unowned or owned by another thread.
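
Right, and to make that concrete, here is roughly the shape of the check
as a simplified standalone sketch (the real mtx_owned() also masks flag
bits out of the lock cookie; the names here are mine, not the kernel's):

#include <stdint.h>

struct thread;				/* opaque, stands in for the kernel's */

struct mtx_sketch {
	volatile uintptr_t mtx_lock;	/* owner pointer (flag bits ignored) */
};

static inline int
mtx_owned_sketch(struct mtx_sketch *m, struct thread *td)
{
	/*
	 * A plain volatile load is enough: a stale value can only make
	 * this return false, and a CPU snoops its own store buffer, so
	 * it never misses its own earlier store of td into mtx_lock.
	 */
	return (m->mtx_lock == (uintptr_t)td);
}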
> 
> Let me ask one more question about memory barriers and thread migration.
> 
> Suppose a thread locked a mutex, modified shared data protected by that
> mutex, and was then migrated from CPU1 to CPU2 (with the mutex still
> locked).  In this scenario the migrated thread will not see stale data on
> CPU2, either for the mutex itself (the m->mtx_lock value) or for the
> protected data, because when it was migrated off CPU1 there was at least
> one unlock of some other mutex with release semantics, and the
> appropriate memory barrier instruction ran implicitly or explicitly.  As
> a result this "rel" memory barrier made all modifications from CPU1
> visible to the other CPUs.  When CPU2 switched to the migrated thread
> there was at least one lock of some other mutex with acquire semantics,
> so the "rel/acq" memory barrier pair works together here.  (I am also
> considering the case where CPU2 had not worked with that mutex, but had
> worked with its memory before: a thread on CPU2 could have allocated some
> memory, worked with it, and freed it, and later the same memory was
> allocated by a thread on CPU1 for the mutex.)
> 
> Is the above description correct?

Yes.
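
For illustration, the pairing you describe can be reduced to the
atomic(9) acquire/release primitives directly.  This is a hypothetical
standalone sketch, not the actual context-switch path; 'data' and
'handoff' are stand-ins for the thread's modifications and a lock cookie:

#include <sys/types.h>
#include <machine/atomic.h>
#include <machine/cpu.h>

static int data;			/* written on CPU1 before the switch */
static volatile u_int handoff;		/* stands in for a lock cookie */

static void
cpu1_side(void)				/* runs on CPU1 */
{
	data = 42;
	/* Release: every store above is visible before handoff becomes 1. */
	atomic_store_rel_int(&handoff, 1);
}

static void
cpu2_side(void)				/* runs on CPU2 */
{
	/* Acquire: no later load may be satisfied before this one. */
	while (atomic_load_acq_int(&handoff) == 0)
		cpu_spinwait();
	/* Here data must read as 42. */
}

The scheduler gets the same effect for free: the release half happens on
the old CPU when a lock is dropped during the switch, and the acquire
half happens on the new CPU when one is taken.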

> > > There are some places in the kernel where a variable is updated in
> > > something like "do { v = value; } while (!atomic_cmpset_int(&value, ...));"
> > > and that variable is not "volatile", but the compiler generates correct
> > > assembly code.  So "volatile" is not required in all cases.
> > 
> > Hmm, I suspect that many of those places actually do use volatile.  The 
> > various lock cookies (mtx_lock, etc.) are declared volatile in the structure.  
> > Otherwise the compiler would be free to conclude that 'v = value;' is a loop 
> > invariant and move it out of the loop which would break.  Given that, the 
> > construct you referred to does in fact require 'value' to be volatile.
> 
> I checked the assembly code for these functions:
> 
> kern/subr_msgbuf.c:msgbuf_addchar()
> vm/vm_map.c:vmspace_free()

They may happen to work by accident because atomic_cmpset() clobbers all
of memory (so the compiler must reload the variable after each call), but
these fields should still be marked volatile.
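
The hazard is easiest to see in a reduced version of the loop the first
hunk below removes (a standalone sketch, not the exact kernel code):

#include <sys/types.h>
#include <machine/atomic.h>

static u_int refcnt;			/* note: not volatile */

static void
release_ref(void)
{
	u_int old;

	do {
		old = refcnt;		/* only the clobber forces a reload */
	} while (!atomic_cmpset_int(&refcnt, old, old - 1));
}

If atomic_cmpset_int() did not clobber memory, the compiler could legally
treat 'old' as loop-invariant, load refcnt once, and spin forever with a
stale value after a cmpset failure; marking the field volatile makes the
requirement explicit rather than accidental.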

Index: vm/vm_map.c
===================================================================
--- vm/vm_map.c	(revision 212801)
+++ vm/vm_map.c	(working copy)
@@ -343,10 +343,7 @@
 	if (vm->vm_refcnt == 0)
 		panic("vmspace_free: attempt to free already freed vmspace");
 
-	do
-		refcnt = vm->vm_refcnt;
-	while (!atomic_cmpset_int(&vm->vm_refcnt, refcnt, refcnt - 1));
-	if (refcnt == 1)
+	if (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1)
 		vmspace_dofree(vm);
 }
 
Index: vm/vm_map.h
===================================================================
--- vm/vm_map.h	(revision 212801)
+++ vm/vm_map.h	(working copy)
@@ -237,7 +237,7 @@
 	caddr_t vm_taddr;	/* (c) user virtual address of text */
 	caddr_t vm_daddr;	/* (c) user virtual address of data */
 	caddr_t vm_maxsaddr;	/* user VA at max stack growth */
-	int	vm_refcnt;	/* number of references */
+	volatile int vm_refcnt;	/* number of references */
 	/*
 	 * Keep the PMAP last, so that CPU-specific variations of that
 	 * structure on a single architecture don't result in offset
Index: sys/msgbuf.h
===================================================================
--- sys/msgbuf.h	(revision 212801)
+++ sys/msgbuf.h	(working copy)
@@ -38,7 +38,7 @@
 #define	MSG_MAGIC	0x063062
 	u_int	msg_magic;
 	u_int	msg_size;		/* size of buffer area */
-	u_int	msg_wseq;		/* write sequence number */
+	volatile u_int msg_wseq;	/* write sequence number */
 	u_int	msg_rseq;		/* read sequence number */
 	u_int	msg_cksum;		/* checksum of contents */
 	u_int	msg_seqmod;		/* range for sequence numbers */
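
Note that atomic_fetchadd_int() returns the value the counter held before
the subtraction, so the '== 1' test in the first hunk still fires exactly
on the last reference, and the cmpset retry loop goes away entirely.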

-- 
John Baldwin

