SMP question w.r.t. reading kernel variables

Rick Macklem rmacklem at uoguelph.ca
Wed Apr 20 12:42:57 UTC 2011


> On Tue, Apr 19, 2011 at 12:00:29PM +0000,
> freebsd-hackers-request at freebsd.org wrote:
> > Subject: Re: SMP question w.r.t. reading kernel variables
> > To: Rick Macklem <rmacklem at uoguelph.ca>
> > Cc: freebsd-hackers at freebsd.org
> > Message-ID: <201104181712.14457.jhb at freebsd.org>
> 
> [John Baldwin]
> > On Monday, April 18, 2011 4:22:37 pm Rick Macklem wrote:
> > > > On Sunday, April 17, 2011 3:49:48 pm Rick Macklem wrote:
> ...
> > > All of this makes sense. What I was concerned about was memory
> > > cache
> > > consistency and whet (if anything) has to be done to make sure a
> > > thread
> > > doesn't see a stale cached value for the memory location.
> > >
> > > Here's a generic example of what I was thinking of:
> > > (assume x is a global int and y is a local int on the thread's
> > > stack)
> > > - time proceeds down the screen
> > > thread X on CPU 0 thread Y on CPU 1
> > > x = 0;
> > >                                      x = 0; /* 0 for x's location
> > >                                      in CPU 1's memory cache */
> > > x = 1;
> > >                                      y = x;
> > > --> now, is "y" guaranteed to be 1 or can it get the stale cached
> > > 0 value?
> > >     if not, what needs to be done to guarantee it?
> >
> > Well, the bigger problem is getting the CPU and compiler to order
> > the
> > instructions such that they don't execute out of order, etc. Because
> > of that,
> > even if your code has 'x = 0; x = 1;' as adjacent threads in thread
> > X,
> > the 'x = 1' may actually execute a good bit after the 'y = x' on CPU
> > 1.
> 
> Actually, as I recall the rules for C, it's worse than that. For
> this (admittedly simplified scenario), "x=0;" in thread X may never
> execute unless it's declared volatile, as the compiler may optimize it
> out and emit no code for it.
> 
> 
> > Locks force that to sychronize as the CPUs coordinate around the
> > lock cookie
> > (e.g. the 'mtx_lock' member of 'struct mutex').
> >
> > > Also, I see cases of:
> > >      mtx_lock(&np);
> > >      np->n_attrstamp = 0;
> > >      mtx_unlock(&np);
> > > in the regular NFS client. Why is the assignment mutex locked? (I
> > > had assumed
> > > it was related to the above memory caching issue, but now I'm not
> > > so sure.)
> >
> > In general I think writes to data that are protected by locks should
> > always be
> > protected by locks. In some cases you may be able to read data using
> > "weaker"
> > locking (where "no locking" can be a form of weaker locking, but
> > also a
> > read/shared lock is weak, and if a variable is protected by multiple
> > locks,
> > then any singe lock is weak, but sufficient for reading while all of
> > the
> > associated locks must be held for writing) than writing, but writing
> > generally
> > requires "full" locking (write locks, etc.).
> 
Oops, I now see that you've differentiated between writing and reading.
(I mistakenly just stated that you had recommended a lock for reading.
 Sorry about my misinterpretation of the above on the first quick read.)

> What he said. In addition to all that, lock operations generate
> "atomic" barriers which a compiler or optimizer is prevented from
> moving code across.
> 
All good and useful comments, thanks.

The above example was meant to be contrived, to indicate what I was
worried about w.r.t. memory caches.
Here's a somewhat simplified version of what my actual problem is:
(Mostly fyi, in case you are interested.)

Thread X is doing a forced dismount of an NFS volume, it (in dounmount()):
- sets MNTK_UNMOUNTF
- calls VFS_SYNC()/nfs_sync()
  - so this doesn't get hung on an unresponsive server it must test
    for MNTK_UNMOUNTF and return an error it is set. This seems fine,
    since it is the same thread and in a called function. (I can't
    imagine that the optimizer could move setting of a global flag
    to after a function call which might use it.)
- calls VFS_UNMOUNT()/nfs_unmount()
  - now the fun begins...
  after some other stuff, it calls nfscl_umount() to get rid of the
  state info (opens/locks...)
  nfscl_umount() - synchronizes with other threads that will use this
    state (see below) using the combination of a mutex and a
    shared/exclusive sleep lock. (Because of various quirks in the
    code, this shared/exclusive lock is a locally coded version and
    I happenned to call the shared case a refcnt and the exclusive
    case just a lock.)

Other threads that will use state info (open/lock...) will:
-call nfscl_getcl()
  - this function does two things that are relevant
  1 - it allocates a new clientid, as required, while holding the mutex
      - this case needs to check for MNTK_UNMOUNTF and return error, in
        case the clientid has already been deleted by nfscl_umount() above.
      (This happens before #2 because the sleep lock is in the clientd structure.)
--> it must see the MNTK_UNMOUNTF set if it happens after (in a temporal sense)
  being set by dounmount()
  2 - while holding the mutex, it acquires the shared lock
      - if this happens before nfscl_umount() gets the exclusive lock, it is
        fine, since acquisition of the exclusive lock above will wait for its
        release
      however
      - if it acquires the shared lock after nfscl_umount() has had the
        exclusive lock or competes with nfscl_umount() for the lock, it
        could be in trouble.
        - my planned fix for this is to test for MNTK_UNMOUNTF as it tries
          to get the shared lock (and fail with an error return if it is set)
          before if goes to sleep waiting for the exclusive lock to be
          released. (Othere things acquire this exclusive lock, so waiting for
          it usually makes sense, except this case.)
--> again, I think this is fine so long as it sees MNTK_UNMOUNTF set when the
    the test occurs after (in a temporal sense) it was set by the forced dismount
    thread, since that thread set it before calling VFS_UNMOUNT()...nfscl_umount().

Now, in practice, I don't believe there is a problem, but since I have yet to
see anyone state what the rules/assumptions for memory cache consistency are
w.r.t. SMP, I thought I'd pursue it.

I have learned about the "compiler boundaries" (ie. where the optimizer might
move stuff) and other good stuff from this discussion and I thank you guys for that.

rick


More information about the freebsd-hackers mailing list