Robust mutexes implementation

Konstantin Belousov kostikbel at gmail.com
Sat May 7 17:00:04 UTC 2016


On Sat, May 07, 2016 at 01:30:11AM +0200, Jilles Tjoelker wrote:
> > 4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
> >    the lifetime of the shared mutex associated with a vnode's page.
> >    Apparently, there is real code around which expects the following
> >    to work:
> >    - mmap a file, create a shared mutex in the mapping;
> >    - the process exits;
> >    - another process starts, mmaps the same file and expects that the
> >      previously initialized mutex is still usable.
> 
> >    The knob changes the lifetime of such shared off-page from the
> >    'destroy on last unmap' to either 'until vnode is reclaimed' or
> >    until 'pthread_mutex_destroy' called, whatever comes first.
> 
> The 'until vnode is reclaimed' bit sounds like a recipe for hard to
> reproduce bugs.
> 
> I do think it is related to the robust mutex patch, though.
> 
> Without robust mutexes and assuming threads do not unmap the memory
> while having a mutex locked or while waiting on a condition variable, it
> is sufficient to create the off-page mutex/condvar automatically in its
> initial state when pthread_mutex_lock() or pthread_cond_*wait() are
> called and no off-page object exists.
> 
> With robust mutexes, we need to store somewhere whether the next thread
> should receive [EOWNERDEAD] or not, and this should persist even if no
> processes have the memory mapped. This might be done by replacing
> THR_PSHARED_PTR with a different value in the pthread_mutex_t. I'm not
> sure I like that memory write being done from the kernel though.
In principle, I agree with this.  Note that if we go with something
like THR_OWNERDEAD_PTR, the kernel write to set the value would not be
much different from the kernel write that unlocks a robust mutex with
inlined lock structures.

Still, I would prefer not to implement this now.  For the local purposes,
the knob was enough, and the default value will be 'disabled'.

> 
> The below is a review of https://kib.kiev.ua/kib/pshared/robust.2.patch
> 
> > diff --git a/lib/libc/gen/Symbol.map b/lib/libc/gen/Symbol.map
> 
> It is not necessary to export _pthread_mutex_consistent,
> _pthread_mutexattr_getrobust and _pthread_mutexattr_setrobust (under
> FBSDprivate_1.0 symbol version). They are not used by name outside the
> DSO, only via the jump table.
Removed from the export.

> 
> The same thing is true of many other FBSDprivate_1.0 symbols but there
> is a difference between adding new unnecessary exports and removing
> existing unnecessary exports.

> 
> > +		if ((error2 == 0 || error2 == EOWNERDEAD) && cancel)
> >  			_thr_testcancel(curthread);
> 
> I don't think [EOWNERDEAD] should be swept under the carpet like this.
> The cancellation cleanup handler will use the protected state and unlock
> the mutex without making the state consistent and the state will be
> unrecoverable.
So your argument there is to return EOWNERDEAD and not cancel,
am I right?  I reused part of your text as the comment.

> 
> > +void
> > +_mutex_leave_robust(struct pthread *curthread, struct pthread_mutex *m)
> > +{
> > +
> > +	if (!is_robust_mutex(m))
> > +		return;
> 
> This accesses the mutex after writing a value to the lock
> word which allows other threads to lock it. A use after free may result,
> since it is valid for another thread to lock, unlock and destroy the
> mutex (assuming the mutex is not used otherwise later).
> 
> Memory ordering may permit the load of m->m_lock.m_flags to be moved
> before the actual unlock, so this issue may not actually appear.
> 
> Given that the overhead of a system call on every robust mutex unlock is
> not desired, the kernel's unlock of a terminated thread's mutexes will
> unavoidably have this use after free. However, for non-robust mutexes
> the previous guarantees should be kept.
I agree that this is a bug, and agree that the kernel accesses to the
curthread->inact_mtx are potentially unsafe.  I also did not want to
issue a syscall for the unlock of a robust mutex, as you noted.

I fixed the bug with the is_robust_mutex() test in _mutex_leave_robust() 
by caching the robust status.

I was indeed worried by the kernel access issue you mentioned, but the
kernel is immune to 'bad' memory accesses.  What bothered me is the ill
ABA situation, where the lock memory is freed and repurposed, and then
the lock word is written with one of two specific values which give the
same state as a locked mutex.  This would cause the kernel to 'unlock'
it (but not to follow the invalid m_rb_lnk).

But for this to happen, we must have a situation where a thread
is being terminated before mutex_unlock_common() reached the
_mutex_leave_robust() call. This is async thread termination, which
then must be either process termination (including execve()), or a call
to thr_exit() from a signal handler in our thread (including async
cancellation).

I am sure that the thr_exit() situation is non-conforming, so the only
concern is process exit, and then only a shared robust mutex, because
for a private mutex, only the exiting process' state is affected.  I can
verify in umtx_handle_rb(), for instance, that for a USYNC_PROCESS_SHARED
object, the underlying memory is backed by the umtx shm page.  This
would close the race.

But this would interfere with libthr2, if that ever happens.

> 
> > +	int defered, error;
> 
> Typo, should be "deferred".
I also changed PMUTEX_FLAG_DEFERED.

> 
> > +.It Bq Er EOWNERDEAD
> > +The argument
> > +.Fa mutex
> > +points to the robust mutex and the previous owning thread terminated
> > +while holding the mutex lock.
> > +The lock was granted to the caller and it is up to the new owner
> > +to make the state consistent.
> 
> "points to a robust mutex".
> 
> Perhaps add a See .Xr pthread_mutex_consistent 3 here.
Both done.

> 
> > diff --git a/share/man/man3/pthread_mutex_consistent.3 b/share/man/man3/pthread_mutex_consistent.3
> 
> This man page should mention that pthread_mutex_consistent() can be
> called when a mutex lock or condition variable wait failed with
> [EOWNERDEAD].
I took introductory text from the POSIX page for the function.

> 
> > @@ -37,7 +37,11 @@ struct umutex {
> > +	__uintptr_t		m_rb_lnk;	/* Robust linkage */
> 
> Although Linux also stores the robust list nodes in the mutexes like
> this, I think it increases the chance of strange memory corruption.
> Making the robust list an array in the thread's memory would be more
> reliable. If the maximum number of owned robust mutexes can be small,
> this can have a fixed size; otherwise, it needs to grow as needed (which
> does add an allocation that may fail to the pthread_mutex_lock path,
> bad).
I gave this proposal some thought.

I very much dislike the idea of calling the memory allocator on the lock,
and especially the trylock, path.  The latter could need to obtain
allocator locks, which (sometimes partially) defeats the purpose of
trylock.

I can use mmap(2) directly there, similarly to how pthread_setspecific()
was changed recently, which would avoid the issue of calling the
userspace allocator.  Still, the problems of an additional syscall, the
resulting ENOMEM, and the time to copy the current list of owned robust
mutexes to the grown location remain (I do not see that using chunked
allocations is reasonable, since it would be the same list as the
current m_rb_lnk, only at a different level).

I prefer to keep the robust linked list for these reasons.

In fact, the deciding argument would be actual application usage of
robustness.  I thought, when writing the patch, about when and how I
would use the feature, and did not see compelling arguments to even try
to use it.  My stumbling block is the recovery of user data consistency:
for instance, I tried to write a loop which would increment a shared
variable N times, and I was not able to come up with any simple recovery
mechanism for an aborted iteration, except writing the iteration in
assembly and keeping a parallel tick variable which enumerates each
iteration action.

> 
> > + * The umutex.m_lock values and bits.  The m_owner is the word which
> > + * serves as the lock.  It's high bit is the contention indicator,
> > + * rest of bits records the owner TID.  TIDs values start with PID_MAX
> > + * + 2 and ends by INT32_MAX, the low range [1..PID_MAX] is guaranteed
> > + * to be useable as the special markers.
> 
> Typo, "It's" should be "Its" and "ends" should be "end".
> 
> Bruce Evans would probably complain about comma splice (two times).
I tried to reword this.

> 
> > +#define	UMTX_OP_ROBUST		26
> 
> The name is rather vague. Perhaps this can be something like
> UMTX_OP_INIT_ROBUST.

I renamed this to UMTX_OP_ROBUST_LISTS, together with the parameters
structure.

Updated patch is at https://kib.kiev.ua/kib/pshared/robust.3.patch
I did not add the check for umtx shm into umtx_handle_rb() yet, and am
waiting for your opinion.