Robust mutexes implementation

Thu May 5 13:10:39 UTC 2016

I implemented robust mutexes for our libthr.  A robust mutex is
guaranteed to be cleared by the system if its owner, either a thread
or a whole process, terminates while holding the mutex.  The next
locker is then notified of the inconsistent mutex state and can
execute (or abandon) corrective actions.

The patch mostly consists of small changes here and there, adding
necessary checks for the inconsistent and abandoned conditions to
existing paths.  Additionally, the thread exit handler was extended
to iterate over the userspace-maintained list of owned robust mutexes,
unlocking each of them and marking it as terminated.

The list of owned robust mutexes cannot be kept atomically in sync
with the mutex lock state (that is possible in the kernel, but is too
expensive).  Instead, for the duration of a lock or unlock operation,
the current mutex is remembered in a special slot that is also checked
by the kernel at thread termination.
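
The role of the slot can be shown with a small single-threaded model.
This is only a sketch of the idea, not the code from the patch: all
names (struct rmutex, struct thread_state, OWNER_DEAD, lock_begin()
and so on) are hypothetical, and real code needs atomic operations on
the lock word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	OWNER_DEAD	0x80000000u	/* hypothetical "owner died" marker */

struct rmutex {
	uint32_t	 owner;		/* 0 if free, else tid or OWNER_DEAD */
	struct rmutex	*next;		/* link in the owner's list */
};

struct thread_state {
	uint32_t	 tid;
	struct rmutex	*owned;		/* list of held robust mutexes */
	struct rmutex	*inact;		/* mutex being locked or unlocked */
};

/* Step 1: publish the mutex in the slot before touching the lock word. */
void
lock_begin(struct thread_state *ts, struct rmutex *m)
{
	ts->inact = m;
}

/* Step 2: take the lock (only the uncontended case is modelled). */
int
lock_acquire(struct thread_state *ts, struct rmutex *m)
{
	if (m->owner != 0)
		return (-1);
	m->owner = ts->tid;
	return (0);
}

/* Step 3: link the mutex into the owned list, then clear the slot. */
void
lock_finish(struct thread_state *ts, struct rmutex *m)
{
	assert(ts->inact == m);
	m->next = ts->owned;
	ts->owned = m;
	ts->inact = NULL;
}

/*
 * Model of the termination path: walk the list, then check the slot.
 * Returns the number of mutexes marked as having a dead owner.
 */
int
thread_died(struct thread_state *ts)
{
	struct rmutex *m, *next;
	int n;

	n = 0;
	for (m = ts->owned; m != NULL; m = next) {
		next = m->next;
		m->next = NULL;
		if (m->owner == ts->tid) {
			m->owner = OWNER_DEAD;
			n++;
		}
	}
	ts->owned = NULL;
	/* The slot catches a mutex locked but not yet linked. */
	m = ts->inact;
	if (m != NULL && m->owner == ts->tid) {
		m->owner = OWNER_DEAD;
		n++;
	}
	ts->inact = NULL;
	return (n);
}
```

If the thread dies between lock_acquire() and lock_finish(), the mutex
is not on the list yet, but thread_died() still finds it through the
slot.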

The kernel must know the per-thread location of the heads of the
robust mutex lists and of the current active mutex slot.  Initially I
tried to extend TCBs with this data, so that only a single syscall at
threading library initialization would be needed: for any thread the
location of the TCB is known to the kernel, and the syscall would pass
offsets.  Unfortunately, on some architectures the size of the TCB is
part of the fixed ABI and cannot be changed.  Instead, when a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued which informs the kernel of the location of the list heads.
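
The "register on first touch" scheme can be sketched as below.  The
syscall is replaced by a stand-in function that only counts calls, and
every name here (struct robust_params, fake_umtx_register(),
robust_touch()) is invented for the example; the real op and its
argument layout are defined by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block: addresses the kernel should remember. */
struct robust_params {
	uintptr_t	list_addr;	/* head of the owned-mutex list */
	uintptr_t	inact_addr;	/* current active mutex slot */
};

static _Thread_local void *owned_list;	/* per-thread list head */
static _Thread_local void *inact_slot;	/* per-thread active slot */
static _Thread_local int registered;	/* has this thread registered? */
static int register_calls;		/* demo counter, not thread-safe */

/* Stand-in for the new umtx op; real code would enter the kernel. */
static int
fake_umtx_register(const struct robust_params *p)
{
	assert(p != NULL);
	if (p->list_addr == 0 || p->inact_addr == 0)
		return (-1);
	register_calls++;
	return (0);
}

/* Called at the start of every robust mutex operation. */
int
robust_touch(void)
{
	struct robust_params p;

	if (registered)
		return (0);
	p.list_addr = (uintptr_t)&owned_list;
	p.inact_addr = (uintptr_t)&inact_slot;
	if (fake_umtx_register(&p) != 0)
		return (-1);
	registered = 1;
	return (0);
}

/* Number of stand-in "syscalls" issued so far. */
int
robust_register_calls(void)
{
	return (register_calls);
}
```

Threads that never use a robust mutex pay nothing, and a thread that
does use one pays for the registration once.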

The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.  I do not understand the reasoning behind this
POSIX requirement.

Patch passes all glibc tests for robust mutexes I found in the nptl/
directory.  See https://github.com/kostikbel/glibc-robust-tests .

Patch is available at https://kib.kiev.ua/kib/pshared/robust.1.patch
(beware of self-signed root certificate in the chain). Work was
sponsored by The FreeBSD Foundation.

Unrelated things in the patch:

1. Style.  Since I had to re-read whole sys/kern/kern_umtx.c,
   lib/libthr/thread/thr_umtx.h and lib/libthr/thread/thr_umtx.c, I
   started fixing the numerous style violations in these files, which
   actually made my eyes bleed.

2. A fix for the use of tdfind() in umtxq_sleep_pi() for shared
   PI mutexes.

3. Removal of the struct pthread_mutex m_owner field.  I cannot see
   why it is useful.  The only optimization it provides is the
   possibility to avoid clearing the UMUTEX_CONTESTED bit when reading
   m_lock.m_owner.  The disadvantage of having this duplicated field
   is that the kernel does not know about pthread_mutex, so it cannot
   fix the duplicated value.  Overall it is less work to clear
   UMUTEX_CONTESTED when checking the owner than to try and handle
   inconsistencies.

   I added the PMUTEX_OWNER_ID() macro to simplify code.

4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
   the lifetime of the shared mutex associated with a vnode's page.
   Apparently, there is real code around which expects the following
   to work:
   - mmap a file, create a shared mutex in the mapping;
   - the process exits;
   - another process starts, mmaps the same file and expects that the
     previously initialized mutex is still usable.

   The knob changes the lifetime of such a shared off-page from
   'destroy on last unmap' to 'until the vnode is reclaimed' or
   'until pthread_mutex_destroy() is called', whichever comes first.
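
For item 3, reading the owner through a masking macro might look
roughly like the sketch below.  The structures are cut-down stand-ins
holding only the fields mentioned above, the UMUTEX_CONTESTED value
and the macro body are my assumptions about their shape, and
mutex_owned_by() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	UMUTEX_CONTESTED	0x80000000U	/* assumed value */

/* Stand-in for the kernel-visible lock word. */
struct umutex {
	volatile uint32_t	m_owner;	/* owner tid plus flag bits */
};

/* Stand-in for the libthr mutex; no duplicated m_owner field. */
struct pthread_mutex {
	struct umutex		m_lock;
};

/* Owner thread ID with the contested bit masked off. */
#define	PMUTEX_OWNER_ID(m)	((m)->m_lock.m_owner & ~UMUTEX_CONTESTED)

/* Returns non-zero if the mutex is currently owned by thread tid. */
int
mutex_owned_by(const struct pthread_mutex *m, uint32_t tid)
{
	assert(m != NULL);
	return (PMUTEX_OWNER_ID(m) == tid);
}
```

Since the only copy of the owner lives in the lock word that the
kernel also updates, there is no second value that can go stale.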