svn commit: r214915 - user/davidxu/libthr/lib/libthr/thread

Jilles Tjoelker jilles at stack.nl
Sun Nov 14 18:16:33 UTC 2010


On Sun, Nov 14, 2010 at 02:18:32PM +0800, David Xu wrote:
> Jilles Tjoelker wrote:
> > On Sun, Nov 07, 2010 at 01:49:08PM +0000, David Xu wrote:

> >> Author: davidxu
> >> Date: Sun Nov  7 13:49:08 2010
> >> New Revision: 214915
> >> URL: http://svn.freebsd.org/changeset/base/214915

> >> Log:
> >>   Implement robust mutex, the pthread_mutex locking and
> >>   unlocking code are reworked to support robust mutex and
> >>   other mutex must be locked and unlocked by kernel.

> > The glibc+linux implementation avoids the system call for each robust
> > mutex lock/unlock by maintaining the list in userland and providing a
> > pointer to the kernel. Although this is somewhat less reliable in case a
> > process scribbles over this list, it has better performance.

> > There are various ways this list could be maintained, but the glibc way
> > uses an "in progress" field per thread and a linked list using a field
> > in the pthread_mutex_t, so if we want that we should make sure we have
> > the space in the pthread_mutex_t. Alternatively, a simple array could be
> > used if the number of owned robust mutexes can be limited to a fairly
> > low value.
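
For concreteness, the registration half of the Linux scheme looks
roughly like the sketch below. The structures come from the
set_robust_list(2) interface; the field values here are illustrative
(glibc actually sets futex_offset to the offset of the lock word
within its pthread_mutex_t):

#include <linux/futex.h>	/* struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Each thread registers one list head with the kernel, once.  Lock
 * and unlock afterwards only manipulate the userland list; the
 * kernel reads it when the thread exits and marks any still-owned
 * lock words with FUTEX_OWNER_DIED.
 */
static __thread struct robust_list_head robust_head;

static int
register_robust_list(void)
{
	robust_head.list.next = &robust_head.list; /* empty circular list */
	robust_head.futex_offset = 0;	 /* entry-to-lock-word offset */
	robust_head.list_op_pending = NULL; /* the "in progress" field */
	/* There is no libc wrapper for this system call. */
	return (syscall(SYS_set_robust_list, &robust_head,
	    sizeof(robust_head)));
}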

> > Solaris robust mutexes used to work by entering the kernel for every
> > lock/unlock, but no longer, see
> > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6296770
> > Someone complained about that implementation being too slow.

> I don't like glibc's idea of reading or writing the mutex after
> unlocking it. Why would you assume the memory is still valid and
> still in use as a mutex after you have unlocked it?

> There is a use case where glibc will mysteriously fail: thread A
> unlocks the mutex in userland, then thread B locks it, finishes with
> it, and reuses the memory area for another purpose, say as a
> memory-mapped file buffer whose contents the kernel will save to a
> disk file. If, before thread A's pending kernel-side work runs, that
> memory happens to contain thread A's thread id, then when thread A
> enters the kernel, the kernel concludes that userland still hasn't
> unlocked the mutex and writes into the "mutex" to unlock it. But the
> memory no longer belongs to the mutex, so it has simply corrupted
> the data thread B saved. This implementation is racy and dangerous.

That seems rare but very dangerous if it triggers.
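
The shape of the hazard, reduced to a sketch (the names are
hypothetical stand-ins, not the real glibc internals, and the two
steps are compressed into one function; the point is only the
ordering, with the lock word released before the last reference to
the mutex memory):

#include <stdatomic.h>

struct sketch_mutex {
	_Atomic unsigned lock_word;	/* owner's thread id, or 0 */
};

/* Stands in for the kernel's deferred work: if the word still holds
 * `tid`, conclude userland never unlocked and store 0 on its behalf. */
static void
kernel_side_unlock(struct sketch_mutex *m, unsigned tid)
{
	unsigned expected = tid;

	atomic_compare_exchange_strong(&m->lock_word, &expected, 0);
}

static void
sketch_unlock(struct sketch_mutex *m, unsigned my_tid)
{
	atomic_store(&m->lock_word, 0);	/* released: from here on, other
					 * threads may lock, destroy and
					 * reuse this memory */
	/* ... window: B reuses *m; a stale my_tid may reappear ... */
	kernel_side_unlock(m, my_tid);	/* scribbles on B's reused memory */
}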

> I also know that they have a link entry embedded in the mutex. If
> the mutex is shared by multiple processes, then I can write a
> specific value into the link entry and corrupt the owner's linked
> list. Even worse, I can write a specific address into the link
> entry, and when the owner unlocks the mutex and unlinks it from its
> list, he will be happy to write to any address I specified. This may
> be a security problem, or cause a very difficult debugging problem
> if the mutex memory is corrupted.

Indeed. Admittedly, shared memory is inherently not very safe, but one
would expect the unsafety to be restricted to the shared memory segment.
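
The unlink step is what makes that concrete: removing an entry from a
doubly linked list writes through both of its pointer fields, so a
forged entry in shared memory becomes a write of an attacker-chosen
value to an attacker-chosen address in the owner's address space
(field names illustrative):

struct list_entry {
	struct list_entry *next;
	struct list_entry *prev;
};

static void
unlink_entry(struct list_entry *e)
{
	/* If a hostile process sharing the mutex set e->next and
	 * e->prev, these two stores are the classic unsafe-unlink
	 * primitive. */
	e->prev->next = e->next;
	e->next->prev = e->prev;
}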

> I think if you want robust mutexes, there is a price for whatever
> you do: the robust mutex is expensive but reliable. Until you can
> prove that glibc's write-mutex-memory-after-unlock is not a problem,
> I would not follow their idea. Based on this research, I think
> Solaris must have a reason not to do it that way; I would not laugh
> at them because their robust mutex is slow.

As I said, they changed it so it does not require a syscall for
uncontended lock/unlock. Their implementation seems safer than what
glibc+linux does:

* The kernel knows all robust mutexes that are mapped and potentially
  locked in a process (even if there is no thread blocked on them).
  Mutexes are added to this list upon pthread_mutex_init() and
  pthread_mutex_lock() and removed when the memory region containing
  them is unmapped. When the process execs or exits, the list is
  walked and all mutexes owned by a thread in this process get the
  EOWNERDEAD treatment. A copy of the list is maintained in userland
  to avoid excessive system calls. When a mutex is unmapped, the
  kernel removes the entry from the userland list as well.

* The list of owned robust mutexes is a variable length array instead
  of a linked list (sketched below), so there are no pointers to
  thread-private memory in the mutex. Furthermore, this list is only
  used when a thread exits, in userland.
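
A minimal sketch of that array, assuming a growable per-thread vector
(struct rmutex and the function names are hypothetical, not Solaris
source):

#include <stdlib.h>

struct rmutex;			/* the shared robust mutex (opaque here) */

/* Per-thread set of owned robust mutexes.  Nothing in the mutex
 * itself points into thread-private memory; the set is walked only
 * when the thread exits. */
static __thread struct {
	struct rmutex **held;
	size_t nheld, cap;
} owned;

static int
record_lock(struct rmutex *m)
{
	if (owned.nheld == owned.cap) {
		size_t ncap = owned.cap != 0 ? owned.cap * 2 : 8;
		struct rmutex **p;

		p = realloc(owned.held, ncap * sizeof(*p));
		if (p == NULL)
			return (-1);
		owned.held = p;
		owned.cap = ncap;
	}
	owned.held[owned.nheld++] = m;
	return (0);
}

static void
record_unlock(struct rmutex *m)
{
	for (size_t i = 0; i < owned.nheld; i++) {
		if (owned.held[i] == m) {
			owned.held[i] = owned.held[--owned.nheld];
			break;
		}
	}
}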

This approach appears to solve your objections, except if a program
destroys a robust mutex and starts using the memory for something else
without unmapping it. Perhaps doing some sort of syscall in
pthread_mutex_destroy() to remove the mutex entry for all processes that
have it mapped can solve this.

David Xu wrote:
> I also remember that Solaris's condition variables are automatically
> robust if your mutex is of the robust type; glibc cannot provide
> that feature. When a process crashes, the condition variable's
> internal lock may be held by the dead thread, leaving the condition
> variable in an unusable state. Solaris's condition variable will
> still be in a healthy state: if a thread finds that
> pthread_mutex_lock returned EOWNERDEAD, it can still use the
> condition variable without any problem. Without robust condition
> variables, a robust mutex is less useful.

Indeed, but it seems more-or-less orthogonal to mostly-userland
robust mutexes. I think the current kernel condition variable is
suitable here, but I have not thought this through completely.
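
For reference, recovery on the application side looks roughly like
this with the standard interfaces (POSIX.1-2008 names;
repair_shared_state() and work_available() are hypothetical
application hooks):

#include <errno.h>
#include <pthread.h>

extern void repair_shared_state(void);
extern int work_available(void);

int
wait_for_work(pthread_mutex_t *mtx, pthread_cond_t *cv)
{
	int error;

	error = pthread_mutex_lock(mtx);
	if (error == EOWNERDEAD) {
		/* The previous owner died holding the lock; we own it
		 * now.  Repair the protected data, then mark the mutex
		 * consistent so later lockers do not get EOWNERDEAD. */
		repair_shared_state();
		pthread_mutex_consistent(mtx);
	} else if (error != 0)
		return (error);
	while (!work_available())
		pthread_cond_wait(cv, mtx);
	/* ... consume the work ... */
	return (pthread_mutex_unlock(mtx));
}

With a robust condition variable, the pthread_cond_wait() here stays
usable even if the previous owner died while other threads were
blocked on it; without one, the condition variable's internal state
may be left poisoned by the dead thread.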

-- 
Jilles Tjoelker

