Fast sigblock (AKA rtld speedup)

Tue Jan 15 09:21:37 UTC 2013

On 14 Jan 2013, at 18:58, John Baldwin wrote:

> I'm less certain.  Note that you can't inline mutex ops until you expose
> the mutexes themselves to userland (that is, making pthread_mutex_t not
> be opaque).

This is one of the things that will be required anyway if we wish to support process-shared mutexes (they've been in POSIX since 1997, so it's probably getting on for time we did), as the current mutex-is-a-pointer implementation depends on the virtual address space of the creator, and so does not work if the mutex is created in a shared memory segment.
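For reference, this is roughly what a caller of the POSIX interface expects to be able to write. It is a minimal sketch rather than anything taken from our tree: the names shared_state, bump and demo_pshared_count are made up for illustration, and it assumes anonymous shared mappings and fork() are available. With a mutex-is-a-pointer implementation, the pthread_mutex_t stored in the mapping would only hold a pointer into the creator's private heap, so the child would end up locking its own copy of the real mutex.

```c
#define _DEFAULT_SOURCE	/* glibc: expose MAP_ANONYMOUS in strict -std= modes */
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: the mutex object itself lives in the shared mapping. */
struct shared_state {
	pthread_mutex_t lock;
	long counter;
};

/* Each process runs this loop; the lock serialises the increments. */
static void
bump(struct shared_state *s, long iterations)
{
	for (long i = 0; i < iterations; i++) {
		pthread_mutex_lock(&s->lock);
		s->counter++;
		pthread_mutex_unlock(&s->lock);
	}
}

/*
 * Returns the final counter (2 * iterations if all went well), or -1 on
 * failure, e.g. when the implementation rejects PTHREAD_PROCESS_SHARED.
 */
static long
demo_pshared_count(long iterations)
{
	pthread_mutexattr_t attr;
	long result = -1;
	int status;
	pid_t pid;

	struct shared_state *s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (s == MAP_FAILED)
		return -1;
	s->counter = 0;

	if (pthread_mutexattr_init(&attr) != 0)
		goto unmap;
	if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
	    pthread_mutex_init(&s->lock, &attr) != 0) {
		pthread_mutexattr_destroy(&attr);
		goto unmap;
	}
	pthread_mutexattr_destroy(&attr);

	pid = fork();
	if (pid == 0) {			/* child: inherits the shared mapping */
		bump(s, iterations);
		_exit(0);
	}
	if (pid > 0) {			/* parent: contend with the child, then reap it */
		bump(s, iterations);
		if (waitpid(pid, &status, 0) == pid &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			result = s->counter;
	}
	pthread_mutex_destroy(&s->lock);
unmap:
	munmap(s, sizeof(*s));
	return result;
}
```

If the lock really is shared between the two processes, demo_pshared_count(n) should return 2 * n. An implementation that does not support the attribute is expected to fail at pthread_mutexattr_setpshared(), which the sketch reports as -1.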

That said, even with the current implementation we wouldn't need to expose the entire mutex structure, just the word that is used as the fast path.  The inline version would be something like:

#include <stdatomic.h>

struct pthread_mutex_header
{
	_Atomic(long) lock_word;
	// other private fields not exposed in header;
};
typedef struct pthread_mutex_header *pthread_mutex_t;

// Implementation in libthr / libc, which calls into the kernel.
int pthread_mutex_lock_slowly(pthread_mutex_t*);

inline int pthread_mutex_lock(pthread_mutex_t *mutex)
{
	// Must have the same type as lock_word; a failed CAS stores the
	// observed value here.
	long expected = 0;
	// Fast path: 0 (unlocked) -> 1 (locked).  A weak CAS may fail
	// spuriously, which just sends us down the slow path.
	if (atomic_compare_exchange_weak_explicit(&(*mutex)->lock_word, &expected, 1, memory_order_acquire, memory_order_relaxed))
		return 0;
	return pthread_mutex_lock_slowly(mutex);
}

The slow path is only needed when the mutex can't be acquired trivially in userspace. On x86, the fast path adds 6 extra instructions, including a branch that can be statically hinted if we want: we assume that the fast path succeeds, because if it doesn't, a mispredicted branch adds little to the cost of the syscall that follows.
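To make the hinting concrete, here is a self-contained sketch of the same fast-path / slow-path split. Everything named demo_* is hypothetical, __builtin_expect is a GCC/Clang extension rather than standard C, and the slow path here just yields the CPU in a loop where a real one would sleep in the kernel.

```c
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the exposed fast-path word; not the libthr layout. */
struct demo_mutex {
	_Atomic(long) lock_word;	/* 0 = unlocked, 1 = locked */
};

/* GCC/Clang extension: tell the compiler the condition is usually true. */
#define demo_likely(x)	__builtin_expect(!!(x), 1)

/*
 * Stand-in slow path.  A real implementation would block in the kernel;
 * this sketch yields the CPU until the CAS succeeds.
 */
static int
demo_mutex_lock_slowly(struct demo_mutex *m)
{
	long expected = 0;

	while (!atomic_compare_exchange_weak_explicit(&m->lock_word, &expected,
	    1, memory_order_acquire, memory_order_relaxed)) {
		expected = 0;	/* a failed CAS overwrites expected */
		sched_yield();
	}
	return 0;
}

/* Fast path: one CAS, with the branch hinted towards success. */
static inline int
demo_mutex_lock(struct demo_mutex *m)
{
	long expected = 0;

	if (demo_likely(atomic_compare_exchange_weak_explicit(&m->lock_word,
	    &expected, 1, memory_order_acquire, memory_order_relaxed)))
		return 0;
	return demo_mutex_lock_slowly(m);
}

/* Strong CAS, so a failure always means the lock was really held. */
static inline int
demo_mutex_trylock(struct demo_mutex *m)
{
	long expected = 0;

	if (atomic_compare_exchange_strong_explicit(&m->lock_word, &expected,
	    1, memory_order_acquire, memory_order_relaxed))
		return 0;
	return EBUSY;
}

/* Release store pairs with the acquire in the lock paths. */
static inline int
demo_mutex_unlock(struct demo_mutex *m)
{
	atomic_store_explicit(&m->lock_word, 0, memory_order_release);
	return 0;
}
```

The weak CAS in demo_mutex_lock() is allowed to fail spuriously on some architectures. That is harmless here, because the slow path simply retries until it gets the lock.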

The corresponding saving is that we get to delete a massive pile of conditionals that we currently have for __isthreaded.  We'd also completely avoid the function call (which is actually two function calls, as we do some trampoline things in libc) in the fast-path case for threaded applications.

A similar saving is possible with read-write locks and possibly with condition variables, although our kernel interface for these is incredibly poorly documented (for once, Linux actually has better documentation: futexes are very well documented).  Looking in umtx.h, it sort of exposes inline functions that look like this, but given the complete lack of documentation, I have no idea how usable they are.
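For the read-write lock case, the userspace fast path could look something like the sketch below. This is not what umtx.h does: the state encoding (a reader count with a writer flag in bit 31) and all the demo_* names are invented for illustration, and it assumes there are never 2^31 or more concurrent readers. A false return means "take the slow path", which is omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical encoding: bit 31 = writer holds the lock, low bits = reader count. */
#define DEMO_RW_WRITER	(1UL << 31)

struct demo_rwlock {
	_Atomic(unsigned long) state;
};

/* Reader fast path: bump the reader count if no writer holds the lock. */
static inline bool
demo_rw_tryrdlock_fast(struct demo_rwlock *l)
{
	unsigned long old = atomic_load_explicit(&l->state, memory_order_relaxed);

	if (old & DEMO_RW_WRITER)
		return false;		/* caller falls back to the slow path */
	return atomic_compare_exchange_strong_explicit(&l->state, &old, old + 1,
	    memory_order_acquire, memory_order_relaxed);
}

/* Writer fast path: only succeeds when there are no readers and no writer. */
static inline bool
demo_rw_trywrlock_fast(struct demo_rwlock *l)
{
	unsigned long expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->state, &expected,
	    DEMO_RW_WRITER, memory_order_acquire, memory_order_relaxed);
}

/* Drop one reader reference. */
static inline void
demo_rw_rdunlock(struct demo_rwlock *l)
{
	atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
}

/* Clear the writer flag. */
static inline void
demo_rw_wrunlock(struct demo_rwlock *l)
{
	atomic_fetch_and_explicit(&l->state, ~DEMO_RW_WRITER, memory_order_release);
}
```

A real version would also need waiter bookkeeping so that unlock knows when to wake sleepers in the kernel, which is exactly the part that depends on the undocumented interface.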

David