sem_post() performance

Konstantin Belousov kostikbel at gmail.com
Wed Sep 24 14:45:26 UTC 2014


On Wed, Sep 24, 2014 at 05:31:04PM +0300, Konstantin Belousov wrote:
> On Wed, Sep 24, 2014 at 10:11:51AM -0400, John Baldwin wrote:
> > On Tuesday, September 23, 2014 11:20:00 PM Jilles Tjoelker wrote:
> > > On Mon, Sep 22, 2014 at 03:53:13PM -0400, John Baldwin wrote:
> > > > On Sunday, September 21, 2014 11:37:42 PM Jilles Tjoelker wrote:
> > > > > It has been reported that POSIX semaphores are slow, in contexts such as
> > > > > Python. Note that POSIX semaphores are the only synchronization objects
> > > > > that support use by different processes in shared memory; this does not
> > > > > work for mutexes and condition variables because they are pointers to
> > > > > the actual data structure.
> > > > > 
> > > > > In fact, sem_post() unconditionally performs an umtx system call.
> > > > 
> > > > *sigh*  I was worried that that might be the case.
> > > > 
> > > > > To avoid both lost wakeups and possible writes to a destroyed semaphore,
> > > > > an uncontested sem_post() must check the _has_waiters flag atomically
> > > > > with incrementing _count.
> > > > > 
> > > > > The proper way to do this would be to take one bit from _count and
> > > > > use it for the _has_waiters flag; the definition of SEM_VALUE_MAX
> > > > > permits this. However, this would require a new set of umtx
> > > > > semaphore operations and will break ABI of process-shared semaphores
> > > > > (things may break if an old and a new libc access the same semaphore
> > > > > over shared memory).
> > > > > 
> > > > > This diff only affects 32-bit aligned but 64-bit misaligned
> > > > > semaphores on 64-bit systems, and changes _count and _has_waiters
> > > > > atomically using a 64-bit atomic operation. It probably needs a
> > > > > may_alias attribute for correctness, but <sys/cdefs.h> does not have
> > > > > a wrapper for that.
> > > > 
> > > > It wasn't clear on first reading, but you are using aliasing to get
> > > > around the need for new umtx calls by using a 64-bit atomic op to
> > > > adjust two ints at the same time, yes?  Note that since a failing
> > > > semaphore op calls into the kernel for the "hard" case, you might in
> > > > fact be able to change the ABI without breaking process-shared
> > > > semaphores.  That is, suppose you left 'has_waiters' as always true
> > > > and reused the high bit of count for has_waiters.
> > > > 
> > > > Would old binaries always trap into the kernel?  (Not sure they will,
> > > > especially the case where an old binary creates the semaphore, a new
> > > > binary would have to force has_waiters to true in every sem op, but
> > > > even that might not be enough.)
> > > 
> > > I think that everything will break when a binary linked to old and new
> > > libcs use the same semaphore. If the new contested bit is set, the old
> > > sem_getvalue() will return garbage, the old sem_trywait() will fail even
> > > if the real count is greater than 0, the old sem_wait() and
> > > sem_timedwait() may spin if the real count is greater than 0 and the old
> > > sem_post() will fail with [EOVERFLOW].
> > 
> > Well, keep in mind you can't have a binary linked to both libc's generally 
> > (way too many things break if you do that, like passing the result of strdup() 
> > from one libc to free() of the other, etc.).  The real problem case is 
> > multiple binaries.  However, barring truly bizarre cases with people using 
> > funky LD_LIBRARY_PATH, at least one binary would have to be static to really 
> > get into trouble as in all but the bizarre cases all binaries will be sharing 
> > the same libc.so.7.
> > 
> > However, it was more wishful thinking on my part that we could arrange things
> > to force old binaries to still work (just slowly) while fixing things cleanly.
> > 
> > > > I think this looks ok, but it's a shame we can't fix it more cleanly.
> > > 
> > > I don't like at all that you have to either force an exotic alignment or
> > > use unaligned atomic ops.
> > > 
> > > I saw another interesting sem_ implementation in musl libc. It also uses
> > > a separate waiters word (but an int instead of a boolean), and is based
> > > on the futex calls (which we already have as wait_uint/wake umtx calls).
> > > It looks rather complicated but the musl libc people are usually good at
> > > getting this kind of stuff right. I suppose this will also break subtly
> > > when interoperating with old libc, and adding new umtx ops will simplify
> > > the algorithm.
> > > 
> > > Consideration: just declare mixing process-shared semaphores with
> > > sufficiently different libc unsupported, and change SEM_MAGIC to enforce
> > > that? (This does not prevent running old binaries, as long as they're
> > > dynamically linked to libc and you use a new libc.so.)
> > 
> > This only breaks old static binaries (note that we only recently got pshared
> > semaphores in 9.x or so).  You could even allow partial compat for those if 
> > needed by recognizing the old SEM_MAGIC when opening an existing shared 
> > semaphore (so that compat would work so long as an old binary created the 
> > semaphore).  That compat can even be added retroactively to libc if it turns
> > out we need it.  I think that is probably ok, but it would be an ABI breakage
> > that would need to be documented.  One thing that makes this slightly less
> > painful is that new-style semaphores in 9.0+ are per-chroot, so if you have
> > a jail that runs only old binaries, they will continue to work fine.
> > 
> > I'm cc'ing Konstantin to see what he thinks.
> You may also get in trouble if you upgrade libc without a reboot, or use
> a static binary built against a newer libc on an older host.
> 
> But SEM_MAGIC provides at least protection against mixing incompatible
> implementations, and it seems that the checks are enough to ensure
> that both bad cases (new/old and old/new) are handled.
> 
> Do we have a real use case where the old and new implementations have to
> be mixed?  Was it the same situation when the sem.c -> sem_new.c transition
> was made?  I think that the answers to these questions are no/yes, and we
> should just change the magic and not worry about the issue until a real use
> case appears.
I think it is even worse now. If an application linked against the FBSD_1.0
(ksem) semaphore implementation runs on a modern host, it cannot
share a semaphore with a modern binary.

Since this is not considered a significant problem, we can avoid compat
code there as well.
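
To illustrate the "just change magic" idea, here is a minimal sketch of
a magic check that rejects semaphores created by an incompatible
implementation. The struct layout, the names, and both magic values are
placeholders invented for this example; they are not the actual libc
definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, NOT the real SEM_MAGIC constants from libc. */
#define SKETCH_SEM_MAGIC_OLD 0x11111111u
#define SKETCH_SEM_MAGIC_NEW 0x22222222u

/* Hypothetical header: only the field needed for the check is shown. */
struct sketch_sem_hdr {
	uint32_t magic;
};

/*
 * Return 0 if the semaphore was initialized by the new implementation,
 * EINVAL otherwise (this includes semaphores carrying the old magic).
 */
static int
sketch_sem_check(const struct sketch_sem_hdr *h)
{
	assert(h != NULL);
	if (h->magic != SKETCH_SEM_MAGIC_NEW)
		return (EINVAL);
	return (0);
}
```

With such a check at the start of every operation, a new libc refuses a
semaphore created by an old one instead of misinterpreting its fields,
and the same holds in the other direction as long as the old code
checks its own magic.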

> 
> > 
> > > Consideration 2: use the new implementation only for process-private
> > > semaphores and keep using the old slow one for process-shared
> > > semaphores?
> > 
> > This is certainly safe, just requires extra work in the implementation.
> > 
> > -- 
> > John Baldwin
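
The idea quoted above of taking one bit from _count for the has_waiters
flag can be sketched with C11 atomics as follows. This is only an
illustration under assumptions: the names, the choice of the top bit,
and the 31-bit count limit are made up for the example, and the kernel
side (the new umtx operations that would set the flag and sleep) is not
shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: top bit flags sleepers, low 31 bits hold the count. */
#define SKETCH_HAS_WAITERS 0x80000000u
#define SKETCH_COUNT_MASK  0x7fffffffu

struct sketch_sem {
	_Atomic uint32_t word;
};

/*
 * Increment the count. Because the flag and the count live in one word,
 * the flag is observed atomically with the increment. Returns 0 on
 * success and -1 if the count is already at its maximum. On success
 * *need_wake tells the caller whether a wakeup system call is needed;
 * in the uncontested case no system call is made at all.
 */
static int
sketch_post(struct sketch_sem *s, bool *need_wake)
{
	uint32_t old = atomic_load_explicit(&s->word, memory_order_relaxed);

	do {
		if ((old & SKETCH_COUNT_MASK) == SKETCH_COUNT_MASK)
			return (-1);	/* would correspond to EOVERFLOW */
	} while (!atomic_compare_exchange_weak_explicit(&s->word, &old,
	    old + 1, memory_order_release, memory_order_relaxed));
	*need_wake = (old & SKETCH_HAS_WAITERS) != 0;
	return (0);
}

/* Decrement the count if it is nonzero; never blocks. */
static bool
sketch_trywait(struct sketch_sem *s)
{
	uint32_t old = atomic_load_explicit(&s->word, memory_order_relaxed);

	while ((old & SKETCH_COUNT_MASK) != 0) {
		if (atomic_compare_exchange_weak_explicit(&s->word, &old,
		    old - 1, memory_order_acquire, memory_order_relaxed))
			return (true);
	}
	return (false);
}
```

An old libc reading this word as a plain count would see a huge value
whenever the flag is set, which is the kind of breakage described
earlier in the thread.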

