Re: pondering pi futexes

From: Dmitry Chagin <dchagin_at_freebsd.org>
Date: Mon, 19 Jul 2021 08:01:17 UTC
On Fri, Jul 02, 2021 at 01:51:21PM +0300, Konstantin Belousov wrote:
> On Fri, Jul 02, 2021 at 12:53:01AM +0300, Dmitry Chagin wrote:
> > On Mon, Jun 28, 2021 at 02:20:25PM +0300, Konstantin Belousov wrote:
> > > On Sun, Jun 27, 2021 at 10:39:35PM +0300, Dmitry Chagin wrote:
> > > > 
> > > > Hi,
> > > > some time ago I have changed Linuxulator futexes from sx lock to mtx.
> > > > sx was used as it allows copyin/copyout with sx lock held.
> > > > to use mtx I have changed the code like:
> > > > 1. lock mtx;
> > > > 2. disable_pagefaults;
> > > > 3. copyin()
> > > > 4. enable_pagefaults;
> > > > 5. if error
> > > >    - unlock mtx;
> > > >    copyin();
> > > >    if error == 0 goto 1.
> > > > 
> > > > it works (needto replace copyin() by fueword32()), but pondering pi futexes
> > > > imlementation, I see that it is not possible to drop the futex lock on a return
> > > > from msleep() path. 
> > > > 
> > > > below a simplified FUTEX_LOCK_PI operation, where on enter to the kernel current thread:
> > > > 
> > > > 0. acquire futex lock (which is mtx)
> > > > 1. cmpset(0 -> current thread TID), return (0) on success;
> > > > 2. fetch() from futex *uaddr (for TID of owner):
> > > >    - check EDEADLK case (the futex word at *uaddr is already locked by the caller);
> > > >    - check that is no waiters on *uaddr exists which is waiting via FUTEX_WAIT or
> > > >      FUTEX_WAIT_BITSET, return (EINVAL) if so;
> > > >    - cmpset(TID -> (FUTEX_WAITERS|TID));
> > > >       - on error, the futex owner changed in user-space, repeat from 1.
> > > > 3. Here we have: the owner, one waiter (current thread) and 0 or more waiters
> > > >    sleeping on a waiting_proc. FUTEX_WAITERS bit is set, so any new waiters go to
> > > >    the kernel and owner should unlock futex via the FUTEX_UNLOCK_PI op;
> > > > 4. Try to find the thread which is associated with the owner’s TID:
> > > >    - on error, something bad happened, owner died? Clean owner state link?
> > > >      return (ESRCH). Or if no other waiters? Check this...
> > > >    - on success:
> > > >       - save owner state link to the struct futex (save priority);
> > > >       - check the owner's priority, bump it if needed;
> > > >       - put the current thread to the waiters list in descending priority order;
> > > >       - change priority of all waiters if needed;
> > > >       - msleep on a futex waiting_proc; come back with futex lock held;
> > > >          - restore own priority? If last waiter?; [ponders..]
> > > >          - on timeout return (ETIMEDOUT);
> > > >          - the current thread is the new owner:
> > > > bah!!    - store() the owner TID to *uaddr; [check what should I do on error..]
> > > >          - release futex lock;
> > > >          - return (0).
> > > > 
> > > > is it possible to hold *uaddr page to prevent page faults?
> > > 
> > > I did not followed exact algorithm you trying to describe.  Still, I can make
> > > two points which could be useful for formulation of the working solution.
> > > 
> > > 1. Umtx AKA FreeBSD native implementation of something very similar to
> > > futex, has a concept of the umtx queue chain. The chain owns the mutex
> > > lock used for 'fast' ops, but for situations like accesses to userspace,
> > > we 'busy' the umtxq chain. De-facto busy state is the hand-rolled
> > > sleepable lock, with usual interlocking against chain mutex, and
> > > msleep/wakeup inter-thread notifications.
> > > 
> > reading umtx impl I see umtxq_sleep drop lock (PDROP bit is set) and, in case
> > of a signal, reacquire lock and breaks from loop. is there a possible
> > small window for wakeup loss? as ERESTART returned, or the caller should
> > to take care of this? Or I missing something?
> > Im trying to reuse the umtx code for futexes, at least queue chain.
> > Thanks in advance!
> 
> What is the race you see there?  If we get EINTR/ERESTART from msleep,
> the loop inside umtxq_sleep() is not re-entered, and we do not sleep waiting
> for the chain to become unbusy anymore.  Even if there is a wakeup on the
> chain, before or after we acquired the lock, it does not matter for us
> since we are not going to sleep on it anymore.
> 
> Yes, it is up to the caller to decide what to do with EINTR/ERESTART.  Some
> umtx ops are specified to not react abruptly to signals, they should finish
> the op anyway.  For such ops, specific checks for stops and exit requests
> are still needed.
>
Hi, thanks for the reply, I mostly finished,
the new futex impl is fully based on the umtx code, one question before review.
some umtx API, which is needed for futexes, inlined, like
umtxq_busy/unbusy, umtxq_lock/unlock, umtx_pi_alloc/pi_free,  etc.. 
For now I moved such API to the umtx header, but as far as I understand
compilers are smart enough now to optimize code without suggestions.
Maybe it's time to drop inline hint?