Date: Mon, 19 Jul 2021 11:01:17 +0300
From: Dmitry Chagin
To: Konstantin Belousov
Cc: freebsd-hackers@freebsd.org
Subject: Re: pondering pi futexes
On Fri, Jul 02, 2021 at 01:51:21PM +0300, Konstantin Belousov wrote:
> On Fri, Jul 02, 2021 at 12:53:01AM +0300, Dmitry Chagin wrote:
> > On Mon, Jun 28, 2021 at 02:20:25PM +0300, Konstantin Belousov wrote:
> > > On Sun, Jun 27, 2021 at 10:39:35PM +0300, Dmitry Chagin wrote:
> > > >
> > > > Hi,
> > > > some time ago I changed the Linuxulator futexes from an sx lock to a mtx.
> > > > sx was used because it allows copyin/copyout with the sx lock held.
> > > > To use mtx, I changed the code as follows:
> > > > 1. lock mtx;
> > > > 2. disable_pagefaults;
> > > > 3. copyin();
> > > > 4. enable_pagefaults;
> > > > 5. if error:
> > > >    - unlock mtx;
> > > >      copyin();
> > > >      if error == 0, goto 1.
> > > >
> > > > It works (copyin() needs to be replaced by fueword32()), but pondering
> > > > the pi futexes implementation, I see that it is not possible to drop
> > > > the futex lock on the return-from-msleep() path.
> > > >
> > > > Below is a simplified FUTEX_LOCK_PI operation, where on entry to the
> > > > kernel the current thread does:
> > > >
> > > > 0. acquire the futex lock (which is a mtx);
> > > > 1. cmpset(0 -> current thread TID), return (0) on success;
> > > > 2. fetch() the futex word at *uaddr (for the TID of the owner):
> > > >    - check the EDEADLK case (the futex word at *uaddr is already
> > > >      locked by the caller);
> > > >    - check that no waiters on *uaddr exist which are waiting via
> > > >      FUTEX_WAIT or FUTEX_WAIT_BITSET, return (EINVAL) if so;
> > > >    - cmpset(TID -> (FUTEX_WAITERS|TID));
> > > >    - on error, the futex owner changed in user space, repeat from 1.
> > > > 3. Here we have: the owner, one waiter (the current thread), and 0 or
> > > >    more waiters sleeping on a waiting_proc. The FUTEX_WAITERS bit is
> > > >    set, so any new waiters go to the kernel, and the owner should
> > > >    unlock the futex via the FUTEX_UNLOCK_PI op;
> > > > 4. Try to find the thread which is associated with the owner's TID:
> > > >    - on error, something bad happened; owner died? Clean the owner
> > > >      state link? return (ESRCH). Or if there are no other waiters?
> > > >      Check this...
> > > >    - on success:
> > > >      - save the owner state link to the struct futex (save priority);
> > > >      - check the owner's priority, bump it if needed;
> > > >      - put the current thread on the waiters list in descending
> > > >        priority order;
> > > >      - change the priority of all waiters if needed;
> > > >      - msleep on the futex waiting_proc; come back with the futex
> > > >        lock held;
> > > >      - restore own priority? If last waiter? [ponders..]
> > > >      - on timeout, return (ETIMEDOUT);
> > > >      - the current thread is the new owner:
> > > > bah!!  - store() the owner TID to *uaddr; [check what I should do on error..]
> > > >        - release the futex lock;
> > > >        - return (0).
> > > >
> > > > Is it possible to hold the *uaddr page to prevent page faults?
> > >
> > > I did not follow the exact algorithm you are trying to describe. Still,
> > > I can make two points which could be useful for formulating a working
> > > solution.
> > >
> > > 1. Umtx, AKA the FreeBSD native implementation of something very similar
> > > to futexes, has a concept of the umtx queue chain. The chain owns the
> > > mutex lock used for 'fast' ops, but for situations like accesses to
> > > userspace, we 'busy' the umtxq chain. De facto, the busy state is a
> > > hand-rolled sleepable lock, with the usual interlocking against the
> > > chain mutex, and msleep/wakeup inter-thread notifications.
> > >
> > Reading the umtx implementation, I see that umtxq_sleep drops the lock
> > (the PDROP bit is set) and, in case of a signal, reacquires the lock and
> > breaks out of the loop. Is there a possible small window for a lost
> > wakeup, as ERESTART is returned, or should the caller take care of this?
> > Or am I missing something?
> > I'm trying to reuse the umtx code for futexes, at least the queue chain.
> > Thanks in advance!
>
> What is the race you see there? If we get EINTR/ERESTART from msleep,
> the loop inside umtxq_sleep() is not re-entered, and we do not sleep
> waiting for the chain to become unbusy anymore. Even if there is a wakeup
> on the chain, before or after we acquired the lock, it does not matter to
> us, since we are not going to sleep on it anymore.
>
> Yes, it is up to the caller to decide what to do with EINTR/ERESTART. Some
> umtx ops are specified not to react abruptly to signals; they should
> finish the op anyway. For such ops, specific checks for stops and exit
> requests are still needed.
>

Hi, thanks for the reply. I have mostly finished; the new futex
implementation is fully based on the umtx code. One question before review.

Some of the umtx API which is needed for futexes is inlined, like
umtxq_busy/unbusy, umtxq_lock/unlock, umtx_pi_alloc/pi_free, etc. For now I
moved that API to the umtx header, but as far as I understand, compilers are
now smart enough to optimize the code without such hints. Maybe it's time to
drop the inline hint?