Re: git: 8271d9b99a3b - main - libsys: remove usage of pthread_once and _once_stub

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Thu, 22 Feb 2024 01:11:22 UTC
On Wed, Feb 21, 2024 at 05:23:10PM +0000, Jessica Clarke wrote:
> On 21 Feb 2024, at 14:17, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Wed, Feb 21, 2024 at 12:51:04AM +0000, Jessica Clarke wrote:
> >> On 21 Feb 2024, at 00:29, Konstantin Belousov <kib@FreeBSD.org> wrote:
> >>> 
> >>> The branch main has been updated by kib:
> >>> 
> >>> URL: https://cgit.FreeBSD.org/src/commit/?id=8271d9b99a3b98c662ee9a6257a144284b7e1728
> >>> 
> >>> commit 8271d9b99a3b98c662ee9a6257a144284b7e1728
> >>> Author:     Konstantin Belousov <kib@FreeBSD.org>
> >>> AuthorDate: 2024-02-20 14:45:29 +0000
> >>> Commit:     Konstantin Belousov <kib@FreeBSD.org>
> >>> CommitDate: 2024-02-21 00:26:11 +0000
> >>> 
> >>>   libsys: remove usage of pthread_once and _once_stub
> >>> 
> >>>   that existed in auxv.c, use simple bool gate instead. This leaves a
> >>>   small window if two threads try to call _elf_aux_info(3) simultaneously.
> >>>   The situation is safe because auxv parsing is really idempotent. The
> >>>   parsed data is the same, and we store atomic types (int/long/ptr) so
> >>>   double-init does not matter.
> >> 
> >> You still need to load acquire and store release aux_once though,
> >> otherwise you can see aux_once as true yet read the pre-initialised
> >> data. In practice that’s surely very hard to hit, but the code as
> >> written is now wrong. Also, idempotence should probably be made
> >> unnecessary by using 0/1/2 state for uninitialised/initialising/
> >> initialised, as it’s still technically UB from a C AM perspective due
> >> to not being data race free if two threads initialise at the same time.
> >> Better to just do the correct thing rather than risk things going wrong.
> > 
> > There is too much to handle 'in process' state for loosing thread, I need
> > the whole libthr machinery.
> 
> What do you need libthr for? In pseudo-C:
> 
> x = load_acquire(&aux_once)
> if (__predict_true(x == 2))
>     return;
> if (x == 1 || !compare_exchange_strong_acquire(&aux_once, &x, 1)) {
>     while (x != 2) {
>         yield();
>         x = load_acquire(&aux_once)
>     }
>     return;
> }
> /* initialise as before */
> store_release(&aux_once, 2);
> 
> I believe that’s all you need. Or compare exchange 0 to 1 as the
> initial operation; makes the source code shorter at the expense of a
> more expensive fast path:
> 
> x = 0;
> if (__predict_true(!compare_exchange_strong_acquire(&aux_once, &x, 1)) {
>     while (__predict_false(x != 2)) {
>         yield();
>         x = load_acquire(&aux_once)
>     }
>     return;
> }
> /* initialise as before */
> store_release(&aux_once, 2);
> 
> I probably have bugs in the above, but you get the gist.
The bug in the fragment above is with the yield().  If low-priority thread
enters the '1' (in progress) block, and then is preempted by high-priority
thread also entering init_auxv(), the process would never make a progress.

This is why I said that we need libthr (or umtx), to use real locking and
move the waiting thread off cpu.  In kernel, yield can be used in similar
situations because we can bump the priority, although it is tricky.
> 
> > I added the fences, thanks for noting.
> 
> Thanks.
> 
> Jess
> 
> > WRT being UB from pure C, we already have much more assumptions about
> > atomicity.
> 
>