Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
Date: Thu, 24 Aug 2023 20:07:22 UTC
On Aug 24, 2023, at 00:22, Mark Millard <marklmi@yahoo.com> wrote:
> On Aug 23, 2023, at 22:54, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
>> On 8/24/23, Mark Millard <marklmi@yahoo.com> wrote:
>>> On Aug 23, 2023, at 15:10, Mateusz Guzik <mjguzik@gmail.com> wrote:
>>>
>>>> On 8/23/23, Mark Millard <marklmi@yahoo.com> wrote:
>>>>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>>>> . . .
>>>>
>>>> This is a known problem, but it is unclear if you should be running
>>>> into it in this setup.
>>>
>>> The change fixed the issue, so I do run into the issue
>>> for this setup. See below.
>>>
>>>> Can you try again but this time *revert*
>>>> 138a5dafba312ff39ce0eefdbe34de95519e600d, like so:
>>>> git revert 138a5dafba312ff39ce0eefdbe34de95519e600d
>>>>
>>>> may want to switch to a different branch first, for example: git
>>>> checkout -b vfstesting
>>>
>>> # git -C /usr/main-src/ diff sys/kern/vfs_subr.c
>>> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
>>> index 0f3f00abfd4a..5dff556ac258 100644
>>> --- a/sys/kern/vfs_subr.c
>>> +++ b/sys/kern/vfs_subr.c
>>> @@ -3528,25 +3528,17 @@ vdbatch_process(struct vdbatch *vd)
>>>  	MPASS(curthread->td_pinned > 0);
>>>  	MPASS(vd->index == VDBATCH_SIZE);
>>> +	mtx_lock(&vnode_list_mtx);
>>>  	critical_enter();
>>> -	if (mtx_trylock(&vnode_list_mtx)) {
>>> -		for (i = 0; i < VDBATCH_SIZE; i++) {
>>> -			vp = vd->tab[i];
>>> -			vd->tab[i] = NULL;
>>> -			TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>>> -			TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>>> -			MPASS(vp->v_dbatchcpu != NOCPU);
>>> -			vp->v_dbatchcpu = NOCPU;
>>> -		}
>>> -		mtx_unlock(&vnode_list_mtx);
>>> -	} else {
>>> -		for (i = 0; i < VDBATCH_SIZE; i++) {
>>> -			vp = vd->tab[i];
>>> -			vd->tab[i] = NULL;
>>> -			MPASS(vp->v_dbatchcpu != NOCPU);
>>> -			vp->v_dbatchcpu = NOCPU;
>>> -		}
>>> +	for (i = 0; i < VDBATCH_SIZE; i++) {
>>> +		vp = vd->tab[i];
>>> +		TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>>> +		TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>>> +		MPASS(vp->v_dbatchcpu != NOCPU);
>>> +		vp->v_dbatchcpu = NOCPU;
>>>  	}
>>> +	mtx_unlock(&vnode_list_mtx);
>>> +	bzero(vd->tab, sizeof(vd->tab));
>>>  	vd->index = 0;
>>>  	critical_exit();
>>>  }
>>>
>>> Still with:
>>>
>>> # grep USE_TMPFS= /usr/local/etc/poudriere.conf
>>> # EXAMPLE: USE_TMPFS="wrkdir data"
>>> #USE_TMPFS=all
>>> #USE_TMPFS="data"
>>> USE_TMPFS=no
>>>
>>>
>>> That allowed the other builders to eventually reach "Builder started"
>>> and later activity. [00:05:50] [27] [00:02:29] Builder started
>>> was the first non-[01] builder to do so. I observed no vlruwk states
>>> in what I saw in top:
>>>
>>> . . .
>>>
>>> Now testing for the zfs deadlock issue should be possible for
>>> this setup.
>>>
>>
>> Thanks for testing, I wrote a fix:
>>
>> https://people.freebsd.org/~mjg/vfs-recycle-fix.diff
>>
>> Applies to *stock* kernel (as in without the revert).
>
> I'm going to leave the deadlock test running while
> I sleep tonight. So it is going to be a while before
> I get to testing this. $ work will likely happen first
> as well. (No deadlock observed yet, by the way. 6+ hrs
> and 3000+ ports built so far.)
>
> I can easily restore sys/kern/vfs_subr.c and then
> do normal 14.0-ALPHA2-ish based patching, so that is not
> a problem. Thanks.
>
I stopped the deadlock experiment, cleaned out the partial
bulk -a, put back the modern sys/kern/vfs_subr.c, applied
your patch, built, installed, rebooted, and started another
bulk -a run. All of the builders progressed to and
past "Builder started":
. . .
[00:01:34] Building 34042 packages using up to 32 builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Builder starting
[00:01:57] [01] [00:00:23] Builder started
[00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
[00:03:09] [01] [00:01:12] Finished ports-mgmt/pkg | pkg-1.20.4: Success
[00:03:22] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
[00:03:22] [02] [00:00:00] Builder starting
[00:03:22] [03] [00:00:00] Builder starting
[00:03:22] [04] [00:00:00] Builder starting
[00:03:22] [05] [00:00:00] Builder starting
[00:03:22] [06] [00:00:00] Builder starting
[00:03:22] [07] [00:00:00] Builder starting
[00:03:22] [08] [00:00:00] Builder starting
[00:03:22] [09] [00:00:00] Builder starting
[00:03:22] [10] [00:00:00] Builder starting
[00:03:22] [11] [00:00:00] Builder starting
[00:03:22] [12] [00:00:00] Builder starting
[00:03:22] [13] [00:00:00] Builder starting
[00:03:22] [14] [00:00:00] Builder starting
[00:03:22] [15] [00:00:00] Builder starting
[00:03:22] [16] [00:00:00] Builder starting
[00:03:22] [17] [00:00:00] Builder starting
[00:03:22] [18] [00:00:00] Builder starting
[00:03:22] [19] [00:00:00] Builder starting
[00:03:22] [20] [00:00:00] Builder starting
[00:03:22] [21] [00:00:00] Builder starting
[00:03:22] [22] [00:00:00] Builder starting
[00:03:22] [23] [00:00:00] Builder starting
[00:03:22] [24] [00:00:00] Builder starting
[00:03:22] [25] [00:00:00] Builder starting
[00:03:22] [26] [00:00:00] Builder starting
[00:03:22] [27] [00:00:00] Builder starting
[00:03:22] [28] [00:00:00] Builder starting
[00:03:22] [29] [00:00:00] Builder starting
[00:03:22] [30] [00:00:00] Builder starting
[00:03:22] [31] [00:00:00] Builder starting
[00:03:22] [32] [00:00:00] Builder starting
[00:03:30] [01] [00:00:08] Finished print/indexinfo | indexinfo-0.3.1: Success
[00:03:30] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22
[00:04:42] [01] [00:01:12] Finished devel/gettext-runtime | gettext-runtime-0.22: Success
[00:04:48] [01] [00:00:00] Building devel/libtextstyle | libtextstyle-0.22
[00:05:46] [19] [00:02:24] Builder started
[00:05:46] [15] [00:02:24] Builder started
[00:05:46] [19] [00:00:00] Building graphics/libpotrace | libpotrace-1.16
[00:05:46] [15] [00:00:00] Building devel/libdaemon | libdaemon-0.14_1
[00:05:46] [25] [00:02:24] Builder started
[00:05:46] [25] [00:00:00] Building audio/speexdsp | speexdsp-1.2.1
[00:05:46] [29] [00:02:24] Builder started
[00:05:46] [29] [00:00:00] Building devel/opencl | opencl-3.0.14
. . .
Thanks. I'll let it run as another deadlock test. The
prior run built over 9400 ports in about 18.5 hr before I
stopped it (no deadlocks observed).
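
For anyone comparing runs, here is a minimal sketch (not something
used for the results above) for pulling the per-builder startup
times out of a saved copy of the bulk output. It assumes the line
layout shown in the log above, where the third bracketed field on a
"Builder started" line appears to be the time since that builder's
"Builder starting" line. The function name builder_start_times and
the file name bulk.log are only illustrative.

```sh
#!/bin/sh
# Read poudriere bulk output on stdin and print "<builder> <elapsed>"
# for every "Builder started" line. Assumed line layout:
#   [run-time] [builder-id] [elapsed] Builder started
builder_start_times() {
    awk '$4 == "Builder" && $5 == "started" {
        # Strip the surrounding brackets from fields 2 and 3.
        id = substr($2, 2, length($2) - 2)
        elapsed = substr($3, 2, length($3) - 2)
        print id, elapsed
    }'
}
```

Usage would be something like "builder_start_times < bulk.log".
Long elapsed values for the non-[01] builders are what the cpdup
stalls looked like from the poudriere side.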
===
Mark Millard
marklmi at yahoo.com