Re: git: 5844bd058aed - main - jobc: rework detection of orphaned groups.
- In reply to: Kyle Evans : "Re: git: 5844bd058aed - main - jobc: rework detection of orphaned groups."
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Wed, 03 May 2023 06:46:25 UTC
On Sun, Apr 30, 2023 at 11:01:44PM -0500, Kyle Evans wrote:
> On Sat, Jan 9, 2021 at 8:43 PM Konstantin Belousov <kib@freebsd.org> wrote:
> >
> > The branch main has been updated by kib:
> >
> > URL: https://cgit.FreeBSD.org/src/commit/?id=5844bd058aed6f3d0c8cbbddd6aa95993ece0189
> >
> > commit 5844bd058aed6f3d0c8cbbddd6aa95993ece0189
> > Author: Konstantin Belousov <kib@FreeBSD.org>
> > AuthorDate: 2020-12-29 00:41:56 +0000
> > Commit: Konstantin Belousov <kib@FreeBSD.org>
> > CommitDate: 2021-01-10 02:41:20 +0000
> >
> > jobc: rework detection of orphaned groups.
> >
> > Instead of trying to maintain pg_jobc counter on each process group
> > update (and sometimes before), just calculate the counter when needed.
> > Still, for the benefit of the signal delivery code, explicitly mark
> > orphaned groups as such with the new process group flag.
> >
> > This way we prevent bugs in the corner cases where updates to the counter
> > were missed due to complicated configuration of p_pptr/p_opptr/real_parent
> > (debugger).
> >
> > Since we need to iterate over all children of the process on exit, this
> > change mostly affects the process group entry and leave, where we need
> > to iterate all process group members to detect orpaned status.
> >
>
> Hi,
>
> I have a question about the locking here...
>
> > [... snip ...]
> > @@ -678,43 +677,40 @@ jobc_reaper(struct proc *p)
> > }
> >
> > static struct proc *
> > -jobc_parent(struct proc *p)
> > +jobc_parent(struct proc *p, struct proc *p_exiting)
> > {
> > struct proc *pp;
> >
> > - sx_assert(&proctree_lock, SX_LOCKED);
> > + sx_assert(&proctree_lock, SA_LOCKED);
> >
> > pp = proc_realparent(p);
> > - if (pp->p_pptr == NULL ||
> > + if (pp->p_pptr == NULL || pp == p_exiting ||
> > (pp->p_treeflag & P_TREE_GRPEXITED) == 0)
> > return (pp);
> > return (jobc_reaper(pp));
> > }
> >
> > -#ifdef INVARIANTS
> > -static void
> > -check_pgrp_jobc(struct pgrp *pgrp)
> > +static int
> > +pgrp_calc_jobc(struct pgrp *pgrp)
> > {
> > struct proc *q;
> > int cnt;
> >
> > - sx_assert(&proctree_lock, SX_LOCKED);
> > - PGRP_LOCK_ASSERT(pgrp, MA_NOTOWNED);
> > +#ifdef INVARIANTS
> > + if (!mtx_owned(&pgrp->pg_mtx))
> > + sx_assert(&proctree_lock, SA_LOCKED);
> > +#endif
> >
> > cnt = 0;
> > - PGRP_LOCK(pgrp);
> > LIST_FOREACH(q, &pgrp->pg_members, p_pglist) {
> > if ((q->p_treeflag & P_TREE_GRPEXITED) != 0 ||
> > q->p_pptr == NULL)
> > continue;
> > - if (isjobproc(jobc_parent(q), pgrp))
> > + if (isjobproc(jobc_parent(q, NULL), pgrp))
> > cnt++;
> > }
> > - KASSERT(pgrp->pg_jobc == cnt, ("pgrp %d %p pg_jobc %d cnt %d",
> > - pgrp->pg_id, pgrp, pgrp->pg_jobc, cnt));
> > - PGRP_UNLOCK(pgrp);
> > + return (cnt);
> > }
> > -#endif
> >
> > /*
> > * Move p to a process group
> > [... snip ...]
>
> So the proctree lock is sufficient for pgrp_calc_jobc() to provide a
> stable answer needed for everything fixjobc_kill() uses it for...
>
> > @@ -934,35 +827,46 @@ fixjobc_kill(struct proc *p)
> > p->p_treeflag |= P_TREE_GRPEXITED;
> >
> > /*
> > - * Check p's parent to see whether p qualifies its own process
> > - * group; if so, adjust count for p's process group.
> > + * Check if exiting p orphans its own group.
> > */
> > - if (isjobproc(jobc_parent(p), pgrp))
> > - pgadjustjobc(pgrp, false);
> > + pgrp = p->p_pgrp;
> > + if (isjobproc(jobc_parent(p, NULL), pgrp)) {
> > + PGRP_LOCK(pgrp);
> > + if (pgrp_calc_jobc(pgrp) == 0)
> > + orphanpg(pgrp);
> > + PGRP_UNLOCK(pgrp);
> > + }
> >
> > /*
> > * Check this process' children to see whether they qualify
> > - * their process groups after reparenting to reaper. If so,
> > - * adjust counts for children's process groups.
> > + * their process groups after reparenting to reaper.
> > */
> > LIST_FOREACH(q, &p->p_children, p_sibling) {
> > - if ((q->p_treeflag & P_TREE_ORPHANED) != 0)
> > - continue;
> > - fixjobc_kill_q(p, q, true);
> > + pgrp = q->p_pgrp;
> > + PGRP_LOCK(pgrp);
> > + if (pgrp_calc_jobc(pgrp) == 0) {
> > + /*
> > + * We want to handle exactly the children that
> > + * has p as realparent. Then, when calculating
> > + * jobc_parent for children, we should ignore
> > + * P_TREE_GRPEXITED flag already set on p.
> > + */
> > + if (jobc_parent(q, p) == p && isjobproc(p, pgrp))
> > + orphanpg(pgrp);
> > + } else
> > + pgrp->pg_flags &= ~PGRP_ORPHANED;
> > + PGRP_UNLOCK(pgrp);
> > }
> > - LIST_FOREACH(q, &p->p_orphans, p_orphan)
> > - fixjobc_kill_q(p, q, true);
> > - LIST_FOREACH(q, &p->p_children, p_sibling) {
> > - if ((q->p_treeflag & P_TREE_ORPHANED) != 0)
> > - continue;
> > - fixjobc_kill_q(p, q, false);
> > + LIST_FOREACH(q, &p->p_orphans, p_orphan) {
> > + pgrp = q->p_pgrp;
> > + PGRP_LOCK(pgrp);
> > + if (pgrp_calc_jobc(pgrp) == 0) {
> > + if (isjobproc(p, pgrp))
> > + orphanpg(pgrp);
> > + } else
> > + pgrp->pg_flags &= ~PGRP_ORPHANED;
> > + PGRP_UNLOCK(pgrp);
> > }
> > - LIST_FOREACH(q, &p->p_orphans, p_orphan)
> > - fixjobc_kill_q(p, q, false);
> > -
> > -#ifdef INVARIANTS
> > - check_pgrp_jobc(pgrp);
> > -#endif
> > }
> >
> > void
>
> ... and I would imagine the proctree lock is sufficient for
> isjobproc/jobc_parent as well- is there any reason we can't/shouldn't
> just use atomic(9) for operations with pgrp->pg_flags and push all of
> the extra lock acquisitions into the orphanpg() cases? It seems like
> we could avoid taking any pgrp lock in the common case and at least
> mitigate that additional overhead from the exit() path.
My reading is that you are proposing change pgrp locks into
proctree_lock. Flag (PGRP_ORPHANED) needs to be consistent with the
group membership, so it cannot be atomic, it would need to be under the
same lock.
If this works, I do not object, but I did not looked carefully. The most
obvious problem could be that it is not easy to take sx proctree_lock in
some places that only need pgrp mutex right now.