[Bug 276587] ccp(4) causes 'sysctl -a' to hang when reading OID 'kern.geom.conftxt'

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 26 Jan 2024 00:39:51 UTC

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276587

--- Comment #3 from Joshua Kinard <freebsd@kumba.dev> ---
(In reply to John Baldwin from comment #2)

So it looks like I was wrong about ccp(4) seemingly working.  It is still
hanging while GEOM initializes GELI on my swap drives, of which there are two
in this system.  Two things fooled me:
1. SSH came up and allowed me to log in, and I never thought to look at the
console
2. I recently changed the console resolution to 1280x1024, which generally
works, but there is a large black rectangle being drawn over the bottom 1/8th
of the monitor, so I *thought* it was sitting at a login prompt, because I
could see the swap/GELI bits printed out just above the rectangle.  If that
rectangle wasn't there, I'd have noticed it wasn't at a login prompt.

So 'sysctl -a' hangs because g_waitfor_event is still waiting for GELI to
finish whatever it's stuck on while trying to init my encrypted swap volumes.

That means Bug #253860 is still unfixed, so it needs to be re-opened.  This one
is probably a duplicate of that bug, since the hang here is a side effect of
GELI being lost somewhere in space.

If it helps, here are the 'ps' entries and 'procstat -kk' outputs for the four
GELI threads on the first encrypted swap volume, where it is stuck:

> root    36041   0.0  0.0      0     16  -  DL   18:50     0:00.00 [g_eli[0] da0p2]
> root    36255   0.0  0.0      0     16  -  DL   18:50     0:00.00 [g_eli[1] da0p2]
> root    36973   0.0  0.0      0     16  -  DL   18:50     0:00.00 [g_eli[2] da0p2]
> root    37530   0.0  0.0      0     16  -  DL   18:50     0:00.00 [g_eli[3] da0p2]

> # procstat -kk 36041
>   PID    TID COMM                TDNAME              KSTACK
> 36041 100531 g_eli[0] da0p2      -                   mi_switch+0xbb _sleep+0x1ed g_eli_worker+0x37e fork_exit+0x7f fork_trampoline+0xe
> 
> # procstat -kk 36255
>   PID    TID COMM                TDNAME              KSTACK
> 36255 100532 g_eli[1] da0p2      -                   mi_switch+0xbb _sleep+0x1ed g_eli_worker+0x37e fork_exit+0x7f fork_trampoline+0xe
> 
> # procstat -kk 36973
>   PID    TID COMM                TDNAME              KSTACK
> 36973 100686 g_eli[2] da0p2      -                   mi_switch+0xbb _sleep+0x1ed g_eli_worker+0x37e fork_exit+0x7f fork_trampoline+0xe
> 
> # procstat -kk 37530
>   PID    TID COMM                TDNAME              KSTACK
> 37530 100687 g_eli[3] da0p2      -                   mi_switch+0xbb _sleep+0x1ed g_eli_worker+0x37e fork_exit+0x7f fork_trampoline+0xe
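
For anyone comparing many such traces, the following is a minimal Python sketch (not part of the original report) that groups threads from pasted 'procstat -kk' output by their kernel stack.  The function name group_by_stack and the SAMPLE string are illustrative assumptions; the sample lines are copied from the quoted output above, and the parser assumes each frame is printed as symbol+0xoffset.

```python
import re
from collections import defaultdict

# Sample copied from the quoted output above, "> " quote prefixes included.
SAMPLE = """\
> # procstat -kk 36041
>   PID    TID COMM                TDNAME              KSTACK
> 36041 100531 g_eli[0] da0p2      -                   mi_switch+0xbb _sleep+0x1ed g_eli_worker+0x37e fork_exit+0x7f fork_trampoline+0xe
> # procstat -kk 36255
>   PID    TID COMM                TDNAME              KSTACK
> 36255 100532 g_eli[1] da0p2      -                   mi_switch+0xbb _sleep+0x1ed g_eli_worker+0x37e fork_exit+0x7f fork_trampoline+0xe
> # procstat -kk 36973
>   PID    TID COMM                TDNAME              KSTACK
> 36973 100686 g_eli[2] da0p2      -                   mi_switch+0xbb _sleep+0x1ed g_eli_worker+0x37e fork_exit+0x7f fork_trampoline+0xe
> # procstat -kk 37530
>   PID    TID COMM                TDNAME              KSTACK
> 37530 100687 g_eli[3] da0p2      -                   mi_switch+0xbb _sleep+0x1ed g_eli_worker+0x37e fork_exit+0x7f fork_trampoline+0xe
"""

# Assumption: every stack frame looks like "symbol+0xoffset".
FRAME = re.compile(r"[A-Za-z_][\w.]*\+0x[0-9a-f]+")


def group_by_stack(text):
    """Map each distinct stack (tuple of symbol names) to its (pid, tid) list."""
    groups = defaultdict(list)
    for raw in text.splitlines():
        # Drop mail-quote prefixes, then split on whitespace.
        fields = raw.lstrip("> ").split()
        # Data rows start with numeric PID and TID; skip headers and prompts.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        # COMM may contain spaces, so collect frames from the end of the row.
        frames = []
        while fields and FRAME.fullmatch(fields[-1]):
            frames.append(fields.pop())
        frames.reverse()  # restore the order procstat printed them in
        if not frames:
            continue
        # Strip the offsets so identical call chains compare equal.
        stack = tuple(frame.rsplit("+", 1)[0] for frame in frames)
        groups[stack].append((int(fields[0]), int(fields[1])))
    return dict(groups)


if __name__ == "__main__":
    for stack, threads in group_by_stack(SAMPLE).items():
        print(len(threads), "thread(s):", " ".join(stack))
```

Run against the traces above, all four g_eli workers fall into a single group, which matches what the raw output shows: every worker is asleep in _sleep called from g_eli_worker.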

This feels like a missed interrupt somewhere, so the thread(s) are left waiting
forever.  I had my fair share of those on Linux when I played around with
driver debugging long ago.
