Re: ZFS operations hanging, but no visible errors?

From: Chris Ross <cross+freebsd_at_distal.com>
Date: Fri, 05 Nov 2021 19:05:25 UTC

> On Nov 5, 2021, at 13:12, Chris Ross <cross+freebsd@distal.com> wrote:
> 
> Okay.  Despite everything I had running being stuck I was able to log into the console, and coincidentally or not, things have now recovered.  Well, the old commands/sessions didn’t, but I can log in again.  I can’t get to the tmux session it seems, but.
> 
> I’m able to run that sysctl, which has a lot of data.  The last records, all from about two hours ago, are:
> 
> 1636125429   metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 1, ms_id 854, weight 780000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1033313, loaded 5902891 ms ago, max_size 2147475456
> 1636125429   metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 2, ms_id 88, weight 880000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1020497, loaded 864138 ms ago, max_size 17179869184
> 1636125429   metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 1, ms_id 859, weight 780000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1033029, loaded 2201252 ms ago, max_size 2147475456
> 1636125429   metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 1, ms_id 860, weight 780000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1033229, loaded 3395548 ms ago, max_size 2147303424
> 1636125429   metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 1, ms_id 863, weight 7c0000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1033448, loaded 4046753 ms ago, max_size 4294926336
> 
> Not sure if that helps….

Okay.  Following up just to close out the “active” state of the issue.  The
system became unresponsive again moments after the above.  The kernel
was still functional, as I was able to switch between virtual consoles,
but logging in yielded only a “Last login” line, then nothing else.
Carriage returns were echoed on the consoles, but nothing else happened.

I issued a Ctrl-Alt-Delete, and it began stopping things, but the shutdown
hit the 90-second watchdog timeout and was reported as terminating abnormally.
The kernel did eventually report “All buffers synced.” and then nothing else.

After about 10 minutes, I tried Ctrl-Alt-Delete again, and then power-cycled
the box.

I’d still be interested in hearing any theories about what happened, but
I no longer have the device in this state to test.
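
In case it helps anyone compare against their own output, here is a rough
sketch of how the metaslab_unload records quoted above could be summarized.
The regular expression is modeled only on those five lines, so the field
layout is an assumption and may not match other OpenZFS versions; the names
parse_record and longest_loaded_per_vdev are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern modeled on the quoted records only (an assumption, not a spec).
RECORD = re.compile(
    r"(?P<ts>\d+)\s+\S+:metaslab_unload\(\): metaslab_unload: "
    r"txg (?P<txg>\d+), spa (?P<spa>\S+), vdev_id (?P<vdev_id>\d+), "
    r"ms_id (?P<ms_id>\d+), weight (?P<weight>[0-9a-f]+), "
    r"selected txg (?P<selected_txg>\d+) \((?P<selected_ago_ms>\d+) ms ago\), "
    r"alloc_txg (?P<alloc_txg>\d+), loaded (?P<loaded_ago_ms>\d+) ms ago, "
    r"max_size (?P<max_size>\d+)"
)

# First record from the quoted output, used as a sample input.
SAMPLE = (
    "1636125429   metaslab.c:2538:metaslab_unload(): metaslab_unload: "
    "txg 1033689, spa tank, vdev_id 1, ms_id 854, weight 780000000000001, "
    "selected txg 1033574 (601067 ms ago), alloc_txg 1033313, "
    "loaded 5902891 ms ago, max_size 2147475456"
)


def parse_record(line):
    """Return the fields of one metaslab_unload record as a dict, or None."""
    m = RECORD.search(line)  # search() tolerates a leading "> " quote prefix
    if m is None:
        return None
    rec = {k: int(v) for k, v in m.groupdict().items()
           if k not in ("spa", "weight")}
    rec["spa"] = m.group("spa")
    rec["weight"] = int(m.group("weight"), 16)  # weight is printed in hex
    return rec


def longest_loaded_per_vdev(lines):
    """Map (spa, vdev_id) to the largest 'loaded ... ms ago' value seen."""
    longest = defaultdict(int)
    for line in lines:
        rec = parse_record(line)
        if rec is not None:
            key = (rec["spa"], rec["vdev_id"])
            longest[key] = max(longest[key], rec["loaded_ago_ms"])
    return dict(longest)
```

Feeding the saved sysctl output to longest_loaded_per_vdev() line by line
would show, per vdev, how long the longest-lived metaslab had been loaded
before it was unloaded.  Lines that do not match the pattern are skipped.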

                - Chris