[zfs] filesystem reads hanging

Reshad Patuck reshadpatuck1 at gmail.com
Tue Oct 1 04:56:45 UTC 2019


Hi,

I have a FreeBSD 12.0-RELEASE-p9 system running ZFS.
The system runs an application that uses postgres and python (among other
services).

I have noticed that python is suddenly unable to connect to postgres.
When I try to investigate further, certain files on disk cannot be read.
The commands `cat` and `ls -l` hang (no output, and I cannot ctrl-c or
`kill -9` them); `ps -aux` shows them in a D+ state.
After killing the SSH session, these processes continue running as orphans,
and I am still not able to kill them.
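
For anyone trying to capture the same symptom, here is a minimal sketch of how
the stuck processes could be listed together with their wait channels. The
helper name `filter_dstate` is hypothetical, and the column layout (PID, state,
wait channel, command) is an assumption that matches the `ps` invocation shown
in the comment.

```shell
# Hypothetical helper: keep the header line plus every process whose state
# column (second field) starts with "D", i.e. uninterruptible disk wait.
# Expects "ps"-style output on stdin with columns: PID STATE WCHAN COMMAND...
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Possible usage on a live system (not executed here):
#   ps -axo pid,state,wchan,command | filter_dstate
```

The wait channel column gives a rough hint of what each process is blocked on.
On FreeBSD, `procstat -kk <pid>` for one of the listed PIDs should additionally
print that process's kernel stack, which is probably more useful in a report.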

Someone on IRC suggested running a zfs scrub to check for data corruption,
but running `zpool scrub zroot` has the same effect.
The command does not return, ctrl-c does not kill it, and
`zpool scrub -s zroot` says "cannot cancel scrubbing zroot: there is no
active scrub".

This has happened to two of my production servers in the past month.
Since the application was critical, they were rebooted, and the boxes
functioned normally after the reboot.
After the reboot, the files that had not been cat-able on the production
servers were readable again, and a zfs scrub completed fine, showing 0 errors
and 0 fixes.
One of these boxes needed a hard reboot because it got stuck in the shutting
down stage of a soft reboot.

I am not sure where to start debugging this or if there are any ways to get
metrics on a box stuck in this state.
Please let me know if you would like me to fetch any metrics or run any
commands for you.
Any help would be much appreciated.

Best regards,

Reshad


More information about the freebsd-fs mailing list