ZFS Commands In "D" State

Xin LI delphij at gmail.com
Thu Jun 8 22:57:45 UTC 2017


procstat -kk 1425? (or whatever PID is stuck in the "D" state)
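
For example, assuming the stuck PID is still 1425 (adjust as needed),
something along these lines should dump the kernel stack of every wedged
zfs/zpool process:

  procstat -kk 1425
  # or, to catch all zfs/zpool processes at once (pgrep takes a regex):
  procstat -kk $(pgrep -d ' ' 'zfs|zpool')

The interesting part of the output is the kernel call stack, which shows
where inside ZFS the thread is sleeping.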

On Thu, Jun 8, 2017 at 2:13 PM, Tim Gustafson <tjg at ucsc.edu> wrote:
> We have a ZFS server that we've been running for a few months now.
> The server is a backup server that receives ZFS sends from its primary
> daily.  This mechanism has worked for us on several pairs of servers
> for years, and for several months with this particular piece of
> hardware.
>
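The nightly replication presumably looks something like the pipeline
below (the snapshot names and sending dataset are made up; only the
receive side matches the command quoted further down):

  # on the primary: incremental send of last night's snapshot
  zfs send -v -i tank/export@2017-06-07 tank/export@2017-06-08 | \
      ssh backup zfs receive -v -F backup/export
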
> A few days ago, our nightly ZFS send failed.  When I looked at the
> server, I saw that the "zfs receive" command was in a "D" wait state:
>
> 1425  -  D       0:02.75 /sbin/zfs receive -v -F backup/export
>
> I rebooted the system, checked that "zpool status" and "zfs list" both
> came back correctly (which they did), and then restarted the "zfs
> send" on the master server.  At first, the "zfs receive" command did
> not enter the "D" state, but once the master server started sending
> actual data (which I was able to ascertain because I was doing "zfs
> send" with the -v option), the receiving process entered the "D" state
> again, and another reboot was required.  Only about 2MB of data got
> sent before this happened.
>
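Short of a full procstat, the wait channel alone can hint at what the
process is blocked on (state and mwchan are standard FreeBSD ps keywords):

  # show the state and wait channel of the stuck receive
  ps -o pid,state,mwchan,command -p 1425
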
> I've rebooted several times, always with the same result.  I did a
> "zpool scrub os" (there's a separate zpool for the OS to live on) and
> that completed in a few minutes, but when I did a "zpool scrub
> backup", that process immediately went into the "D+" state:
>
> 895  0  D+     0:00.04 zpool scrub backup
>
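One way to tell whether the pool is wedged outright or merely grinding is
to watch the disks underneath it from another terminal while the scrub
(or receive) is stuck:

  # any I/O at all on the underlying providers?
  gstat
  # does status report a running scrub, or does it hang as well?
  zpool status -v backup

If zpool status itself also ends up in "D", that narrows things down
further.
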
> We run smartd on this device, and that is showing no disk errors.  The
> devd process is logging some stuff, but it doesn't appear to be very
> helpful:
>
> Jun  8 13:52:49 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=11754027336427262018
> Jun  8 13:52:49 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=11367786800631979308
> Jun  8 13:52:49 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=18407069648425063426
> Jun  8 13:52:49 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=9496839124651172990
> Jun  8 13:52:49 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=332784898986906736
> Jun  8 13:52:50 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=16384086680948393578
> Jun  8 13:52:50 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=10762348983543761591
> Jun  8 13:52:50 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=8585274278710252761
> Jun  8 13:52:50 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=17456777842286400332
> Jun  8 13:52:50 backup ZFS: vdev state changed,
> pool_guid=2176924632732322522 vdev_guid=10533897485373019500
>
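Since smartd is already running, smartmontools is installed, so a quick
manual pass over every disk to double-check what smartd reports is cheap
(some controllers need an extra -d option for smartctl):

  # overall SMART health verdict for each disk the kernel knows about
  for d in $(sysctl -n kern.disks); do
      echo "== $d =="
      smartctl -H /dev/$d
  done
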
> No word on which state it changed "from" or "to".  Also, the system
> only has three vdevs (the OS one, and then the two raidz2 vdevs that
> make up the "backup" pool), so I'm not sure how it's coming up with
> more than three vdev GUIDs.
>
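Every leaf device (each disk) in a pool carries its own vdev GUID, not
just the top-level raidz2 vdevs, so seeing ten distinct GUIDs in the devd
log is expected. To map them back to devices, something like the
following should work (zdb has shipped with ZFS for a long time; the -g
flag to zpool status is newer and may not be present on every release;
the device path is only an example):

  # dump the cached pool configuration, including each child vdev's guid
  zdb -C backup
  # newer ZFS: show vdev GUIDs in place of device names
  zpool status -g backup
  # or read a specific disk's label and look for its guid
  zdb -l /dev/da0p1 | grep guid
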
> What's my next step in diagnosing this?
>
> --
>
> Tim Gustafson
> BSOE Computing Director
> tjg at ucsc.edu
> 831-459-5354
> Baskin Engineering, Room 313A
>
> To request BSOE IT support, please visit https://support.soe.ucsc.edu/
> or send e-mail to help at soe.ucsc.edu.

