ZFS high write IO in single user mode
stb at lassitu.de
Wed Oct 31 13:43:03 UTC 2018
I have two hosts that are configured identically (in a kind of manual hot-standby configuration), running a set of jails each. ZFS datasets for the jails and bhyve VMs are synced across regularly. When one of the machines exhibits a problem, I can shut down the problematic jails or the whole machine, and start the jails/VMs on the other host. This has been working really well for the past ~10 years.
A couple of years ago, one of the ZFS pools on one of the machines developed some logical inconsistencies that were not detected by zpool scrub. The only indication that something was amiss was high disk IO, in particular, writes, even when no processes were running. I eventually resolved that situation by recreating the zpool and restoring the datasets from the working machine.
About a year ago, I upgraded the hardware and created fresh pools in the process. This has been running well. About two days ago the problem returned: there is a steady write rate even in single user mode, with the root dataset mounted read-only and the second pool, which contains the jail datasets, not mounted at all.
I only have a video console (via IPMI KVM) so I won’t transcribe the complete output, but here’s what I think are significant observations:
gstat reports ~30 writes/sec on each of the two disks that make up the mirrored pool.
mount shows the root dataset to be mounted read-only.
zpool status takes a really long time, and then reports that everything is fine for both pools (boot/os and jails).
smartctl doesn’t show any problems for either of the disks.
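The observations above could be supplemented with a few read-only checks, sketched below. This is only a sketch under assumptions: the pool name "zroot" is a placeholder (set POOL to the real name), and the function name is made up for illustration. The "freeing" property reports space still being reclaimed by an asynchronous destroy, which is one possible source of background writes, not a confirmed cause here.

```shell
#!/bin/sh
# Sketch: collect read-only evidence about background writes on a ZFS pool.
# POOL is a placeholder; "zroot" is an assumed name, not taken from this post.
POOL="${POOL:-zroot}"

# Hypothetical helper name, introduced only for this example.
collect_zfs_write_evidence() {
    pool="$1"
    # Per-vdev read/write operations and bandwidth: three 5-second samples.
    zpool iostat -v "$pool" 5 3
    # Space still waiting to be reclaimed by an asynchronous destroy, if any.
    zpool get freeing "$pool"
    # Pool health, plus any scrub or resilver currently in progress.
    zpool status -v "$pool"
}

if command -v zpool >/dev/null 2>&1; then
    collect_zfs_write_evidence "$POOL"
else
    echo "zpool not found; run this on the affected host" >&2
fi
```

None of these commands modify the pool, so they should be safe to run with the root dataset mounted read-only. Run as, for example, POOL=jails sh script.sh, substituting the actual pool and script names.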
I’m happy to just wipe the pools and start fresh, but I’d like to use this opportunity to figure out why ZFS is behaving this way and, ideally, find a permanent fix. This is 11-stable from September 13th.
Stefan Bethke <stb at lassitu.de> Fon +49 151 14070811