Re: ZFS - reboot during resilver doesn't work
- Reply: Frank Leonhardt : "Re: [External] Re: ZFS - reboot during resilver doesn't work"
- In reply to: Frank Leonhardt : "ZFS - reboot during resilver doesn't work"
Date: Sun, 07 Sep 2025 23:23:22 UTC
Frank,

You’re not imagining things; I’ve hit this before too. In theory a resilver is just a background job, and you should be able to reboot mid-stream. In practice, two things get in the way:

1. SMR drives really don’t like resilvers. The random write pattern sends their firmware off into la-la land doing band rewrites, which is usually when the disk light gets stuck on and the box won’t shut down cleanly.

2. On 14.x the shutdown path isn’t great about killing off ZFS resilver threads, so it just sits there waiting on I/O that the drive may never finish.

The “backwards progress” after a hard reset is just ZFS being conservative: it widens the DTL (dirty time log) and re-walks more of the tree because it can’t be sure what really finished. Annoying, but expected.

Easiest workarounds: don’t mix SMR drives into pools you care about, or, if you have to, tune down the resilver speed by lowering vfs.zfs.resilver_min_time_ms (the minimum time the sync thread spends on resilver work per transaction group) to give the disks a fighting chance. With CMR drives you’ll likely see reboots behave as you remember.

If you can reproduce it, it might be worth filing a PR (problem report) so the FreeBSD/OpenZFS folks can look at the shutdown side.

-- 
A fellow who’s also been burned by “cheap” SMRs 🙂

On Mon, 8 Sep 2025 00:15:06 +0100 Frank Leonhardt <freebsd-doc@fjl.co.uk> wrote:

> If you have a ZFS mirror you're supposed to be able to reboot while
> it's resilvering. It's a background job and should just continue
> where it left off, more or less. I'm pretty sure I've done this in
> the past, with the system on the pool being resilvered too.
>
> But I've noticed it doesn't quite work (14.3-RELEASE) as it won't
> shut down. The disk activity LED is jammed on. It syncs the buffers,
> but carries on accessing the disks with an unresponsive terminal. The
> only unusual thing is that at least one of the drives is SMR (don't
> ask!)
>
> If you do shut it down hard (kill the power) it restarts the
> resilvering, but having taken several steps backwards.
>
> Has anyone else noticed this?
>
> Why do I care? I don't like to leave a distant data centre without
> restarting something a few times to make sure I can reboot safely
> later on.
>
> Thanks, Frank.
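For anyone wanting to try the “tune down the resilver speed” suggestion, here is a minimal sketch of that step. It assumes a FreeBSD host with ZFS loaded, where the tunable is exposed through sysctl(8) under the name given in the reply, and root privileges when setting a value. The helper names (read_cmd, set_cmd, run, main) and the example value of 1000 ms are hypothetical illustrations, not part of any FreeBSD tool, and no particular value is being recommended.

```python
"""Sketch: inspect or lower vfs.zfs.resilver_min_time_ms on FreeBSD via sysctl(8).

Assumptions (hypothetical, not from the original post): a FreeBSD host with ZFS
loaded, root privileges when setting a value, and illustrative helper names.
"""
import shutil
import subprocess
import sys

# Tunable named in the reply; lowering it is the "tune down resilver speed" knob.
TUNABLE = "vfs.zfs.resilver_min_time_ms"


def read_cmd(name: str = TUNABLE) -> list[str]:
    """Build the argv that prints the tunable's current value: sysctl -n NAME."""
    return ["sysctl", "-n", name]


def set_cmd(value_ms: int, name: str = TUNABLE) -> list[str]:
    """Build the argv that sets the tunable at runtime: sysctl NAME=VALUE (needs root)."""
    if value_ms <= 0:
        raise ValueError("value_ms must be a positive number of milliseconds")
    return ["sysctl", f"{name}={value_ms}"]


def run(argv: list[str]) -> str:
    """Run argv and return its stripped stdout.

    Raises FileNotFoundError if the program is not on PATH and
    subprocess.CalledProcessError if it exits non-zero (e.g. unknown name).
    """
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def main(args: list[str]) -> int:
    """No argument: show the current value. One argument: set it (in ms)."""
    cmd = set_cmd(int(args[1])) if len(args) > 1 else read_cmd()
    try:
        print(run(cmd))
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        # Expected off FreeBSD, or when the ZFS module is not loaded.
        print(f"could not run {' '.join(cmd)}: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    # Status is returned but not passed to sys.exit(); wrap it for scripting.
    main(sys.argv)
```

Run with no argument to print the current value, or with a number of milliseconds (e.g. 1000, an arbitrary illustration) to set it. A value set with sysctl(8) at runtime lasts until the next reboot; to keep it, the same vfs.zfs.resilver_min_time_ms=1000 line can go in /etc/sysctl.conf.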