Re: ZFS - reboot during resilver doesn't work

From: Doron Beit-Halahmi <doc_at_filenotfound.org>
Date: Sun, 07 Sep 2025 23:23:22 UTC
Frank,

You’re not imagining things — I’ve hit this before too. In theory a
resilver is just a background job, and you should be able to reboot
mid-stream. In practice, two things get in the way:

SMR drives really don’t like resilvers. The random write pattern makes
their firmware go off into la-la land doing band rewrites, which is
usually when the disk light gets stuck on and the box won’t shut down
cleanly.

On 14.x the shutdown path isn’t great about killing off ZFS resilver
threads, so it just sits there waiting on I/O that the drive may never
finish.

The “backwards progress” after a hard reset is just ZFS being
conservative: it widens the DTL (dirty time log) and re-walks more of
the tree, because it can’t be sure what really finished before the
power went. Annoying, but expected.
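To picture why progress goes backwards, here is a toy model in Python. It is not OpenZFS code. It assumes, purely for illustration, that the scan position is only committed to disk every so often (the hypothetical CHECKPOINT_EVERY). Anything done after the last commit is lost on a power cut and has to be redone.

```python
"""Toy model of why an interrupted resilver appears to go backwards.

Illustrative assumption, not OpenZFS internals: the scan position is only
persisted every CHECKPOINT_EVERY blocks, so work done after the last
checkpoint is lost on a power cut and must be redone after restart.
"""

CHECKPOINT_EVERY = 1000  # hypothetical checkpoint spacing, in blocks


def run_resilver(total_blocks, start=0, crash_at=None):
    """Walk blocks from `start`.

    Returns (position reached, last persisted position).
    """
    persisted = start
    pos = start
    while pos < total_blocks:
        if crash_at is not None and pos == crash_at:
            # Power lost: in-memory progress vanishes, only `persisted` survives.
            return pos, persisted
        pos += 1  # "repair" one block
        if pos % CHECKPOINT_EVERY == 0:
            persisted = pos  # progress committed to stable storage
    return pos, pos


reached, saved = run_resilver(10_000, crash_at=4_321)
print(f"reached {reached}, resumes from {saved}")  # → reached 4321, resumes from 4000
finished, _ = run_resilver(10_000, start=saved)
print(f"finished at {finished}")  # → finished at 10000
```

The 321 blocks between the last checkpoint and the crash are the “steps backwards”. Checkpointing less often means less bookkeeping I/O while the resilver runs, but more repeated work after an unclean stop.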

Easiest workarounds: don’t mix SMR into pools you care about, or, if you
have to, tune down resilver aggressiveness by lowering
vfs.zfs.resilver_min_time_ms (the minimum time ZFS spends on resilver
I/O per transaction group) to give the disks a fighting chance. With
CMR drives you’ll likely see reboots behave as you remember.
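Here is a minimal sketch of making that tweak persistent on FreeBSD. The value 1000 is only an illustration, not a tested recommendation. As far as I know the OpenZFS default is 3000 ms. You can also try it at runtime first with `sysctl vfs.zfs.resilver_min_time_ms=1000` as root, then watch how the resilver behaves in `zpool status`.

```
# /etc/sysctl.conf
# Lower the minimum time per transaction group spent on resilver I/O,
# so the resilver yields more and the SMR disks get some breathing room.
# 1000 is an illustrative value only; the OpenZFS default is 3000 (ms).
vfs.zfs.resilver_min_time_ms=1000
```

Setting the value lower makes the resilver take longer overall. The trade-off is fewer long stalls while the drive firmware catches up.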

If you can reproduce it, might be worth a PR so the FreeBSD/OpenZFS
folks can look at the shutdown side.

— A fellow who’s also been burned by “cheap” SMRs 🙂

On Mon, 8 Sep 2025 00:15:06 +0100
Frank Leonhardt <freebsd-doc@fjl.co.uk> wrote:

> If you have a ZFS mirror you're supposed to be able to reboot while
> it's resilvering. It's a background job and should just continue
> where it left off, more or less. I'm pretty sure I've done this in
> the past, with the system on the pool being resilvered too.
> 
> But I've noticed it doesn't quite work (14.3-RELEASE) as it won't
> shut down. The disk activity LED is jammed on. It syncs the buffers,
> but carries on accessing the disks with an unresponsive terminal. The
> only unusual thing is that at least one of the drives is SMR (don't
> ask!)
> 
> If you do shut it down hard (kill the power) it restarts the
> resilvering, but several steps further back.
> 
> Has anyone else noticed this?
> 
> Why do I care? I don't like to leave a distant data centre without 
> restarting something a few times to make sure I can reboot safely
> later on.
> 
> Thanks, Frank.