Re: [List] Re: ZFS - reboot during resilver doesn't work

From: Frank Leonhardt <freebsd-doc_at_fjl.co.uk>
Date: Tue, 09 Sep 2025 10:39:43 UTC
On 09/09/2025 01:29, David Christensen wrote:
> On 9/8/25 16:33, David Christensen wrote:
>> On 9/8/25 11:26, Frank Leonhardt wrote:
>>> For everyone's amusement, it re-silvered 900GB of data in about nine 
>>> hours, which could be worse. I'm convinced, as Doron concurred, it 
>>> was down to the replacement being SMR.
>
>
> STFW I found a relevant article:
>
> https://www.servethehome.com/wd-red-smr-vs-cmr-tested-avoid-red-smr/
>
>
> See the figure "FreeNAS 11.3-U2 RAIDZ Resilver time" on page 2.
>
Fortunately this wasn't a disaster recovery - just swapping out an iffy 
drive on a live mirrored system! Manufacturers say they've improved 
on-drive SMR a great deal in the five years since that was written. The 
sequential write/read tests of the whole drive I did before installing 
were impressive for SMR - in fact it beat the existing older CMR drive 
by a good margin. A ZFS re-silver, on the other hand, still runs badly - 
although 100GB an hour is acceptable as a one-off hit. On the positive 
side, if it can't take writes any faster than that it's going to reduce 
the stress on the good drives during the rebuild.
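
As a rough sanity check on those figures, here is a minimal sketch of the
arithmetic. It assumes decimal units (1GB = 1000MB) and takes the 250MB/s
sustained sequential rate from the earlier tests as the best case; the
function names are purely illustrative.

```python
def throughput_mb_per_s(total_gb: float, hours: float) -> float:
    """Average throughput in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return total_gb * 1000 / (hours * 3600)


def hours_to_copy(total_gb: float, mb_per_s: float) -> float:
    """Hours needed to move total_gb at a steady mb_per_s."""
    return total_gb * 1000 / mb_per_s / 3600


# Observed: roughly 900GB re-silvered in roughly nine hours.
resilver_rate = throughput_mb_per_s(900, 9)

# Best case: the same 900GB at the 250MB/s sequential rate.
sequential_hours = hours_to_copy(900, 250)

# At the 20MB/s that iostat was reporting during the re-silver.
iostat_hours = hours_to_copy(900, 20)

print(f"average re-silver rate: {resilver_rate:.1f} MB/s")
print(f"time at 250MB/s:        {sequential_hours:.1f} h")
print(f"time at 20MB/s:         {iostat_hours:.1f} h")
```

So the re-silver averaged somewhere under 30MB/s, against about an hour
for a purely sequential copy of the same data. Both the 900GB and the
nine hours are approximate, so the gap between that average and the
20MB/s from iostat shouldn't be read too closely.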

The biggest problem in retrospect is that it appears to have crashed, 
and can't be rebooted during the rebuild (okay, I gave it 20 minutes 
after shutdown before pulling the plug, so "can't" is a guess).

>
>> For everyone's amusement, it re-silvered 900GB of data in about nine 
>> hours, which could be worse. I'm convinced, as Doron concurred, it 
>> was down to the replacement being SMR. These days I test new drives 
>> thoroughly before heading to a data centre with one, and 
>> interestingly it was pretty good on sequential read/write - 250MB/s 
>> sustained. No flaws. Re-silvering it managed about 20MB/s according 
>> to iostat. According to zpool status it was doing about 150B/s (that's 
>> bytes).
>
>
> I agree that the `zpool status` statistics can be confusing. AIUI some 
> of the statistics are reported from the start of the replace/ scrub/ 
> etc. operation while others are soft real-time (e.g. the past few 
> seconds).
I'm not sure that's the case. The stats were clearly bonkers for the 
issuing figure, while real iostat (not zpool iostat) was reporting 
20MB/s throughout. If they were early stats, not updated, it doesn't 
explain why they leapt to the correct figure of 20MB/s after a few hours.

Incidentally, a geom mirror re-silvers perfectly well, because that's a 
sequential copy down the disk.

For the curious, these are 2TB Seagate Barracudas (which used to be good 
drives), one first generation and one latest generation. They were only 
900GB full, although I don't think that would affect the re-silver speed 
(only the time). They're used for backing up datasets, not intended for 
performance. I certainly wouldn't want SMR on a database (or as backing 
store for a Windoze VM).

Regards, Frank.