Ominous smartd messages ....

William A. Mahaffey III wam at hiwaay.net
Thu Aug 4 02:59:11 UTC 2016


On 08/03/16 19:56, Brandon J. Wandersee wrote:
> William A. Mahaffey III writes:
>
>> On 08/03/16 15:19, Matthew Seaman wrote:
>>> On 03/08/2016 20:13, William A. Mahaffey III wrote:
>>>> What does this mean ?
>>> That there's a bad spot on the disk, which may also mean that you've
>>> got a corrupted filesystem -- depends on whether the bad spot was in
>>> use by zfs or not.  'zpool scrub' should tell you if the filesystem
>>> is corrupted.
>> Can I do that 'zpool scrub' live ?
> Ordinarily, yes. A scrub might lower performance a little while it's
> underway, but it's safe to use the system while you do it. However,
> depending on how much data you have on that pool, a scrub can take a
> long time to finish. A scrub of the measly ~1.8 TB on my pool takes the
> better part of five hours to complete. The risk I would worry about in this
> particular situation is whether leaving the system running long enough
> for a scrub to complete would result in more sectors on the disk failing,
> in an area already passed over by the scrub. If that happened, you'd
> wind up with more corrupted files (assuming there already are corrupted
> files in the first place due to a filesystem problem). Finding and
> fixing those would mean running another scrub, taking up twice the time.
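>
> For what it's worth, kicking off a scrub and checking on it is just
> this (a sketch, assuming a hypothetical pool name of "tank" --
> substitute your own):
>
>     # zpool scrub tank
>     # zpool status -v tank
>
> The -v makes 'zpool status' list any files the scrub has found to be
> corrupt.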
>
> Ordinarily, then, I'd recommend running the scrub after replacing the
> disk. In this particular situation, if you want to try to get out of this
> with absolutely no corrupted files, then if at all possible use `zfs
> send | zfs receive` to clone the existing pool to a new pool on another
> machine, and run the scrub there. The problem is that if you intend to
> recreate your current pool in a RAIDZ layout you'll need to back up your
> data, and if you back up your data using rsync (as you have been) and
> then restore it to the new pool using rsync, the checksums for the
> previously good files will be lost and the corrupted files will be given
> new checksums. ZFS won't realize they're corrupted. Bear in mind,
> though, that none of this is to say that any of your files currently are
> corrupted or will be corrupted. This is just a "best approach to
> worst case" as I see it.
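>
> Roughly, the clone would look like this -- a sketch only, where the
> pool and host names ("tank", "backup", "otherbox") are placeholders
> for your actual setup:
>
>     # zfs snapshot -r tank@migrate
>     # zfs send -R tank@migrate | ssh otherbox zfs receive -F backup
>
> The -R sends the whole dataset hierarchy along with its snapshots,
> and because send reads and verifies every block's checksum as it
> goes, corruption gets detected in transit instead of being written
> out fresh under clean new checksums the way an rsync copy would be.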
>
>> I was/am already thinking along those lines, w/ 1 complication. I have
>> another box (NetBSD 6.1.5) w/ a RAID5 that I wound up building w/
>> mis-aligned disk/RAID blocks in spite of a fair amount of effort to
>> avoid that. I/O writes are horrible, 15-20 MB/s. My understanding is
>> that RAIDZn is like RAID5 in many ways & that you always want 2^n+1
>> (3, 5, 9, ...) drives in a RAID5 to mitigate those misalignments,
>> presumably in a RAIDZ also. Is that so w/ RAIDZ as well ? If so, I lose
>> more than a small amount of total storage, which is why I went as I did
>> when I built the box whenever that was.
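>> (The arithmetic behind that rule of thumb: a 128 KiB ZFS record
>> striped over 4 data disks is 32 KiB per disk, an even multiple of a
>> 4 KiB sector, whereas over 3 data disks it is ~42.7 KiB per disk,
>> forcing padding on every stripe. The trick I've seen suggested for
>> forcing 4 KiB alignment on FreeBSD of this vintage -- a sketch only,
>> w/ hypothetical device names -- is to create the pool atop a gnop
>> device:
>>
>>     # gnop create -S 4096 /dev/da0
>>     # zpool create tank raidz da0.nop da1 da2 da3 da4
>>     # zpool export tank
>>     # gnop destroy da0.nop
>>     # zpool import tank
>>
>> which makes ZFS pick ashift=12 for the vdev.)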
> I don't have enough knowledge/experience with RAIDZ to answer your
> specific questions, but if nothing else you could still combine the disks
> into mirrored vdevs, which are more flexible than RAIDZ, but slightly
> less robust. You'd have a maximum of half the storage space and more
> redundancy than you do now (though significantly less redundancy than
> with a RAIDZ setup).
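>
> A minimal sketch of that layout, with made-up device names (da0
> through da3) and a made-up pool name:
>
>     # zpool create newpool mirror da0 da1 mirror da2 da3
>
> Each "mirror" pair is its own vdev, and ZFS stripes writes across
> the two vdevs.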
>
>

When you say mirrored vdevs, are you alluding to a RAID10-ish setup ? My
zpool man page says that's a no-go for me (FreeBSD 9.3R), maybe for
newer .... My various boxen do a fair amount of stuff overnight,
automated. I will try the 'zpool scrub' tomorrow during the day with the
machine deliberately lightly loaded.
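
If it collides w/ the overnight jobs, my understanding is that a
running scrub can be stopped & a later one started over from scratch
(hypothetical pool name again):

    # zpool scrub -s tank

& 'zpool status' shows a running scrub's progress on its "scan:" line.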

-- 

	William A. Mahaffey III

  ----------------------------------------------------------------------

	"The M1 Garand is without doubt the finest implement of war
	 ever devised by man."
                            -- Gen. George S. Patton Jr.


