ZFS 'read-only' device / pool scan / import?

Jeremy Chadwick freebsd at jdc.parodius.com
Tue Oct 19 15:16:04 UTC 2010


On Tue, Oct 19, 2010 at 03:52:52PM +0100, Karl Pielorz wrote:
> On FreeBSD if I bring the system up 'single user' - the first time I
> do (for example):
> 
> "
> zpool status
> "
> 
> There's a pause, a flurry of disk activity - and the system appears
> to import any pools it finds, and the status appears.
> 
> I'd guess at this point - some data is written to the disks? - Is
> there any way of avoiding that, i.e. a kind of "If you were to
> import the pools/display a status, what would you show, without
> actually writing any data?" (or importing it) - i.e. to ensure the
> devices are opened read only?

The activity you see is almost certainly the result of kernel modules
being loaded dynamically and so on, plus disk tasting and metadata
analysis.  This might be different if you had opensolaris_load="yes"
and zfs_load="yes" in /boot/loader.conf, but I'm guessing that isn't the
case.
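For reference, those entries would look something like this in
/boot/loader.conf (just a sketch -- only the two *_load lines matter):

# /boot/loader.conf -- load ZFS support at boot rather than on first use
opensolaris_load="yes"
zfs_load="yes"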

BTW, in single-user, I always do the following before doing *any*
ZFS-related work.  This is assuming the system uses UFS for /, /var,
/tmp, and /usr.

mount -t ufs -a
/etc/rc.d/hostid start
/etc/rc.d/zfs start
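
If all you want is to see which pools the disks claim to belong to,
without importing anything, then as far as I know "zpool import" with
no arguments only scans the device labels and prints what it finds:

zpool import    # lists pools available for import; does not import them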

> The reason I ask is we recently had a pool that had a mishap. The
> backing RAID controller dropped a drive (we were using them in
> JBOD). This happened cleanly. The system got shut down, and I think
> the wrong drive was replaced.
> 
> When it came up the RAID controller 'collapsed' the device list (so
> there was no gap where the old drive was) - that, plus the wrong
> drive being replaced meant when we looked at the system we had:
> 
> "
>  pool: vol
> state: UNAVAIL
> status: One or more devices could not be used because the label is missing
>        or invalid.  There are insufficient replicas for the pool to
>        continue functioning.
> "
> 
> A number of devices were listed as 'corrupted data' - some devices
> were listed twice as members of the pool - i.e. pretty screwed up.
>
> 'undoing' the damage and restarting the server - just threw up the
> same status.
>
> I'm wondering if, through the action of having the pool
> imported/mounted etc., ZFS has actually *written* to the drives that
> were available that the other drives are unavailable/corrupt - and
> because that info was written and check-summed correctly, it now
> takes that as gospel rather than actually re-checking the drives (or
> is simply unable to re-check them, because the metadata has changed
> since the previous boot).
> 
> If you see what I mean :)
>
> In the end, we're fortunate - we have backups (and they're currently
> restoring now) - but I was just interested in whether 'attempting'
> to mount/import a messed-up pool could potentially screw up any
> chance of mounting that pool cleanly again, even if you were to
> 'undo' the hardware changes.
>
> I have a feeling that a zpool import or 'initial' zpool status has
> to be a read/write operation (i.e. would fail anyway if you could
> magically make the underlying devices read-only?)

Experts here might be able to help, but you're really going to need to
provide every little detail in chronological order: what commands were
run, what output was seen, what physical actions took place, etc.

1) Restoring from backups is probably your best bet (IMHO; this is what I
would do as well).

2) Take inventory of your hardware; get disk serial numbers (smartctl
can show you this; see the example after this list) and correlate them
with specific device IDs (adaX, daX, etc.).  I also label my drive bays
with the device IDs they map to.

3) You didn't disclose what kind of ZFS pool setup you have.  I'm
guessing raidz1.  Or is it a pool of mirrors?  Or raidz2?  Or...?  It
obviously matters.

4) If it's raidz1, I'm guessing you're hurt, and here's why: the disk
falling off the bus during shut-down (or whatever happened -- meaning
the incident that occurred *prior* to the wrong disk being replaced)
would almost certainly have resulted in ZFS saying "hey! the array is
degraded! Replace the disk! If anything else happens until that disk
is replaced, you'll experience data loss!"

Then somehow the wrong disk was replaced.  At this point you have one
disk which may be broken/wrong/whatever (the one which disappeared
from the bus during the system shut-down), and one disk which is
brand new and needs to be resilvered -- so you're effectively down
two disks.  raidz1 isn't going to help you in this case: it can only
tolerate the loss of 1 disk, regardless of situation, period.
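
As an example of the inventory step in 2) -- the device name below is
only illustrative, and depending on the RAID controller you may also
need smartctl's -d option:

smartctl -i /dev/da0    # prints the model, serial number, firmware, etc.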

There are a lot of other things I could add to the list here (it would
probably reach 9 or 10 items if I tried), but in general the above
sounds like it's what happened.  raidz2 would have been able to save
you in this situation, but it requires at least 4 disks.
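
For comparison -- pool and device names here are only examples --
creating a raidz2 pool looks like this, and the result survives the
loss of any two member disks:

zpool create vol raidz2 da0 da1 da2 da3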

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


