ZFS ignores some labels, now pool is corrupted.
Paul Mather
paul at gromit.dlib.vt.edu
Fri Jul 31 14:17:12 UTC 2009
I recently repurposed a motley assortment of hardware that used to be
an ad hoc JBOD backup mirror to run FreeBSD 7-STABLE and ZFS. When I
say motley I mean motley: it has four internal 1 TB SATA drives and
three external 1 TB Maxtor OneTouch USB drives. I aggregated all of
these drives into a single raidz1 vdev using ZFS.
Following a recent suggestion here, before creating the raidz1 vdev
I labelled each drive as "driveN" using glabel, e.g., "glabel label
drive1 /dev/ad4". (I figured this would be important especially for
the external USB drives, which might get plugged into different USB
ports and thus be probed in a different order than when the pool was
created, shuffling the device names.) When creating the pool, I used
"zpool create backups raidz label/drive1 label/drive2 ...".
That all worked for a week or so, until today when I rebooted. One of
the USB drives was not probed during boot and so was flagged as
"REMOVED" in "zpool status":
  pool: backups
 state: DEGRADED
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        backups           DEGRADED     0     0     0
          raidz1          DEGRADED     0     0     0
            label/drive1  ONLINE       0     0     0
            label/drive2  ONLINE       0     0     0
            label/drive3  ONLINE       0     0     0
            label/drive4  ONLINE       0     0     0
            label/drive5  REMOVED      0     0     0
            label/drive6  ONLINE       0     0     0
            label/drive7  ONLINE       0     0     0

errors: No known data errors
I unplugged and re-plugged the REMOVED drive's USB cable to get it to
probe. Eventually, the system appeared to recognise the drive and
resilver:
  pool: backups
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jul 31 07:54:22 2009
config:

        NAME              STATE     READ WRITE CKSUM
        backups           ONLINE       0     0     0
          raidz1          ONLINE       0     0     0
            label/drive1  ONLINE       0     0     0  11.5K resilvered
            label/drive2  ONLINE       0     0     0  11K resilvered
            label/drive3  ONLINE       0     0     0  12K resilvered
            label/drive4  ONLINE       0     0     0  11.5K resilvered
            label/drive5  ONLINE       0     0     0  17.5K resilvered
            label/drive6  ONLINE       0     0     0  13K resilvered
            label/drive7  ONLINE       0     0     0  11.5K resilvered

errors: No known data errors
I rebooted again, but, once more, the drive did not probe during boot,
so I had to force it to probe by unplugging and plugging in its USB
cable. This time, however, the drive was mis-identified in the pool
as "da2" instead of "label/drive5" and, in fact, /dev/label/drive5 was
missing:
  pool: backups
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jul 31 07:59:43 2009
config:

        NAME              STATE     READ WRITE CKSUM
        backups           ONLINE       0     0     0
          raidz1          ONLINE       0     0     0
            label/drive1  ONLINE       0     0     0  8.50K resilvered
            label/drive2  ONLINE       0     0     0  10K resilvered
            label/drive3  ONLINE       0     0     0  9K resilvered
            label/drive4  ONLINE       0     0     0  10K resilvered
            da2           ONLINE       0     0     0  11.5K resilvered
            label/drive6  ONLINE       0     0     0  7.50K resilvered
            label/drive7  ONLINE       0     0     0  8.50K resilvered

errors: No known data errors
$ ls /dev/label
drive1 drive2 drive3 drive4 drive6 drive7
For some reason, the label was not being detected properly. When I
rebooted, things went from bad to worse. Two "da2" devices now show
up in my raidz vdev, and this time my label/drive7 has disappeared.
This seems to have thrown ZFS for a loop, and my vdev is corrupted:
  pool: backups
 state: DEGRADED
status: One or more devices could not be used because the label is
        missing or invalid. Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        backups           DEGRADED     0     0     0
          raidz1          DEGRADED     0     0     0
            label/drive1  ONLINE       0     0     0
            label/drive2  ONLINE       0     0     0
            label/drive3  ONLINE       0     0     0
            label/drive4  ONLINE       0     0     0
            da2           FAULTED      0     0     0  corrupted data
            label/drive6  ONLINE       0     0     0
            da2           ONLINE       0     0     0

errors: No known data errors
$ ls /dev/label
drive1 drive2 drive3 drive4 drive5 drive6
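I wonder whether dumping the ZFS on-disk labels would at least show
which physical drive each "da2" entry corresponds to (the vdev GUIDs
in the labels should differ), e.g. something like:

  zdb -l /dev/da2
  zdb -l /dev/label/drive5

(I haven't tried this yet, so I may be off base.)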
When I boot in single-user mode, all of my original "driveN" labels
(1-7) show up. However, right now, with ZFS active, label/drive7
refuses to appear. Is there a problem with ZFS and labels?
Does anyone have any suggestions for how to repair this pool? I'm
presuming I can't do a "zpool replace backups da2 /dev/label/drive5"
to repair the faulted drive because I now have two "da2" devices in my
vdev.
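One thing I'm tempted to try, unless someone warns me off, is exporting
the pool and re-importing it using only the label devices, in the hope
that ZFS re-tastes the glabel names rather than the raw daN devices:

  zpool export backups
  zpool import -d /dev/label backups

I'm wary of doing that on a degraded pool without a second opinion,
though.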
As a sort of related question, is there a better way to create a pool
out of these devices yet still maximise the amount of storage
(allowing for some redundancy)? For example, would it be better to do
something like this:
zpool create backups \
    raidz label/sata1 label/sata2 label/sata3 label/sata4 \
    raidz label/usb1 label/usb2 label/usb3
(where "sataN" are the internal SATA drives and "usbN" are the
external USB drives) to place the internal and external drives into
separate vdevs (albeit losing an extra drive of storage space to
parity)? (Would that improve I/O speeds? I'm guessing it should.)
Or is it just storing up trouble to mix these USB devices into the
pool as I am now, and would I be better off lobbying for an eSATA
enclosure if I want to use external drives?
Cheers,
Paul.