[zfs] attach by name/uuid still attaches wrong device

James R. Van Artsdalen james-freebsd-fs2 at jrv.org
Mon Mar 1 05:58:26 UTC 2010


FreeBSD bigtex.housenet.jrv 9.0-CURRENT FreeBSD 9.0-CURRENT #2 r200727M: Tue Dec 22 23:25:56 CST 2009 james at bigtex.housenet.jrv:/usr/obj/usr/src/sys/BIGTEX amd64

It appears that zfs/vdev_geom.c can still attach to the wrong device in
some cases.  Note in the zpool status output below how ada10 appears in
two different vdevs.
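
A duplicate like this can be spotted mechanically.  The following is only
a rough sketch: duplicate_leaves and STATES are names made up for this
illustration, and the parsing assumes the usual "zpool status" layout in
which each config line starts with a name followed by a state.

```python
from collections import Counter

# Device states this sketch recognizes (an assumption, not an exhaustive list).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}

def duplicate_leaves(status_text):
    """Return leaf device names that occur more than once in the config section."""
    names = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            break
        parts = stripped.split()
        # Keep only "<name> <state> ..." rows; the NAME/STATE header is skipped
        # because "STATE" is not one of the recognized states.
        if in_config and len(parts) >= 2 and parts[1] in STATES:
            names.append(parts[0])
    # The first row is the pool itself; mirror/raidz rows are interior vdevs.
    leaves = [n for n in names[1:] if not n.startswith(("mirror", "raidz"))]
    return sorted(n for n, count in Counter(leaves).items() if count > 1)

sample = """\
config:

        NAME        STATE     READ WRITE CKSUM
        bigtex      DEGRADED     0     0     0
          mirror    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
          mirror    DEGRADED     0     0     0
            ada10   FAULTED     10  754K     0  corrupted data
            ada16   ONLINE       0     0     0

errors: No known data errors
"""

print(duplicate_leaves(sample))
```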

What happened is that a disk failed completely (scbus3 target 3) and is
no longer detected by the driver.  At boot time:

1. ZFS fails to attach by path and UUID, since what was at ada11 is now
at ada10 and has a different UUID.
2. ZFS fails to attach by UUID since that UUID is on a dead drive and
can no longer be found anywhere.
3. ZFS then attaches by path blindly, even though that drive is in a
different part of the pool and has a different UUID.
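
The three steps can be modelled in a few lines.  This is a toy model of
the order described above, not the real vdev_geom.c code: attach and the
dictionaries are invented for the illustration.  The two long guids come
from the dmesg output in this message, reading the mismatch line as
"expected != found" (an assumption); the guids 111 and 222 are
placeholders.

```python
def attach(vdev, providers):
    """Toy model: vdev has 'path' and 'guid'; providers maps path -> guid on the label."""
    # Step 1: the recorded path must exist and carry the expected guid.
    if providers.get(vdev["path"]) == vdev["guid"]:
        return vdev["path"], "path+guid"
    # Step 2: search every provider for the expected guid.
    for path, guid in providers.items():
        if guid == vdev["guid"]:
            return path, "guid"
    # Step 3: fall back to the recorded path without checking the guid.
    if vdev["path"] in providers:
        return vdev["path"], "path-only"
    return None, "failed"

# State after the dead drive vanished and the device names shifted down.
providers = {
    "ada10": 12768899409278570370,  # the drive that used to be ada11
    "ada11": 111,                   # placeholder guid
    "ada16": 222,                   # placeholder guid
}

moved_vdev = {"path": "ada11", "guid": 12768899409278570370}
dead_vdev = {"path": "ada10", "guid": 3665972767133355802}

print(attach(moved_vdev, providers))  # ('ada10', 'guid')
print(attach(dead_vdev, providers))   # ('ada10', 'path-only')
```

Both vdevs end up on ada10, which matches the zpool status output.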

I don't think it's possible to do this right in vdev_geom.c: there's no
way to guess what is intended without a hint from higher ZFS layers as
to which drives should be found and which are new.
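
To make the idea of a hint concrete, here is a purely hypothetical
sketch.  It assumes the upper layers could hand down the set of guids
that belong to the pool (pool_guids, an invented name), so that the blind
path fallback can be refused when the label at that path belongs to
another vdev.  Nothing here is an existing ZFS interface, and a guid of 0
stands for "no label" only within this model.

```python
def attach_with_hint(vdev, providers, pool_guids):
    """Hypothetical variant: refuse a blind path attach onto another pool member."""
    # Attach by path when the label carries the expected guid.
    if providers.get(vdev["path"]) == vdev["guid"]:
        return vdev["path"]
    # Otherwise look for the expected guid on any provider.
    for path, guid in providers.items():
        if guid == vdev["guid"]:
            return path
    # Blind fallback, allowed only if the label there is not a known pool member.
    on_disk = providers.get(vdev["path"])
    if on_disk is not None and on_disk not in pool_guids:
        return vdev["path"]
    return None

# Long guids are taken from the dmesg output; 111 and 222 are placeholders.
pool_guids = {12768899409278570370, 3665972767133355802, 111, 222}
dead_vdev = {"path": "ada10", "guid": 3665972767133355802}

# ada10 now holds another member of the pool, so the attach is refused.
print(attach_with_hint(dead_vdev, {"ada10": 12768899409278570370}, pool_guids))  # None

# A blank replacement disk at ada10 (guid 0 in this model) is still accepted.
print(attach_with_hint(dead_vdev, {"ada10": 0}, pool_guids))  # ada10
```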

The best fixes I can think of are to expose drives by serial number in
GEOM, or perhaps as a fallback to expose names that are geographic
locations, e.g., "/dev/scbus0/target3/lun0".
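
As an illustration of what geographic names would look like, the sketch
below derives them in userland from "camcontrol devlist" output.  It is
only a sketch: geographic_names and DEVLIST_RE are invented for this
example, it keeps only adaN peripherals, and the proposal above is about
GEOM exposing such names itself, which this does not do.

```python
import re

# Matches the "at scbusN target N lun N (periph,periph)" part of a devlist line.
DEVLIST_RE = re.compile(r"at scbus(\d+) target (\d+) lun (\d+) \(([^)]+)\)")

def geographic_names(devlist_text):
    """Map a location-based name to the adaN device currently found there."""
    names = {}
    for line in devlist_text.splitlines():
        m = DEVLIST_RE.search(line)
        if m is None:
            continue
        bus, target, lun, periphs = m.groups()
        # Peripherals may be listed in either order, e.g. "ada5,pass9" or "pass4,ada0".
        disks = [p for p in periphs.split(",") if p.startswith("ada")]
        if disks:
            names["/dev/scbus%s/target%s/lun%s" % (bus, target, lun)] = disks[0]
    return names

sample = """\
<WDC WD20EADS-00R6B0 01.00A01>     at scbus0 target 3 lun 0 (ada5,pass9)
<Port Multiplier 37261095 1706>    at scbus0 target 15 lun 0 (pass0,pmp0)
<ST31500341AS CC1G>                at scbus8 target 0 lun 0 (pass4,ada0)
"""

for name, dev in sorted(geographic_names(sample).items()):
    print(name, "->", dev)
```

A name built this way stays with the slot rather than the probe order, so
a vanished drive would not shift the names of its neighbours.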

# zpool status   
  pool: bigtex
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        bigtex                                          DEGRADED     0     0     0
          mirror                                        ONLINE       0     0     0
            ada6                                        ONLINE       0     0     0
            ada13                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada4                                        ONLINE       0     0     0
            ada11                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            gptid/dbb5f9fd-5e40-11de-bef4-001aa01b0286  ONLINE       0     0     0
            ada2p7                                      ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada7                                        ONLINE       0     0     0
            ada14                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada3                                        ONLINE       0     0     0
            ada10                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada5                                        ONLINE       0     0     0
            ada12                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada9                                        ONLINE       0     0     0
            ada15                                       ONLINE       0     0     0
          mirror                                        DEGRADED     0     0     0
            ada10                                       FAULTED     10  754K     0  corrupted data
            ada16                                       ONLINE       0     0     0

errors: No known data errors

# camcontrol devlist
<WDC WD15EADS-00R6B0 01.00A01>     at scbus0 target 0 lun 0 (ada2,pass6)
<WDC WD20EADS-00R6B0 01.00A01>     at scbus0 target 1 lun 0 (ada3,pass7)
<WDC WD20EADS-00R6B0 01.00A01>     at scbus0 target 2 lun 0 (ada4,pass8)
<WDC WD20EADS-00R6B0 01.00A01>     at scbus0 target 3 lun 0 (ada5,pass9)
<Port Multiplier 37261095 1706>    at scbus0 target 15 lun 0 (pass0,pmp0)
<WDC WD20EADS-00R6B0 01.00A01>     at scbus3 target 0 lun 0 (ada6,pass10)
<WDC WD20EADS-00R6B0 01.00A01>     at scbus3 target 1 lun 0 (ada7,pass11)
<WDC WD20EADS-00R6B0 01.00A01>     at scbus3 target 2 lun 0 (ada9,pass13)
<Port Multiplier 37261095 1706>    at scbus3 target 15 lun 0 (pass1,pmp1)
<ST31500343AS SD35>                at scbus4 target 0 lun 0 (ada8,pass12)
<ST32000542AS CC32>                at scbus4 target 1 lun 0 (ada10,pass14)
<ST32000542AS CC32>                at scbus4 target 2 lun 0 (ada11,pass15)
<ST32000542AS CC32>                at scbus4 target 3 lun 0 (ada12,pass16)
<Port Multiplier 37261095 1706>    at scbus4 target 15 lun 0 (pass2,pmp2)
<ST32000542AS CC32>                at scbus7 target 0 lun 0 (ada13,pass17)
<ST32000542AS CC32>                at scbus7 target 1 lun 0 (ada14,pass18)
<ST32000542AS CC32>                at scbus7 target 2 lun 0 (ada15,pass19)
<ST32000542AS CC32>                at scbus7 target 3 lun 0 (ada16,pass20)
<Port Multiplier 37261095 1706>    at scbus7 target 15 lun 0 (pass3,pmp3)
<ST31500341AS CC1G>                at scbus8 target 0 lun 0 (pass4,ada0)
<ST31500341AS CC1G>                at scbus11 target 0 lun 0 (pass5,ada1)

# grep ada10 /var/run/dmesg.boot
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_open_by_path:466[1]: Found provider by name /dev/ada10.
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:138[1]: Found consumer for ada10.
vdev_geom_attach:157[1]: Used existing consumer for ada10.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_detach:173[1]: Closing access to ada10.
vdev_geom_open_by_path:477[1]: guid mismatch for provider /dev/ada10: 3665972767133355802 != 12768899409278570370.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_open_by_path:466[1]: Found provider by name /dev/ada10.
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:138[1]: Found consumer for ada10.
vdev_geom_attach:157[1]: Used existing consumer for ada10.
vdev_geom_detach:173[1]: Closing access to ada10.
vdev_geom_detach:173[1]: Closing access to ada10.
vdev_geom_detach:177[1]: Destroyed consumer to ada10.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:153[1]: Created consumer for ada10.
vdev_geom_open_by_guid:446[1]: Attach by guid [12768899409278570370] succeeded, provider /dev/ada10.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_open_by_path:466[1]: Found provider by name /dev/ada10.
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:138[1]: Found consumer for ada10.
vdev_geom_attach:157[1]: Used existing consumer for ada10.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_detach:173[1]: Closing access to ada10.
vdev_geom_open_by_path:477[1]: guid mismatch for provider /dev/ada10: 3665972767133355802 != 12768899409278570370.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_open_by_path:466[1]: Found provider by name /dev/ada10.
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:138[1]: Found consumer for ada10.
vdev_geom_attach:157[1]: Used existing consumer for ada10.
vdev_geom_detach:173[1]: Closing access to ada10.
#


