gmultipath, ses and shared disks / can't seem to share between local nodes

Teske, Devin Devin.Teske at fisglobal.com
Thu Apr 18 00:05:20 UTC 2013


On Apr 17, 2013, at 4:56 PM, Outback Dingo wrote:




On Wed, Apr 17, 2013 at 7:29 PM, Teske, Devin <Devin.Teske at fisglobal.com> wrote:

On Apr 17, 2013, at 4:10 PM, Outback Dingo wrote:




On Wed, Apr 17, 2013 at 6:39 PM, Teske, Devin <Devin.Teske at fisglobal.com> wrote:

On Apr 17, 2013, at 3:26 PM, Outback Dingo wrote:

> Ok, maybe I'm at a loss here in the way my brain is viewing this.
>
> We have a box with 2 nodes in the chassis and 32 SATA drives attached to a
> SATA/SAS backplane via 4 (2 per node) LSI MPT SAS2 cards. Should I not
> logically be seeing 4 controllers x drive count?
>
> camcontrol devlist shows 32 devices: daX,passX and sesX,passX
>
> <SEAGATE ST33000650SS 0004>        at scbus0 target 9 lun 0 (da0,pass0)
> <STORBRICK-3 1400>        at scbus0 target 10 lun 0 (ses0,pass1)
> <SEAGATE ST33000650SS 0004>        at scbus0 target 11 lun 0 (da1,pass2)
> <STORBRICK-1 1400>        at scbus0 target 12 lun 0 (ses1,pass3)
> <SEAGATE ST33000650SS 0004>        at scbus0 target 13 lun 0 (da2,pass4)
> <STORBRICK-2 1400>        at scbus0 target 14 lun 0 (ses2,pass5)
> <SEAGATE ST33000650SS 0004>        at scbus0 target 15 lun 0 (da3,pass6)
> <STORBRICK-4 1400>        at scbus0 target 16 lun 0 (ses3,pass7)
> <SEAGATE ST33000650SS 0004>        at scbus0 target 17 lun 0 (da4,pass8)
> <STORBRICK-6 1400>        at scbus0 target 18 lun 0 (ses4,pass9)
> <SEAGATE ST33000650SS 0004>        at scbus0 target 19 lun 0 (da5,pass10)
> <STORBRICK-0 1400>        at scbus0 target 20 lun 0 (ses5,pass11)
> <SEAGATE ST33000650SS 0004>        at scbus0 target 21 lun 0 (da6,pass12)
> <STORBRICK-7 1400>        at scbus0 target 22 lun 0 (ses6,pass13)
> <SEAGATE ST33000650SS 0004>        at scbus0 target 23 lun 0 (da7,pass14)
> <STORBRICK-5 1400>        at scbus0 target 24 lun 0 (ses7,pass15)
> <SEAGATE ST9300605SS 0004>         at scbus1 target 0 lun 0 (da8,pass16)
> <SEAGATE ST9300605SS 0004>         at scbus1 target 1 lun 0 (da9,pass17)
> <STORBRICK-3 1400>        at scbus8 target 10 lun 0 (ses8,pass19)
> <SEAGATE ST33000650SS 0004>        at scbus8 target 11 lun 0 (da11,pass20)
> <STORBRICK-1 1400>        at scbus8 target 12 lun 0 (ses9,pass21)
> <SEAGATE ST33000650SS 0004>        at scbus8 target 13 lun 0 (da12,pass22)
> <STORBRICK-2 1400>        at scbus8 target 14 lun 0 (ses10,pass23)
> <SEAGATE ST33000650SS 0004>        at scbus8 target 15 lun 0 (da13,pass24)
> <STORBRICK-4 1400>        at scbus8 target 16 lun 0 (ses11,pass25)
> <SEAGATE ST33000650SS 0004>        at scbus8 target 17 lun 0 (da14,pass26)
> <STORBRICK-6 1400>        at scbus8 target 18 lun 0 (ses12,pass27)
> <SEAGATE ST33000650SS 0004>        at scbus8 target 19 lun 0 (da15,pass28)
> <STORBRICK-0 1400>        at scbus8 target 20 lun 0 (ses13,pass29)
> <SEAGATE ST33000650SS 0004>        at scbus8 target 21 lun 0 (da16,pass30)
> <STORBRICK-7 1400>        at scbus8 target 22 lun 0 (ses14,pass31)
> <SEAGATE ST33000650SS 0004>        at scbus8 target 23 lun 0 (da17,pass32)
> <STORBRICK-5 1400>        at scbus8 target 24 lun 0 (ses15,pass33)
> <USB 2.0 Flash Drive 8.07>         at scbus9 target 0 lun 0 (da18,pass34)
>
>
> We would like to create a zpool from all the devices, so that, in theory,
> if nodeA failed, then nodeB could force-import the pool.

gmultipath (which you mention in the subject) is the appropriate tool for this, but there's no need for an import of the pool if you build the pool out of multipath devices. In our experience, we can pull a cable and zfs continues working just fine.

In other words, don't build the pool out of the devices, put a gmultipath label on each device and then use /dev/multipath/LABEL for the zpool devices.


> nodeA and nodeB are attached through dual LSI controllers to the SATA/SAS
> backplane, but I can't seem to create a zpool from sesX or passX devices.
> I can, however, create a 16-drive zpool on either node from any daX device.
> What did I miss? I've looked at gmirror and also the SES documents. Any
> insight is appreciated; thanks in advance.

gmirror is the wrong tool; gmultipath is what you want. The basic task is to use "gmultipath label FOO da#" to write a cookie on the disk (used to identify new/existing paths during GEOM "taste" events, for example).

After you've labeled the da# devices with gmultipath, run "gmultipath status" to see the components of each label, and use "multipath/LABEL" as your disk name when creating the zpool (these correspond directly to /dev/multipath/LABEL, but "zpool create …" and "zpool add …" let you omit the leading "/dev").
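
To make that concrete, the sequence for a single disk would look something like this (a sketch only; the DISK01 label name is just an example, and "master" is the pool name used elsewhere in this thread):

gmultipath label DISK01 da0           # write the multipath metadata cookie onto da0
gmultipath status                     # DISK01 should now appear with da0 as its component
zpool create master multipath/DISK01  # ...plus the rest of your multipath/* devices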

Sanity check me: on node A I did

zpool destroy master

gmultipath label FOO da0

gmultipath status
                    Name    Status  Components
           multipath/FOO  DEGRADED  da0 (ACTIVE)
 multipath/FOO-619648737  DEGRADED  da1 (ACTIVE)
 multipath/FOO-191725652  DEGRADED  da2 (ACTIVE)
multipath/FOO-1539342315  DEGRADED  da3 (ACTIVE)
multipath/FOO-1276041606  DEGRADED  da4 (ACTIVE)
multipath/FOO-2000832198  DEGRADED  da5 (ACTIVE)
multipath/FOO-1285640577  DEGRADED  da6 (ACTIVE)
multipath/FOO-1816092574  DEGRADED  da7 (ACTIVE)
multipath/FOO-1102254444  DEGRADED  da8 (ACTIVE)
 multipath/FOO-330300690  DEGRADED  da9 (ACTIVE)
  multipath/FOO-92140635  DEGRADED  da10 (ACTIVE)
 multipath/FOO-855257672  DEGRADED  da11 (ACTIVE)
multipath/FOO-1003634134  DEGRADED  da12 (ACTIVE)
   multipath/FOO-2449862  DEGRADED  da13 (ACTIVE)
multipath/FOO-1137080233  DEGRADED  da14 (ACTIVE)
multipath/FOO-1696804371  DEGRADED  da15 (ACTIVE)
multipath/FOO-1304457562  DEGRADED  da16 (ACTIVE)
 multipath/FOO-912159854  DEGRADED  da17 (ACTIVE)

Now on node B I should do the same? Reboot both nodes, and I should be able to "see" 32 multipath/FOO devices to create a pool from?


It appears from the above output that you labeled all of the block devices (da0 through da17) with the same label.

This is not what you want.

Use "gmultipath clear FOO" on each of the block devices and have another go using unique values.

For example:

gmultipath label SATA_LUN01 da0
gmultipath label SATA_LUN02 da1
gmultipath label SATA_LUN03 da2
gmultipath label SATA_LUN04 da3
gmultipath label SATA_LUN05 da4
gmultipath label SATA_LUN06 da5
gmultipath label SATA_LUN07 da6
gmultipath label SATA_LUN08 da7
gmultipath label SATA_LUN09 da8
gmultipath label SATA_LUN10 da9
gmultipath label SATA_LUN11 da10
gmultipath label SATA_LUN12 da11
gmultipath label SATA_LUN13 da12
gmultipath label SATA_LUN14 da13
gmultipath label SATA_LUN15 da14
gmultipath label SATA_LUN16 da15
gmultipath label SATA_LUN17 da16
gmultipath label SATA_LUN18 da17
..

Then "gmultipath status" should show your unique labels each with a single component.

Then you would do:

zpool create master multipath/SATA_LUN{01,02,03,04,05,06,…}
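
If typing out 18 label commands is tedious, a small sh loop can do the labeling; this is only a sketch, assuming da0 through da17 are the disks you want (check them against "camcontrol devlist" first):

# label da0..da17 as SATA_LUN01..SATA_LUN18
i=1
for d in da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 \
         da10 da11 da12 da13 da14 da15 da16 da17; do
        gmultipath label "$(printf 'SATA_LUN%02d' "$i")" "$d"
        i=$((i + 1))
done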


Ahh, OK, got it. And probably on the other node:

gmultipath label SATA_LUN19 da0
gmultipath label SATA_LUN20 da1

-------------------snip------------------------------

gmultipath label SATA_LUN36 da15


No. You do not need to do the labeling again on the other "node".

Since the "gmultipath label …" command writes data to the disk, you do not need to label the disk multiple times (and in fact would be an error to). Rather, as the system is probing and adding disks, it will automatically detect multiple paths based on this data stored on the disk.

Read: If da0 and another da# device are indeed two paths to the same device, then as those devices are probed by the kernel, "gmultipath status" will dynamically show the newly discovered paths.
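
One related note, in case it is not already set up on both nodes: for the on-disk labels to be tasted automatically at boot, the geom_multipath kernel module needs to be loaded, e.g. via /boot/loader.conf:

# /boot/loader.conf
geom_multipath_load="YES"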

If, after labeling all the devices on a single path, you find that "gmultipath status" still shows only one component for each label, try rebooting. If "gmultipath status" still shows a single component per label even after a reboot, then you are clearly not configured (hardware-wise) for multiple paths to the same components. That may be where the "gmultipath" versus "gmirror" nit I caught in your original post comes into play: maybe "gmultipath" was the wrong thing to put in the subject if you don't have multiple paths to the same components, but instead have a second set of components that you want to mirror your data onto. If that turns out to be the case, then rather than gmirror I would actually recommend a zfs send/receive cron job based on snapshots, to take advantage of ZFS copy-on-write performance; but your mileage may vary.
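
As a rough sketch of that snapshot-based send/receive approach (names are illustrative: the local pool "master" from this thread, a hypothetical receiving pool "backup", and the other node reachable over ssh as "nodeB"):

#!/bin/sh
# take a recursive snapshot of the local pool and replicate it to the other
# node; this is a full-stream sketch only, an incremental run would use
# "zfs send -i <previous> <current>" instead
SNAP="master@$(date +%Y%m%d-%H%M%S)"
zfs snapshot -r "$SNAP"
zfs send -R "$SNAP" | ssh nodeB "zfs receive -F backup"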
--
Devin



Then create the zpool from the "36" multipath devices?

So if I create a 36-drive multipath zpool on nodeA, when it fails do I just import it on nodeB? I was thinking of using CARP for failover, so that nodeB would continue the NFS sessions and import the zpool.
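
(For reference, the forced import on the surviving node would look roughly like this, with "master" being the pool name used earlier in the thread:)

# on nodeB, only once nodeA is confirmed down (never while nodeA still has
# the pool imported):
zpool import -f master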


--
Devin


