RFC: GEOM MULTIPATH rewrite

Fri Jan 20 08:40:59 UTC 2012

On Nov 14, 2011, at 11:09 PM, Gary Palmer wrote:

> On Tue, Nov 01, 2011 at 10:24:06PM +0200, Alexander Motin wrote:
>> On 01.11.2011 19:50, Dennis Kögel wrote:
>>> Not sure if replying on-list or off-list makes more sense...
>> 
>> Replying on-list could share the experience with other users.
>> 
>>> Anyway, some first impressions, on stable/9:
>>> 
>>> The lab environment here is an EMC VNX / Clariion SAN, which has two Storage Processors, connected to different switches, connected to two isp(4)s on the test machine. So at any time, the machine sees four paths, but only two are available (depending on which SP owns the LUN).
>>> 
>>> 580# camcontrol devlist
>>> <DGC VRAID 0531>                   at scbus0 target 0 lun 0 (da0,pass0)
>>> <DGC VRAID 0531>                   at scbus0 target 1 lun 0 (da1,pass1)
>>> <DGC VRAID 0531>                   at scbus1 target 0 lun 0 (da2,pass2)
>>> <DGC VRAID 0531>                   at scbus1 target 1 lun 0 (da3,pass3)
>>> <COMPAQ RAID 1(1VOLUME OK>         at scbus2 target 0 lun 0 (da4,pass4)
>>> <COMPAQ RAID 0  VOLUME OK>         at scbus2 target 1 lun 0 (da5,pass5)
>>> <hp DVD D  DS8D3SH HHE7>           at scbus4 target 0 lun 0 (cd0,pass6)
>>> 
>>> I miss the ability to "add" disks to automatic mode multipaths, but I (just now) realized this only makes sense when gmultipath has some kind of path checking facility (like periodically trying to read sector 0 of each configured device, which is what Linux' devicemapper-multipathd does).
>> 
>> In automatic mode other paths are supposed to be detected via metadata
>> reading. If in your case some paths are not readable, automatic mode
>> can't work as expected. By the way, could you describe how your
>> configuration is supposed to work, like when other paths will start
>> working?
> 
> Without knowledge of the particular Clariion SAN Dennis is working with,
> I've seen some so-called active/active RAID controllers force a LUN 
> fail over from one controller to another (taking it offline for 3 seconds
> in the process) because the LUN received an I/O down a path to the controller
> that was formerly taking the standby role for that LUN (and it was per-LUN,
> so some would be owned by one controller and some by the other).  During
> the controller switch, all I/O to the LUN would fail.  Thankfully that
> particular RAID model where I observed this behaviour hasn't been sold in
> several years, but I would tend to expect such behaviour at the lower
> end of the storage market with the higher end units doing true active/active
> configurations. (and no, I won't name the manufacturer on a public list)
> 
> This is exactly why Linux ships with a multipath configuration file, so
> it can describe exactly what form of brain damage the controller in
> question implements so it can work around it, and maybe even 
> document some vendor-specific extensions so that the host can detect
> which controller is taking which role for a particular path.
> 
> Even some controllers that don't have pathological behaviour when
> they receive I/O down the wrong path have sub-optimal behaviour unless
> you choose the right path.  NetApp SANs in particular typically have two
> independent controllers with a high-speed internal interconnect, however
> there is a measurable and not-insignificant penalty for sending the I/O
> to the "partner" controller for a LUN, across the internal interconnect
> (called a "VTIC" I believe) to the "owner" controller.  I've been told,
> although I have not measured this myself, that it can add several ms to
> a transaction, which when talking about SAN storage is potentially several
> times what it takes to do the same I/O directly to the controller that
> owns it.  There's probably a way to make the "partner" controller not
> advertise the LUN until it takes over in a failover scenario, but every
> NetApp I've worked with is set (by default I believe) to advertise the
> LUN out both controllers.
> 
> Gary

Another thing I've observed is that active/active probably only makes sense if you are accessing a single LUN.
In my tests, where I have 24 LUNs that form 4 vdevs in a single zpool, the highest performance was achieved
when I split the active paths among the controllers installed in the server importing the pool (basically "gmultipath rotate $LUN" in rc.local for half of the paths).
Using active/active in this situation resulted in fluctuating performance.
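
For reference, a minimal sketch of what that rc.local fragment could look like. The function name, the GMULTIPATH override variable and the LUN0..LUN3 labels are made up for illustration; substitute your own gmultipath device names. It also assumes that every device initially has its active path on the same HBA, so that rotating every second device ends up splitting the load.

```shell
#!/bin/sh
# Sketch for /etc/rc.local: rotate the active path on every second
# multipath device so the active paths are split across two HBAs.
# Assumption: all devices start with their active path on the same HBA.

# Command to run; override with GMULTIPATH=echo for a dry run.
: "${GMULTIPATH:=gmultipath}"

# rotate_alternate NAME...
# Calls "gmultipath rotate" on the 2nd, 4th, 6th, ... name given.
rotate_alternate() {
    i=0
    for lun in "$@"; do
        if [ $(($i % 2)) -eq 1 ]; then
            $GMULTIPATH rotate "$lun"
        fi
        i=$(($i + 1))
    done
}

# Example call (hypothetical labels), left commented out:
# rotate_alternate LUN0 LUN1 LUN2 LUN3
```

Setting GMULTIPATH=echo before calling the function prints the commands instead of running them, which is a cheap way to check which devices would be rotated before putting this into rc.local.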