RFC: GEOM MULTIPATH rewrite

Nikolay Denev ndenev at gmail.com
Fri Jan 20 12:13:26 UTC 2012


On Jan 20, 2012, at 1:30 PM, Alexander Motin wrote:

> On 01/20/12 13:08, Nikolay Denev wrote:
>> On 20.01.2012, at 12:51, Alexander Motin<mav at freebsd.org>  wrote:
>> 
>>> On 01/20/12 10:09, Nikolay Denev wrote:
>>>> Another thing I've observed is that active/active probably only makes sense if you are accessing a single LUN.
>>>> In my tests, where I have 24 LUNs that form 4 vdevs in a single zpool, the highest performance was achieved
>>>> when I split the active paths among the controllers installed in the server importing the pool (basically "gmultipath rotate $LUN" in rc.local for half of the paths).
>>>> Using active/active in this situation resulted in fluctuating performance.
>>> 
>>> How big was the fluctuation? Between the speed of one path and that of all paths?
>>> 
>>> Several active/active devices that know nothing about each other will, with some probability, send part of their requests via the same links, while ZFS itself already does some balancing between vdevs.
>>> 
>>> --
>>> Alexander Motin
>> 
>> I will test in a bit and post results.
>> 
>> P.S.: Is there a way to enable/disable active-active on the fly? I'm
>> currently re-labeling to achieve that.
> 
> No, not at the moment. But for experiments you can achieve the same result by manually marking all paths except one as failed. This is not dangerous: if that link fails, all the others will be resurrected automatically.
> 
> -- 
> Alexander Motin

I had to destroy and relabel anyway, since I was not using active-active before. Here's what I did (maybe a little too verbose):

gmultipath label -A -v LD_0 /dev/da0 /dev/da24 
gmultipath label -A -v LD_1 /dev/da1 /dev/da25 
gmultipath label -A -v LD_2 /dev/da2 /dev/da26 
gmultipath label -A -v LD_3 /dev/da3 /dev/da27 
gmultipath label -A -v LD_4 /dev/da4 /dev/da28 
gmultipath label -A -v LD_5 /dev/da5 /dev/da29 
gmultipath label -A -v LD_6 /dev/da6 /dev/da30 
gmultipath label -A -v LD_7 /dev/da7 /dev/da31 
gmultipath label -A -v LD_8 /dev/da8 /dev/da32 
gmultipath label -A -v LD_9 /dev/da9 /dev/da33 
gmultipath label -A -v LD_10 /dev/da10 /dev/da34 
gmultipath label -A -v LD_11 /dev/da11 /dev/da35 
gmultipath label -A -v LD_12 /dev/da12 /dev/da36 
gmultipath label -A -v LD_13 /dev/da13 /dev/da37 
gmultipath label -A -v LD_14 /dev/da14 /dev/da38 
gmultipath label -A -v LD_15 /dev/da15 /dev/da39 
gmultipath label -A -v LD_16 /dev/da16 /dev/da40 
gmultipath label -A -v LD_17 /dev/da17 /dev/da41 
gmultipath label -A -v LD_18 /dev/da18 /dev/da42 
gmultipath label -A -v LD_19 /dev/da19 /dev/da43 
gmultipath label -A -v LD_20 /dev/da20 /dev/da44 
gmultipath label -A -v LD_21 /dev/da21 /dev/da45 
gmultipath label -A -v LD_22 /dev/da22 /dev/da46 
gmultipath label -A -v LD_23 /dev/da23 /dev/da47 
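
The 24 label commands above follow a fixed pattern: LD_n pairs /dev/da<n> with /dev/da<n+24>. A minimal sketch of a loop that generates them is below. The function name gen_label_cmds and its two arguments are made up for illustration, and it only prints the commands so they can be reviewed before anything is relabeled.

```sh
#!/bin/sh
# Print (do not run) one "gmultipath label" command per LUN.
# Assumption: LUN n shows up as da<n> on one path and da<n+offset> on the other,
# as in the listing above. gen_label_cmds is a hypothetical helper name.
gen_label_cmds() {
    count=$1     # number of LUNs
    offset=$2    # distance between the two device numbers of one LUN
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "gmultipath label -A -v LD_$i /dev/da$i /dev/da$((i + offset))"
        i=$((i + 1))
    done
}

gen_label_cmds 24 24
```

Once the printed list looks right, it could be piped to sh to actually run it.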

:~# gmultipath status
           Name   Status  Components
 multipath/LD_0  OPTIMAL  da0 (ACTIVE)
                          da24 (ACTIVE)
 multipath/LD_1  OPTIMAL  da1 (ACTIVE)
                          da25 (ACTIVE)
 multipath/LD_2  OPTIMAL  da2 (ACTIVE)
                          da26 (ACTIVE)
 multipath/LD_3  OPTIMAL  da3 (ACTIVE)
                          da27 (ACTIVE)
 multipath/LD_4  OPTIMAL  da4 (ACTIVE)
                          da28 (ACTIVE)
 multipath/LD_5  OPTIMAL  da5 (ACTIVE)
                          da29 (ACTIVE)
 multipath/LD_6  OPTIMAL  da6 (ACTIVE)
                          da30 (ACTIVE)
 multipath/LD_7  OPTIMAL  da7 (ACTIVE)
                          da31 (ACTIVE)
 multipath/LD_8  OPTIMAL  da8 (ACTIVE)
                          da32 (ACTIVE)
 multipath/LD_9  OPTIMAL  da9 (ACTIVE)
                          da33 (ACTIVE)
multipath/LD_10  OPTIMAL  da10 (ACTIVE)
                          da34 (ACTIVE)
multipath/LD_11  OPTIMAL  da11 (ACTIVE)
                          da35 (ACTIVE)
multipath/LD_12  OPTIMAL  da12 (ACTIVE)
                          da36 (ACTIVE)
multipath/LD_13  OPTIMAL  da13 (ACTIVE)
                          da37 (ACTIVE)
multipath/LD_14  OPTIMAL  da14 (ACTIVE)
                          da38 (ACTIVE)
multipath/LD_15  OPTIMAL  da15 (ACTIVE)
                          da39 (ACTIVE)
multipath/LD_16  OPTIMAL  da16 (ACTIVE)
                          da40 (ACTIVE)
multipath/LD_17  OPTIMAL  da17 (ACTIVE)
                          da41 (ACTIVE)
multipath/LD_18  OPTIMAL  da18 (ACTIVE)
                          da42 (ACTIVE)
multipath/LD_19  OPTIMAL  da19 (ACTIVE)
                          da43 (ACTIVE)
multipath/LD_20  OPTIMAL  da20 (ACTIVE)
                          da44 (ACTIVE)
multipath/LD_21  OPTIMAL  da21 (ACTIVE)
                          da45 (ACTIVE)
multipath/LD_22  OPTIMAL  da22 (ACTIVE)
                          da46 (ACTIVE)
multipath/LD_23  OPTIMAL  da23 (ACTIVE)
                          da47 (ACTIVE)

:~# zpool import tank
:~# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

	NAME                 STATE     READ WRITE CKSUM
	tank                 ONLINE       0     0     0
	  raidz2-0           ONLINE       0     0     0
	    multipath/LD_0   ONLINE       0     0     0
	    multipath/LD_1   ONLINE       0     0     0
	    multipath/LD_2   ONLINE       0     0     0
	    multipath/LD_3   ONLINE       0     0     0
	    multipath/LD_4   ONLINE       0     0     0
	    multipath/LD_5   ONLINE       0     0     0
	  raidz2-1           ONLINE       0     0     0
	    multipath/LD_6   ONLINE       0     0     0
	    multipath/LD_7   ONLINE       0     0     0
	    multipath/LD_8   ONLINE       0     0     0
	    multipath/LD_9   ONLINE       0     0     0
	    multipath/LD_10  ONLINE       0     0     0
	    multipath/LD_11  ONLINE       0     0     0
	  raidz2-2           ONLINE       0     0     0
	    multipath/LD_12  ONLINE       0     0     0
	    multipath/LD_13  ONLINE       0     0     0
	    multipath/LD_14  ONLINE       0     0     0
	    multipath/LD_15  ONLINE       0     0     0
	    multipath/LD_16  ONLINE       0     0     0
	    multipath/LD_17  ONLINE       0     0     0
	  raidz2-3           ONLINE       0     0     0
	    multipath/LD_18  ONLINE       0     0     0
	    multipath/LD_19  ONLINE       0     0     0
	    multipath/LD_20  ONLINE       0     0     0
	    multipath/LD_21  ONLINE       0     0     0
	    multipath/LD_22  ONLINE       0     0     0
	    multipath/LD_23  ONLINE       0     0     0

errors: No known data errors

And now a very naive benchmark:

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512      
512+0 records in
512+0 records out
536870912 bytes transferred in 7.282780 secs (73717855 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 38.422724 secs (13972745 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 10.810989 secs (49659740 bytes/sec)

Now fail one path per LUN, alternating by vdev, so that the active paths are split between the two sets of devices:

/sbin/gmultipath fail LD_0 da24
/sbin/gmultipath fail LD_1 da25
/sbin/gmultipath fail LD_2 da26
/sbin/gmultipath fail LD_3 da27
/sbin/gmultipath fail LD_4 da28
/sbin/gmultipath fail LD_5 da29
/sbin/gmultipath fail LD_6 da6
/sbin/gmultipath fail LD_7 da7
/sbin/gmultipath fail LD_8 da8
/sbin/gmultipath fail LD_9 da9
/sbin/gmultipath fail LD_10 da10
/sbin/gmultipath fail LD_11 da11
/sbin/gmultipath fail LD_12 da36
/sbin/gmultipath fail LD_13 da37
/sbin/gmultipath fail LD_14 da38
/sbin/gmultipath fail LD_15 da39
/sbin/gmultipath fail LD_16 da40
/sbin/gmultipath fail LD_17 da41
/sbin/gmultipath fail LD_18 da18
/sbin/gmultipath fail LD_19 da19
/sbin/gmultipath fail LD_20 da20
/sbin/gmultipath fail LD_21 da21
/sbin/gmultipath fail LD_22 da22
/sbin/gmultipath fail LD_23 da23
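
The fail list above also follows a pattern: for the first and third groups of six LUNs the da<n+24> path is failed, and for the second and fourth groups the da<n> path is failed. A sketch that reproduces the list is below. gen_fail_cmds and its arguments are hypothetical names, and the commands are only printed, not executed.

```sh
#!/bin/sh
# Print (do not run) one "gmultipath fail" command per LUN, alternating the
# failed path per group of per_vdev LUNs.
# Assumption: LUN n has the paths da<n> and da<n+offset>, as labeled earlier.
gen_fail_cmds() {
    count=$1       # number of LUNs
    offset=$2      # distance between the two device numbers of one LUN
    per_vdev=$3    # LUNs per vdev (6 in the pool shown above)
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ $(( (i / per_vdev) % 2 )) -eq 0 ]; then
            dev="da$((i + offset))"   # even group: fail the second path
        else
            dev="da$i"                # odd group: fail the first path
        fi
        echo "/sbin/gmultipath fail LD_$i $dev"
        i=$((i + 1))
    done
}

gen_fail_cmds 24 24 6
```

Replacing "fail" with "restore" in the echo line would presumably print the commands to bring the paths back, assuming the installed gmultipath(8) provides the restore verb.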

:~# gmultipath status
           Name    Status  Components
 multipath/LD_0  DEGRADED  da0 (ACTIVE)
                           da24 (FAIL)
 multipath/LD_1  DEGRADED  da1 (ACTIVE)
                           da25 (FAIL)
 multipath/LD_2  DEGRADED  da2 (ACTIVE)
                           da26 (FAIL)
 multipath/LD_3  DEGRADED  da3 (ACTIVE)
                           da27 (FAIL)
 multipath/LD_4  DEGRADED  da4 (ACTIVE)
                           da28 (FAIL)
 multipath/LD_5  DEGRADED  da5 (ACTIVE)
                           da29 (FAIL)
 multipath/LD_6  DEGRADED  da6 (FAIL)
                           da30 (ACTIVE)
 multipath/LD_7  DEGRADED  da7 (FAIL)
                           da31 (ACTIVE)
 multipath/LD_8  DEGRADED  da8 (FAIL)
                           da32 (ACTIVE)
 multipath/LD_9  DEGRADED  da9 (FAIL)
                           da33 (ACTIVE)
multipath/LD_10  DEGRADED  da10 (FAIL)
                           da34 (ACTIVE)
multipath/LD_11  DEGRADED  da11 (FAIL)
                           da35 (ACTIVE)
multipath/LD_12  DEGRADED  da12 (ACTIVE)
                           da36 (FAIL)
multipath/LD_13  DEGRADED  da13 (ACTIVE)
                           da37 (FAIL)
multipath/LD_14  DEGRADED  da14 (ACTIVE)
                           da38 (FAIL)
multipath/LD_15  DEGRADED  da15 (ACTIVE)
                           da39 (FAIL)
multipath/LD_16  DEGRADED  da16 (ACTIVE)
                           da40 (FAIL)
multipath/LD_17  DEGRADED  da17 (ACTIVE)
                           da41 (FAIL)
multipath/LD_18  DEGRADED  da18 (FAIL)
                           da42 (ACTIVE)
multipath/LD_19  DEGRADED  da19 (FAIL)
                           da43 (ACTIVE)
multipath/LD_20  DEGRADED  da20 (FAIL)
                           da44 (ACTIVE)
multipath/LD_21  DEGRADED  da21 (FAIL)
                           da45 (ACTIVE)
multipath/LD_22  DEGRADED  da22 (FAIL)
                           da46 (ACTIVE)
multipath/LD_23  DEGRADED  da23 (FAIL)
                           da47 (ACTIVE)

And the benchmark again:

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.083226 secs (495622270 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.409975 secs (380766249 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.136110 secs (472551848 bytes/sec)
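
To put the two sets of dd runs side by side, the bytes/sec figures can be averaged. The sketch below does that with awk. dd_avg_mbs is a hypothetical helper name, MB here means 10^6 bytes, and three short runs per configuration are of course a very small sample.

```sh
#!/bin/sh
# Average the "(N bytes/sec)" figures from dd summary lines read on stdin
# and print the result in MB/s (1 MB = 1000000 bytes).
# dd_avg_mbs is a hypothetical helper name.
dd_avg_mbs() {
    awk '/bytes\/sec\)/ {
        v = $(NF - 1)        # next-to-last field, e.g. "(73717855"
        sub(/^\(/, "", v)    # drop the leading parenthesis
        sum += v; n++
    }
    END { if (n) printf "%.1f\n", sum / n / 1000000 }'
}

# Active/active runs from above: prints 45.8
dd_avg_mbs <<'EOF'
536870912 bytes transferred in 7.282780 secs (73717855 bytes/sec)
536870912 bytes transferred in 38.422724 secs (13972745 bytes/sec)
536870912 bytes transferred in 10.810989 secs (49659740 bytes/sec)
EOF

# One active path per LUN, split by vdev: prints 449.6
dd_avg_mbs <<'EOF'
536870912 bytes transferred in 1.083226 secs (495622270 bytes/sec)
536870912 bytes transferred in 1.409975 secs (380766249 bytes/sec)
536870912 bytes transferred in 1.136110 secs (472551848 bytes/sec)
EOF
```

For these runs that is roughly a tenfold difference in average throughput between the two configurations.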

P.S.: The server is running 8.2-STABLE with a dual-port isp(4) card, and is directly connected to a 4Gbps Xyratex dual-controller (active-active) storage array.
All 24 SAS drives are set up as single-disk RAID0 LUNs.

