raidz slowing down

Solon Lutz solon at pyro.de
Thu Oct 8 09:37:23 UTC 2009


I built an 11TB raidz from nine hard disks for rescue purposes and started
copying an image onto it from another partition via "dd if=/dev/da0...".
The pool consists of ad4 and da1 through da8; da1 to da8 are connected
via two HighPoint controllers.

In the beginning, write speeds were quite good:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    424      0      0    0.0    424  52483   33.9   84.6| ad4
    0      0      0      0    0.0      0      0    0.0    0.0| da0
   35    356      0      0    0.0    356  44584   76.4  124.5| da1
   35    296      0      0    0.0    296  36919   84.5  121.0| da2
   34    361      0      0    0.0    361  45111   75.5  124.7| da3
   35    346      0      0    0.0    346  43196   78.6  123.2| da4
   35    344      0      0    0.0    344  42940   80.0  124.7| da5
   35    343      0      0    0.0    343  42812   80.7  124.5| da6
   35    344      0      0    0.0    344  43051   79.8  123.9| da7
   34    342      0      0    0.0    342  42796   80.6  124.4| da8

Now, some 10 hours and 2.5TB later, it looks like this most of the time:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     10      0      0    0.0     10      6    0.8    0.2| ad4
    0      0      0      0    0.0      0      0    0.0    0.0| da0
    4     13      0      0    0.0     13      8  550.4  178.5| da1
    0     12      0      0    0.0     12      7    0.7    0.2| da2
    0     11      0      0    0.0     11      7    0.7    0.2| da3
    0     10      0      0    0.0     10      5    0.6    0.2| da4
    0     11      0      0    0.0     11      6    0.9    0.3| da5
    0     12      0      0    0.0     12      7    0.7    0.2| da6
    0     11      0      0    0.0     11      7    0.7    0.2| da7
    0      9      0      0    0.0      9      6    0.8    0.2| da8
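
For anyone who wants to check this mechanically rather than by eye, here
is a rough Python sketch that parses gstat rows like the ones above and
flags devices whose ms/w is far above the median of the active devices.
The function names and the 10x threshold are my own assumptions for
illustration, not anything gstat or ZFS provides.

```python
from statistics import median

# Rows copied from the gstat output of the slow phase shown above.
GSTAT_ROWS = """\
    0     10      0      0    0.0     10      6    0.8    0.2| ad4
    0      0      0      0    0.0      0      0    0.0    0.0| da0
    4     13      0      0    0.0     13      8  550.4  178.5| da1
    0     12      0      0    0.0     12      7    0.7    0.2| da2
    0     11      0      0    0.0     11      7    0.7    0.2| da3
    0     10      0      0    0.0     10      5    0.6    0.2| da4
    0     11      0      0    0.0     11      6    0.9    0.3| da5
    0     12      0      0    0.0     12      7    0.7    0.2| da6
    0     11      0      0    0.0     11      7    0.7    0.2| da7
    0      9      0      0    0.0      9      6    0.8    0.2| da8
"""


def parse_gstat(text):
    """Return {device name: ms/w} for every data row in gstat output."""
    latencies = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # header and "dT:" lines carry no '|' separator
        stats, name = line.split("|", 1)
        fields = stats.split()
        if len(fields) != 9:
            continue  # not a full L(q)..%busy row
        latencies[name.strip()] = float(fields[7])  # column 8 is ms/w
    return latencies


def slow_devices(latencies, factor=10.0):
    """Devices whose ms/w exceeds factor times the median of active ones."""
    active = [value for value in latencies.values() if value > 0]
    if not active:
        return []
    baseline = median(active)
    return sorted(dev for dev, value in latencies.items()
                  if value > factor * baseline)


if __name__ == "__main__":
    print(slow_devices(parse_gstat(GSTAT_ROWS)))
```

On the rows above this flags only da1.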


da1 seems to be busy most of the time, and every few seconds all the
other devices write some data at nearly normal speed:

dT: 1.003s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    254      0      0    0.0    254  31331   34.9   35.4| ad4
    0      0      0      0    0.0      0      0    0.0    0.0| da0
    4      0      0      0    0.0      0      0    0.0    0.0| da1
    0    254      0      0    0.0    254  31346  107.4  104.5| da2
    0    256      0      0    0.0    256  31345  108.1  104.0| da3
    0    255      0      0    0.0    255  31345  110.2  105.1| da4
   35    200      0      0    0.0    200  24912  143.3  115.0| da5
   35    211      0      0    0.0    211  26303  137.8  114.9| da6
   35    210      0      0    0.0    210  26079  139.3  114.9| da7
   35    209      0      0    0.0    209  25952  135.2  113.7| da8

Sometimes it even gets back to 'normal' behaviour, but never reaches
the speeds it once had:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   35    274      0      0    0.0    274  34334   44.2   66.6| ad4
    0   1166   1166 149243    0.1      0      0    0.0   14.3| da0
   35    120      0      0    0.0    120  14717   94.4   64.5| da1
   35     96      0      0    0.0     96  11665  113.9   64.3| da2
   35    100      0      0    0.0    100  12288   98.7   63.9| da3
   35    103      0      0    0.0    103  12496   93.4   59.4| da4
   34    112      0      0    0.0    112  13694  106.1   67.4| da5
   35     71      0      0    0.0     71   8596  115.3   66.8| da6
   35    116      0      0    0.0    116  14205  101.7   67.3| da7
   35     83      0      0    0.0     83  10066  112.2   65.9| da8

Syslog reports the following:

Oct  8 09:53:40 radium kernel: hptrr: start channel [0,0]
Oct  8 09:53:40 radium kernel: hptrr: channel [0,0] started successfully
Oct  8 09:57:44 radium kernel: hptrr: start channel [0,0]
Oct  8 09:57:45 radium kernel: hptrr: channel [0,0] started successfully
Oct  8 10:54:26 radium kernel: hptrr: start channel [0,0]
Oct  8 10:54:27 radium kernel: hptrr: channel [0,0] started successfully
Oct  8 11:10:29 radium kernel: hptrr: start channel [0,0]
Oct  8 11:10:30 radium kernel: hptrr: channel [0,0] started successfully
Oct  8 11:17:27 radium kernel: hptrr: start channel [0,0]
Oct  8 11:17:27 radium kernel: hptrr: channel [0,0] started successfully

Is this a problem with the hptrr device, or is da1 failing?

Best regards,

Solon Lutz


+-----------------------------------------------+
| Pyro.Labs Berlin -  Creativity for tomorrow   |
| Wasgenstrasse 75/13 - 14129 Berlin, Germany   |
| www.pyro.de - phone + 49 - 30 - 48 48 58 58   |
| info at pyro.de - fax + 49 - 30 - 80 94 03 52 |
+-----------------------------------------------+


