raidz slowing down
Solon Lutz
solon at pyro.de
Mon Oct 26 01:30:39 UTC 2009
> Did you ever get any response? I have a very similar sounding issue with
> my raidz2. I've always assumed it was because the volume was nearly full
> and maybe some fragmentation or something. All of my devices are on MPT
> controllers, so I don't think that the highpoint device is an issue.
Nope, no responses...
Since I was working on a rescue operation, I didn't have the patience
to eliminate all the possible sources of error, so I swapped out da1
(maybe a little bit slow or buggy?) and used 'dcfldd', the forensics
version of dd. It has a split option, and I suspected that ZFS has problems
when writing huge continuous data streams, so I split the 10TB into
100GB files, which took about 11 hours.
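For readers who want to try the same idea without dcfldd, here is a minimal Python sketch of a chunked copy. It is not what dcfldd does internally and not the command used above; the function name split_copy, the file naming scheme, the default block size and the fsync after each chunk are all assumptions made for illustration.

```python
import os


def split_copy(src, out_dir, chunk_bytes, block_bytes=1 << 20, prefix="image"):
    """Copy a binary stream into numbered files of at most chunk_bytes each.

    Hypothetical helper: returns the list of file paths that were written.
    """
    if chunk_bytes <= 0 or block_bytes <= 0:
        raise ValueError("chunk_bytes and block_bytes must be positive")
    paths = []
    index = 0
    while True:
        remaining = chunk_bytes  # room left in the current chunk file
        out = None               # opened lazily, so no empty trailing file
        try:
            while remaining > 0:
                block = src.read(min(block_bytes, remaining))
                if not block:    # end of input
                    break
                if out is None:
                    path = os.path.join(out_dir, f"{prefix}.{index:03d}")
                    out = open(path, "wb")
                    paths.append(path)
                out.write(block)
                remaining -= len(block)
        finally:
            if out is not None:
                out.flush()
                os.fsync(out.fileno())  # push this chunk to disk before the next
                out.close()
        if remaining > 0:        # input ended inside (or before) this chunk
            break
        index += 1
    return paths
```

In a case like the one above, src would be the source device opened with open(..., "rb") and chunk_bytes would be 100 * 10**9. Whether splitting actually avoids the slowdown is only my suspicion, not something this sketch demonstrates.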
I don't know if this is a general problem, or if it only happens when the
input is delivered at a much higher data rate. In this case, the HW-RAID/zpool was
able to deliver data at 600MB/s while the RAIDZ/zpool could only write at 130MB/s.
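To put these rates in perspective, the following small Python calculation estimates how long a 10TB copy takes at a constant rate. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a perfectly steady rate, which the real transfer clearly did not have; the helper name hours_to_copy is made up for this example.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # assumption: decimal megabytes


def hours_to_copy(total_bytes, rate_bytes_per_s):
    """Hours needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 3600.0


# Rates mentioned in this message: source read rate, RAIDZ write rate,
# and the write rate observed just before the copy was aborted.
for label, rate in [("600 MB/s", 600 * MB), ("130 MB/s", 130 * MB), ("5 MB/s", 5 * MB)]:
    print(f"{label}: {hours_to_copy(10 * TB, rate):.1f} h")
```

That works out to roughly 4.6 hours at 600MB/s, 21.4 hours at 130MB/s, and about 555.6 hours (some 23 days) at 5MB/s.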
The dynamics of this 'slow-down', which I could watch via gstat, looked as if
access at the device level was desynchronizing completely.
In the end, before I quit the process, write speed was down to 5MB/s!
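If somebody wants to look at this more systematically, a rough Python sketch like the one below could pick out a lagging device from gstat samples such as those quoted further down. The column layout is taken from that quoted output; the function names, the "5x the median ms/w" threshold and the choice to ignore idle devices are arbitrary assumptions, and a flagged device is only a hint, not a diagnosis.

```python
import statistics

# Column names for the nine numeric gstat fields, in the order shown in the
# quoted output (L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy).
FIELDS = ("lq", "ops", "r_s", "r_kbps", "ms_r", "w_s", "w_kbps", "ms_w", "busy")

# One sample copied from the quoted message (the "10 hours later" state).
SAMPLE = """\
>> dT: 1.002s w: 1.000s
>> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
>> 0 10 0 0 0.0 10 6 0.8 0.2| ad4
>> 0 0 0 0 0.0 0 0 0.0 0.0| da0
>> 4 13 0 0 0.0 13 8 550.4 178.5| da1
>> 0 12 0 0 0.0 12 7 0.7 0.2| da2
>> 0 11 0 0 0.0 11 7 0.7 0.2| da3
>> 0 10 0 0 0.0 10 5 0.6 0.2| da4
>> 0 11 0 0 0.0 11 6 0.9 0.3| da5
>> 0 12 0 0 0.0 12 7 0.7 0.2| da6
>> 0 11 0 0 0.0 11 7 0.7 0.2| da7
>> 0 9 0 0 0.0 9 6 0.8 0.2| da8
"""


def parse_gstat(text):
    """Return one dict per device row; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        # Drop mail quote markers and the '|' that follows the %busy value.
        parts = line.lstrip("> ").replace("|", " ").split()
        if len(parts) != 10:
            continue
        try:
            values = [float(p) for p in parts[:9]]
        except ValueError:
            continue  # the column header line
        row = dict(zip(FIELDS, values))
        row["name"] = parts[9]
        rows.append(row)
    return rows


def slow_writers(rows, factor=5.0):
    """Names of writing devices whose ms/w exceeds factor times the median."""
    active = [r for r in rows if r["w_s"] > 0]  # ignore idle devices
    if len(active) < 2:
        return []
    median = statistics.median(r["ms_w"] for r in active)
    return [r["name"] for r in active if r["ms_w"] > factor * median]


rows = parse_gstat(SAMPLE)
print("slow writers:", slow_writers(rows))
print("total write kBps:", sum(r["w_kbps"] for r in rows))
```

Run over many consecutive samples, this would show whether it is always the same device that lags behind or whether the stall moves around.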
But as I mentioned earlier, I had no patience for bug-hunting, due to a
bigger (still unsolved) problem at hand.
Maybe somebody else would like to investigate? I'm busy with ZFS forensics...
solon
> On Thu, 8 Oct 2009, Solon Lutz wrote:
>> I built a 9x hdd 11TB raidz for some rescue purposes and started
>> copying an image from another partition via "dd if=/dev/da0..." to it.
>> It consists of: ad4 da1 da2 da3 da4 da5 da6 da7 da8, da1 to da8 are
>> connected via two highpoint controllers.
>> In the beginning write speeds were quite fair:
>> dT: 1.002s w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
>>     0    424      0      0    0.0    424  52483   33.9    84.6| ad4
>>     0      0      0      0    0.0      0      0    0.0     0.0| da0
>>    35    356      0      0    0.0    356  44584   76.4   124.5| da1
>>    35    296      0      0    0.0    296  36919   84.5   121.0| da2
>>    34    361      0      0    0.0    361  45111   75.5   124.7| da3
>>    35    346      0      0    0.0    346  43196   78.6   123.2| da4
>>    35    344      0      0    0.0    344  42940   80.0   124.7| da5
>>    35    343      0      0    0.0    343  42812   80.7   124.5| da6
>>    35    344      0      0    0.0    344  43051   79.8   123.9| da7
>>    34    342      0      0    0.0    342  42796   80.6   124.4| da8
>> Now, some 10 hours and 2.5TB later, it looks like this most of the time:
>> dT: 1.002s w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
>>     0     10      0      0    0.0     10      6    0.8     0.2| ad4
>>     0      0      0      0    0.0      0      0    0.0     0.0| da0
>>     4     13      0      0    0.0     13      8  550.4   178.5| da1
>>     0     12      0      0    0.0     12      7    0.7     0.2| da2
>>     0     11      0      0    0.0     11      7    0.7     0.2| da3
>>     0     10      0      0    0.0     10      5    0.6     0.2| da4
>>     0     11      0      0    0.0     11      6    0.9     0.3| da5
>>     0     12      0      0    0.0     12      7    0.7     0.2| da6
>>     0     11      0      0    0.0     11      7    0.7     0.2| da7
>>     0      9      0      0    0.0      9      6    0.8     0.2| da8
>> da1 seems to be busy most of the time, and every few seconds all the other
>> devices write some data at nearly normal speed:
>> dT: 1.003s w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
>>     0    254      0      0    0.0    254  31331   34.9    35.4| ad4
>>     0      0      0      0    0.0      0      0    0.0     0.0| da0
>>     4      0      0      0    0.0      0      0    0.0     0.0| da1
>>     0    254      0      0    0.0    254  31346  107.4   104.5| da2
>>     0    256      0      0    0.0    256  31345  108.1   104.0| da3
>>     0    255      0      0    0.0    255  31345  110.2   105.1| da4
>>    35    200      0      0    0.0    200  24912  143.3   115.0| da5
>>    35    211      0      0    0.0    211  26303  137.8   114.9| da6
>>    35    210      0      0    0.0    210  26079  139.3   114.9| da7
>>    35    209      0      0    0.0    209  25952  135.2   113.7| da8
>> Sometimes it even gets back to 'normal' behaviour, but never reaches
>> the speeds it once had:
>> dT: 1.002s w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
>>    35    274      0      0    0.0    274  34334   44.2    66.6| ad4
>>     0   1166   1166 149243    0.1      0      0    0.0    14.3| da0
>>    35    120      0      0    0.0    120  14717   94.4    64.5| da1
>>    35     96      0      0    0.0     96  11665  113.9    64.3| da2
>>    35    100      0      0    0.0    100  12288   98.7    63.9| da3
>>    35    103      0      0    0.0    103  12496   93.4    59.4| da4
>>    34    112      0      0    0.0    112  13694  106.1    67.4| da5
>>    35     71      0      0    0.0     71   8596  115.3    66.8| da6
>>    35    116      0      0    0.0    116  14205  101.7    67.3| da7
>>    35     83      0      0    0.0     83  10066  112.2    65.9| da8
>> Syslog reports the following:
>> Oct 8 09:53:40 radium kernel: hptrr: start channel [0,0]
>> Oct 8 09:53:40 radium kernel: hptrr: channel [0,0] started successfully
>> Oct 8 09:57:44 radium kernel: hptrr: start channel [0,0]
>> Oct 8 09:57:45 radium kernel: hptrr: channel [0,0] started successfully
>> Oct 8 10:54:26 radium kernel: hptrr: start channel [0,0]
>> Oct 8 10:54:27 radium kernel: hptrr: channel [0,0] started successfully
>> Oct 8 11:10:29 radium kernel: hptrr: start channel [0,0]
>> Oct 8 11:10:30 radium kernel: hptrr: channel [0,0] started successfully
>> Oct 8 11:17:27 radium kernel: hptrr: start channel [0,0]
>> Oct 8 11:17:27 radium kernel: hptrr: channel [0,0] started successfully
>> Is this a problem of the hptrr device or is da1 failing?
>> Best regards,
>> Solon Lutz
>> +-----------------------------------------------+
>> | Pyro.Labs Berlin - Creativity for tomorrow |
>> | Wasgenstrasse 75/13 - 14129 Berlin, Germany |
>> | www.pyro.de - phone + 49 - 30 - 48 48 58 58 |
>> | info at pyro.de - fax + 49 - 30 - 80 94 03 52 |
>> +-----------------------------------------------+