Apparent strange disk behaviour in 6.0
Julian Elischer
julian at elischer.org
Thu Jul 28 06:54:50 GMT 2005
I've been playing around with some RAID arrays and have noticed some odd things.
Firstly, on a 2+HTT (i.e. 4 virtual) CPU system with one SCSI array and an ATA drive, copying data from the ATA drive to the SCSI array seems to be slower than it was on 4.x.
Secondly, systat -vmstat never shows either of the drives as being 100% busy. The most I've seen is the ATA drive at 70% busy.
For example, this is theoretically a disk-I/O-bound system, but:
Disks    ad0    da0  pass0  pass1  pass2
KB/t   19.40  11.68   0.00   0.00   0.00
tps      440    539      0      0      0
MB/s    8.34   6.14   0.00   0.00   0.00
% busy    48     50      0      0      0
I don't know how reliable that is, however.
I HAVE noticed, however, that the sum of the busy percentages for the two drives always seems to be less than 110%. If one goes up, the other goes down. Not knowing how these numbers are calculated, it's hard to know whether that means anything.
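This sketch assumes one common definition of a device's "% busy" figure: the fraction of the sampling interval during which the device had at least one request outstanding. It is not taken from the systat or devstat sources. Under that definition each drive's figure is computed on its own, so nothing in the arithmetic caps the sum of two drives. A sum that stays near or below 100% could simply mean the copy alternates between reading one drive and writing the other. The Python below computes busy% from hypothetical (start, end) request intervals. The function name and interval data are illustrative only.

```python
def busy_percent(intervals, window_start, window_end):
    """Percentage of [window_start, window_end) during which at least one
    request was outstanding. `intervals` is a list of hypothetical
    (start, end) times; overlapping requests are merged so concurrent
    I/O is not double-counted."""
    # Clip each interval to the window and drop ones entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if e > window_start and s < window_end
    )
    busy = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Idle gap before this request: close out the previous busy run.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps (or touches) the current busy run: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 100.0 * busy / (window_end - window_start)


# Hypothetical 1-second window: a copy that reads from one drive, then
# writes to the other, with the two never overlapping.
reader = [(0.0, 0.2), (0.4, 0.6)]
writer = [(0.2, 0.4), (0.6, 0.8)]
print(busy_percent(reader, 0.0, 1.0), busy_percent(writer, 0.0, 1.0))
```

With strictly alternating I/O, each drive reports about 40%. The two figures sum to about 80%, which is exactly the share of the window in which the copy was doing any I/O at all. Overlapping reads and writes would push the sum higher.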
Physically looking at the array, the disks spend a LOT of time doing nothing. The array controller is obviously clustering the writes and seems to write them out every 2 seconds, but the disks are only busy for about 1/4 of that time. I don't know how reliable that is as an indication, but whatever the bottleneck is, it's not the drives.
The array controller reports that it hardly ever has a queue of more than 1 thing to do, even though tags are set to 253. (Occasionally the controller will report that it has 20 to do, but the next instant it has caught up again.)
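Little's law gives a rough way to relate the controller's reported queue depth to the transfer rates iostat shows. It states L = λW: the mean number of requests in the system equals the arrival rate times the mean time each request spends there. The sketch below makes two assumptions. It takes an average of one outstanding request, per the controller's report, and 1749 tps on da0, copied from one row of the iostat sample later in this message. The function name is hypothetical, and the result is an estimate, not a measurement.

```python
def mean_time_in_system(mean_queue_depth, tps):
    """Little's law: L = lambda * W, so W = L / lambda (in seconds)."""
    return mean_queue_depth / tps


# Assumed inputs: queue depth ~1 (controller's report) and 1749 tps on da0.
w = mean_time_in_system(1.0, 1749)
print(f"mean time per request ~ {w * 1000:.3f} ms")
```

At a queue depth near 1, this works out to roughly 0.57 ms per request. It also means tps is set almost entirely by per-request turnaround rather than by how many requests the disks could handle in parallel.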
I plan on net-booting the same machine on 4.11 again and repeating the same tests.
If I REALLY get the disks 100% busy by doing:
    dd if=/dev/zero of=/raid1/bigfile bs=128k count=1000000
then the system becomes so unresponsive that it takes about 10 seconds for a ^C to get through and stop the dd.
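The dd invocation writes bs × count bytes, and dd's `k` suffix multiplies by 1024. The arithmetic below shows how much data that command asks for. It assumes a sustained rate of 47 MB/s (iostat's MB, i.e. 1024 × 1024 bytes), which is close to the best da0 figure in the iostat sample later in this message. That rate is used only to give a rough duration.

```python
BS = 128 * 1024      # bs=128k; dd's "k" suffix multiplies by 1024
COUNT = 1_000_000    # count=1000000

total_bytes = BS * COUNT
total_mib = total_bytes / 2**20
total_gib = total_bytes / 2**30

# Assumed sustained write rate, roughly the best da0 figure observed.
rate_mib_s = 47.0
seconds = total_mib / rate_mib_s

print(f"{total_bytes} bytes = {total_mib:.0f} MiB = {total_gib:.1f} GiB")
print(f"~{seconds / 60:.1f} minutes at {rate_mib_s} MiB/s")
```

That is exactly 125,000 MiB (about 122 GiB). Even at the best observed rate, the test keeps the array saturated for roughly three quarters of an hour.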
A systat -vmstat running at the same time in another window slows down and then only updates every now and then. At no stage, however, does it show anything getting close to 100% of CPU time: interrupt time is at about 15% and system time at about 20%.
The odd thing is that a tip session talking to the RAID controller continues to show responsive behavior, continuing to update the RAID stats page, and the network seems to be bringing that to me just fine. So the COM ports and the network are at least able to function, even if everything else seizes up.
iostat sometimes continues to run, and this is what it showed during one stretch where the rest of the system seemed pretty unresponsive:
   tty           ad0                da0              pass0             cpu
 tin tout    KB/t tps  MB/s     KB/t  tps  MB/s   KB/t tps MB/s   us ni sy in id
  53   79    6.13  13  0.08    16.68 1746 28.44   0.00   0 0.00    0  0 22  5 73
 604  836    6.00   2  0.01    16.00 1749 27.32   0.00   0 0.00    0  0 28  5 67
 168  240    7.97  31  0.24   128.00   40  4.96   0.00   0 0.00    0  0 27  9 64
 173  251   11.27  11  0.12    16.00 3047 47.61   0.00   0 0.00    0  0 30  5 65
 222  299   12.93  46  0.58    21.72 2092 44.37   0.00   0 0.00    0  0 34  5 60
 225  302   13.29  34  0.44   128.00   39  4.87   0.00   0 0.00    0  0 40 16 43
 172  250    6.82  34  0.23    30.45  217  6.44   0.00   0 0.00    0  0 52 15 33
 191  268    6.22   9  0.05    16.72 1559 25.44   0.00   0 0.00    0  0 18  3 80
 200  278   10.45  31  0.32    18.78 1007 18.46   0.00   0 0.00    0  0 54 11 34
 192  270   12.00   1  0.01    16.00 2827 44.18   0.00   0 0.00    0  0 34  6 59
 213  728    8.80  40  0.34    18.68 1225 22.34   0.00   0 0.00    0  0 42 11 47
 201  250   10.29  11  0.11   128.00    3  0.41   0.00   0 0.00    0  0 22  5 74
 186  281    8.20  37  0.30   125.85   49  5.98   0.00   0 0.00    0  0 33 11 56
 225  302    4.00   3  0.01    16.00 2977 46.52   0.00   0 0.00    0  0 29  4 66
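iostat's three per-device columns are linked: MB/s should be roughly KB/t × tps / 1024. The sketch below checks that against the da0 rows copied from the sample above. It then splits the rows into intervals dominated by ~128 KB transfers and the rest. The 100 KB threshold used for the split is an arbitrary choice for this sample, and small mismatches come from iostat rounding the displayed KB/t and tps.

```python
# da0 columns (KB/t, tps, MB/s) copied from the iostat sample above.
DA0 = [
    (16.68, 1746, 28.44), (16.00, 1749, 27.32), (128.00, 40, 4.96),
    (16.00, 3047, 47.61), (21.72, 2092, 44.37), (128.00, 39, 4.87),
    (30.45, 217, 6.44), (16.72, 1559, 25.44), (18.78, 1007, 18.46),
    (16.00, 2827, 44.18), (18.68, 1225, 22.34), (128.00, 3, 0.41),
    (125.85, 49, 5.98), (16.00, 2977, 46.52),
]


def derived_mb_s(kb_per_t, tps):
    """Throughput implied by transfer size and rate (1 MB = 1024 KB)."""
    return kb_per_t * tps / 1024


# Reported MB/s should match the derived value up to display rounding.
max_err = max(abs(derived_mb_s(kb, tps) - mb) for kb, tps, mb in DA0)

# Split intervals by transfer size: ~128 KB transfers vs. smaller ones.
large = [mb for kb, _, mb in DA0 if kb >= 100]
small = [mb for kb, _, mb in DA0 if kb < 100]

print(f"largest mismatch: {max_err:.3f} MB/s")
print(f"mean MB/s, ~128 KB transfers: {sum(large) / len(large):.2f}")
print(f"mean MB/s, smaller transfers: {sum(small) / len(small):.2f}")
```

On this sample, the reported and derived figures agree to within about 0.05 MB/s. The intervals dominated by ~128 KB transfers moved about 4 MB/s on average, against about 31 MB/s for the rest.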
I'm guessing that there may be a red-hot mutex somewhere in the kernel, but I'm not sure which one.
More information about the freebsd-current mailing list