Geom stripe bottleneck
John-Mark Gurney
jmg at funkthat.com
Wed Jun 4 16:30:44 UTC 2014
Frank Broniewski wrote this message on Wed, Jun 04, 2014 at 10:38 +0200:
> thank you very much for your detailed and very helpful answer! I think
> that clears things up for me.
You're welcome...
> I've got a question concerning NCQ though:
>
> # grep ahci /var/run/dmesg.boot
> ahci0: <ATI IXP700 AHCI SATA controller> port
> 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f
> mem 0xfaffe400-0xfaffe7ff irq 22 at device 17.0 on pci0
> ahci0: AHCI v1.10 with 4 3Gbps ports, Port Multiplier supported
> ahcich0: <AHCI channel> at channel 0 on ahci0
> ahcich1: <AHCI channel> at channel 1 on ahci0
> ahcich2: <AHCI channel> at channel 2 on ahci0
> ahcich3: <AHCI channel> at channel 3 on ahci0
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
> ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
Try a grep for ada0 instead of ahci; mine shows:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD30EFRX-68AX9N0 80.00A80> ATA-9 SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad0
You should probably see something similar...
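
If you would rather script that check than eyeball it, here is a minimal
sketch in Python. The helper name is made up for illustration, and the
sample text is just the ada2 probe lines from your first mail; on a live
system you would feed it the contents of /var/run/dmesg.boot instead.

```python
import re

def command_queueing_enabled(dmesg_text, device):
    """Return True if the boot messages report command queueing for `device`.

    Hypothetical helper: it only looks for the literal
    '<device>: Command Queueing enabled' line printed at probe time.
    """
    pattern = re.compile(r"^%s: Command Queueing enabled\s*$" % re.escape(device))
    return any(pattern.match(line) for line in dmesg_text.splitlines())

# Sample lines copied from the ada2 probe messages quoted later in this thread.
sample = """\
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD6000HLHX-01JJPV0 04.05G04> ATA-8 SATA 3.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
"""

print(command_queueing_enabled(sample, "ada2"))
print(command_queueing_enabled(sample, "ada3"))
```

The sample only contains ada2 lines, so the first call reports True and
the second False.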
> and:
>
> # camcontrol identify ada3
> pass3: <WDC WD6000HLHX-01JJPV0 04.05G04> ATA-8 SATA 3.x device
> pass3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>
> protocol ATA/ATAPI-8 SATA 3.x
> device model WDC WD6000HLHX-01JJPV0
> firmware revision 04.05G04
> serial number WD-WXL1E61PWAL2
> WWN 50014ee7aaab0118
> cylinders 16383
> heads 16
> sectors/track 63
> sector size logical 512, physical 512, offset 0
> LBA supported 268435455 sectors
> LBA48 supported 1172123568 sectors
> PIO supported PIO4
> DMA supported WDMA2 UDMA6
> media RPM 10000
>
> Feature Support Enabled Value Vendor
> read ahead yes yes
> write cache yes yes
> flush cache yes yes
> overlap no
> Tagged Command Queuing (TCQ) no no
> Native Command Queuing (NCQ) yes 32 tags
> SMART yes yes
> microcode download yes yes
> security yes no
> power management yes yes
> advanced power management yes yes 128/0x80
> automatic acoustic management no no
> media status notification no no
> power-up in Standby yes no
> write-read-verify no no
> unload yes yes
> free-fall no no
> Data Set Management (DSM/TRIM) no
> Host Protected Area (HPA) yes no 1172123568/1172123568
> HPA - Security no
>
>
> is NCQ now enabled? The corresponding line in the camcontrol identify
> output doesn't tell me that explicitly but also doesn't deny that ...
> but the dmesg.boot may hint that the ahci module is loaded ... I'm
> confused :-)
>
> I do not have an ahci_load=YES in /boot/loader.conf (this is on FreeBSD
> 9.2-p6) and I don't know if that's still necessary or not. Searching the
> internet turned up mostly rather old (2010,2011) results.
>
>
> On 2014-06-03 22:48, John-Mark Gurney wrote:
> > Frank Broniewski wrote this message on Tue, Jun 03, 2014 at 11:56 +0200:
> >> I have a stripe (RAID0) geom setup for my database's data. Currently I
> >> am applying some large updates on the data and I think the performance
> >> of my stripe could be better. But I am uncertain and so I thought I'd
> >> request some interpretation help from the community :)
> >>
> >> The stripe consists of two disks (WD Velociraptor with 10.000 rpm):
> >>> diskinfo -v ada2
> >> ada2
> >> 512 # sectorsize
> >> 600127266816 # mediasize in bytes (558G)
> >> 1172123568 # mediasize in sectors
> >> 0 # stripesize
> >> 0 # stripeoffset
> >> 1162821 # Cylinders according to firmware.
> >> 16 # Heads according to firmware.
> >> 63 # Sectors according to firmware.
> >> WD-WXH1E61ASNX9 # Disk ident.
> >>
> >> and /var/log/dmesg.boot
> >> # snip
> >> ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
> >> ada2: <WDC WD6000HLHX-01JJPV0 04.05G04> ATA-8 SATA 3.x device
> >> ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >> ada2: Command Queueing enabled
> >> ada2: 572325MB (1172123568 512 byte sectors: 16H 63S/T 16383C)
> >> ada2: Previously was known as ad8
> >> ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
> >> ada3: <WDC WD6000HLHX-01JJPV0 04.05G04> ATA-8 SATA 3.x device
> >> ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >> ada3: Command Queueing enabled
> >> ada3: 572325MB (1172123568 512 byte sectors: 16H 63S/T 16383C)
> >> ada3: Previously was known as ad10
> >> #snap
> >>
> >>
> >> And here's some iostat -d -w 10 ada0 ada1 ada2 ada3 example output
> >> #snip
> >> ada0 ada1 ada2 ada3
> >> KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s
> >> 0.00 0 0.00 0.00 0 0.00 19.33 176 3.32 19.33 176 3.32
> >> 16.25 0 0.01 16.25 0 0.01 16.87 133 2.20 16.87 133 2.20
> >> 0.00 0 0.00 0.00 0 0.00 16.77 146 2.40 16.77 147 2.40
> >> 0.00 0 0.00 0.00 0 0.00 19.46 170 3.24 19.45 170 3.23
> >> 21.50 0 0.01 21.50 0 0.01 17.00 125 2.08 17.00 125 2.08
> >> 0.50 0 0.00 0.50 0 0.00 16.88 145 2.38 16.88 145 2.38
> >> 0.00 0 0.00 0.00 0 0.00 16.96 125 2.07 16.97 125 2.07
> >> 0.00 0 0.00 0.00 0 0.00 19.82 158 3.06 19.81 158 3.07
> >> 28.77 1 0.03 28.77 1 0.03 16.83 133 2.19 16.82 133 2.19
> >> #snap
> >
> > The key here is the tps... Spinning drives can only do a limited number
> > of tps... first you have to move the heads, which on average takes ~4ms,
> > then you have to wait, on average, half a rotation, which for a 10k RPM
> > drive is ~3ms, so each seek will take around 7ms. As you can see, your
> > best number is 176 TPS, or ~5.7ms/transaction, and your slowest, 125 TPS,
> > is ~8ms/transaction... so, it looks like your drives are performing as
> > they should...
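
To make that arithmetic easy to re-check, here is a small sketch in
Python. The 4 ms average seek is the rough estimate from the paragraph
above, not a measured or datasheet value, and the function names are only
illustrative.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution, in ms."""
    return 0.5 * 60_000.0 / rpm

def ms_per_transaction(tps):
    """Convert transactions per second into milliseconds per transaction."""
    return 1000.0 / tps

AVG_SEEK_MS = 4.0  # assumed average head-move time (estimate, not measured)

rot = rotational_latency_ms(10_000)  # half a turn at 10k RPM
access = AVG_SEEK_MS + rot           # seek plus rotational wait

print(f"rotational latency: {rot:.1f} ms")
print(f"random access time: {access:.1f} ms -> ~{1000.0 / access:.0f} tps")

# Lowest and highest tps values seen for ada2/ada3 in the iostat output.
for tps in (125, 176):
    print(f"{tps} tps = {ms_per_transaction(tps):.1f} ms/transaction")
```

With those assumptions a purely random workload comes out at roughly 143
tps per drive, and the observed 125 to 176 tps works out to between 8.0
and about 5.7 ms per transaction, which is in the same range.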
> >
> >> I think the MB/s output is rather low for such a disk. To gain further
> >> insight I started gstat:
> >> dT: 1.001s w: 1.000s
> >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
> >> 0 27 0 0 0.0 27 2226 4.8 7.0| ada0
> >> 0 28 1 32 23.9 27 2226 1.3 3.9| ada1
> >> 2 120 115 1838 6.4 5 96 0.2 74.3| ada2
> >> 2 121 116 1854 6.3 5 96 0.4 72.9| ada3
> >> 0 28 1 32 24.0 27 2226 5.0 8.7| mirror/gm
> >> 2 121 116 3708 7.9 5 192 0.4 92.2| stripe/gs
> >> 0 28 1 32 24.0 27 2226 5.0 8.7| mirror/gms1
> >> 0 12 0 0 0.0 12 1343 9.1 6.9| mirror/gms1a
> >> 0 0 0 0 0.0 0 0 0.0 0.0| mirror/gms1b
> >> 0 0 0 0 0.0 0 0 0.0 0.0| mirror/gms1d
> >> 0 0 0 0 0.0 0 0 0.0 0.0| mirror/gms1e
> >> 0 16 1 32 24.0 15 883 1.7 2.9| mirror/gms1f
> >>
> >>
> >> What bothers me here is that the stripe/gs is 92% busy while the disks
> >> themselves are only at 74/72%. This led me to post here and seek
> >> some advice, since I don't know enough about the mechanics and so I
> >> can't really find the problem, if there is any at all.
> >
> > This is because the stripe has to wait for both drives to return data
> > before passing the data up... If you're just running a single threaded
> > benchmark, there aren't multiple IOs in flight, and therefore the
> > remaining time is spent in your application before it sends another
> > request down to the stripe... the difference between the stripe and the
> > drives comes from the fact that each drive is sometimes faster than the
> > other, so the faster one won't have work to do until another IO is
> > submitted...
> >
> > Try sending more IO at it, like running 4 or more dd reads at once, so
> > that during the latency of one IO there is other IO to service...
> >
> > Also, make sure that you're using NCQ, which lets the OS submit multiple
> > IOs to the drives at once. This should improve things, but won't
> > change the results you see above, as it requires multiple IOs
> > outstanding...
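
If juggling several dd processes by hand is awkward, the same idea can be
sketched in Python: a few threads, each reading its own region, so that
several IOs are outstanding at once. This is only an illustration and not
a calibrated benchmark; the function names, the 128 KiB block size and the
default of 4 workers are my assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def read_region(path, offset, length, block_size=128 * 1024):
    """Read up to `length` bytes sequentially from `offset`; return bytes read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        while done < length:
            # pread takes an explicit offset, so workers never share a file position.
            chunk = os.pread(fd, min(block_size, length - done), offset + done)
            if not chunk:  # reached the end of the file or device
                break
            done += len(chunk)
        return done
    finally:
        os.close(fd)

def parallel_read(path, total_bytes, workers=4):
    """Split the first `total_bytes` of `path` into equal regions, one per worker.

    Any remainder of total_bytes // workers is ignored. For a raw disk
    device, keep the region size a multiple of the sector size.
    """
    region = total_bytes // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(read_region, path, i * region, region)
            for i in range(workers)
        ]
        return sum(f.result() for f in futures)
```

Something like parallel_read("/dev/stripe/gs", 4 * 2**30), run as root
while gstat is open in another terminal, should keep more than one request
queued on the stripe. The device path is my guess based on the stripe/gs
name in your gstat output, so adjust it to whatever your provider is
actually called.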
--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."