very strange IO issue with FreeBSD 8 and SSD
Jeremy Chadwick
freebsd at jdc.parodius.com
Mon May 2 23:36:06 UTC 2011
On Mon, May 02, 2011 at 03:28:23PM -0700, Jan Koum wrote:
> hello,
>
> we are seeing some strange activity on our FreeBSD systems running an
> 8.2-PRERELEASE snapshot from early december
>
> our system has 4 Intel SSD drives (64GB each) connected directly to the
> motherboard through AHCI:
>
> ad4: setting UDMA100
> ad4: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata2-master UDMA100 SATA 3Gb/s
> ad4: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
> [...]
> ad7: setting UDMA100
> ad7: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata3-slave UDMA100 SATA 3Gb/s
> ad7: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
>
> $ df -h
> Filesystem Size Used Avail Capacity Mounted on
> /dev/ad4s1a 57G 24G 29G 45% /
> /dev/ad5a 58G 17G 36G 32% /d2
> /dev/ad7a 58G 17G 36G 32% /d4
> /dev/ad6a 58G 17G 36G 32% /d3
>
> so far - so good, right? this is where things get very bizarre: our
> application receives data from network and writes to disk. on average the
> file size grows to about 7Kbytes while an average file append is 300-400
> bytes.
>
> netstat shows about 700-800Kbytes of input and our application log shows we
> write about 500Kbytes each second. however, when we run iostat we see
> upwards of 10MB a second written to disk (if not more). for example:
>
> $ iostat -KC -x 1
>              extended device statistics                 cpu
> device    r/s    w/s   kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4       9.0  423.3   45.2  4410.1    0  84.3  11   5  0  5  1 89
> ad5       9.0  420.7   44.9  4237.4    0  82.3  11
> ad6       9.0  420.6   45.1  4254.4    0  81.1  11
> ad7       9.0  420.3   44.9  4225.7    0  83.8  11
>              extended device statistics                 cpu
> device    r/s    w/s   kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4      14.9  157.9   79.5  1108.4    0  31.7  18   8  0  5  1 86
> ad5      15.9 1480.8   63.6 18886.1    0  36.4  19
> ad6      20.9  154.9   93.4  1032.9    0   7.4   4
> ad7      19.9  216.5   63.6  1450.0    0   9.2   4
>              extended device statistics                 cpu
> device    r/s    w/s   kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4      20.9  169.2  115.4  1271.7    0  39.3  13   9  0  4  1 85
> ad5      21.9 1179.1  129.4 11598.1    0  34.6  14
> ad6      14.9  140.3   39.8   925.4    0   9.4   3
> ad7      15.9  213.9   33.8  1610.0    0   7.9   3
>              extended device statistics                 cpu
> device    r/s    w/s   kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4      15.9  403.6   53.7  3208.6    0  30.0  10   8  0  6  1 85
> ad5      16.9  709.7   47.7  4691.6    0  20.2   9
> ad6      23.9  321.1   97.4  2262.3    0  12.9   7
> ad7      14.9  421.4   51.7  3437.2    0  13.3   7
>
> (apologies in advance for bad formatting)
>
> so, here we are, looking at iostat output and trying to figure out how
> it can be this bad and where the discrepancy is coming from. a few things
> to get out of the way: no, we do not have TRIM enabled yet, we would need to
> upgrade the OS for that, but we don't think TRIM would make such a big
> difference. also we know that we can newfs with -b 512 -f 4096 but again, we
> also don't think that it would account for such a large IO discrepancy.
>
> any thoughts as to what this could be? has anybody seen anything similar
> before? 10MB of metadata for 500K worth of disk writes? that can't be....
> right?
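To put a number on the discrepancy, here is a quick sketch that sums the kw/s column across all four disks for each iostat report and divides by the ~500 KB/s your application log shows. The per-disk values and the 500 KB/s figure are copied from your mail; nothing else is assumed. Keep in mind that the first iostat report is averaged over system uptime, not the last second, so reports 2 to 4 are the fair comparison.

```python
# kw/s per device (ad4, ad5, ad6, ad7) copied from the four iostat reports above.
SAMPLES_KW = [
    [4410.1, 4237.4, 4254.4, 4225.7],   # first report: averaged since boot
    [1108.4, 18886.1, 1032.9, 1450.0],
    [1271.7, 11598.1, 925.4, 1610.0],
    [3208.6, 4691.6, 2262.3, 3437.2],
]

APP_KW = 500.0  # application-level writes per second, from the application log


def amplification(samples, app_kw):
    """Return (total kw/s across all disks, ratio to app_kw) for each report."""
    return [(sum(row), sum(row) / app_kw) for row in samples]


if __name__ == "__main__":
    for i, (total, ratio) in enumerate(amplification(SAMPLES_KW, APP_KW)):
        note = " (since boot)" if i == 0 else ""
        print(f"report {i + 1}: {total:8.1f} KB/s on disk, ~{ratio:.0f}x the app rate{note}")
```

For reports 2 to 4 this works out to roughly 27x to 45x more data hitting the disks than the application says it writes.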
I would recommend trying ahci.ko instead of ataahci.ko. Your device
names will change (ad4 --> ada0, ad5 --> ada1, etc.). Just add
ahci_load="yes" to /boot/loader.conf, reboot into single-user mode, fix
/etc/fstab and any related configuration files, and that's all you should
have to do.
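If you'd rather not hand-edit /etc/fstab, here is a hypothetical helper that rewrites /dev/adN names to the new ada names while keeping slice and partition suffixes (so ad4s1a becomes ada0s1a). The DEVICE_MAP below simply extends the ad4 --> ada0, ad5 --> ada1 pattern and is an assumption; check dmesg for the real names after booting with ahci.ko before writing anything back.

```python
import re

# Hypothetical ad -> ada mapping extending the "ad4 --> ada0, ad5 --> ada1" pattern;
# confirm the real names in dmesg after booting with ahci.ko.
DEVICE_MAP = {"ad4": "ada0", "ad5": "ada1", "ad6": "ada2", "ad7": "ada3"}

# Matches /dev/adN but not /dev/adaN; (?!\d) stops "ad4" matching inside "ad40".
AD_DEVICE = re.compile(r"/dev/(ad\d+)(?!\d)")


def rename_devices(fstab_text, mapping):
    """Rewrite /dev/adN device names, keeping slice/partition suffixes (s1a, a, ...)."""
    def substitute(match):
        old = match.group(1)
        return "/dev/" + mapping.get(old, old)  # unknown names are left untouched
    return AD_DEVICE.sub(substitute, fstab_text)


if __name__ == "__main__":
    # Example entries modelled on the df output; mount options are placeholders.
    sample = (
        "/dev/ad4s1a  /    ufs  rw  1  1\n"
        "/dev/ad5a    /d2  ufs  rw  2  2\n"
    )
    print(rename_devices(sample, DEVICE_MAP), end="")
```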
We use Intel SSDs (X25-M 80GB) in our servers, also backed by UFS2 with
softupdates. Controllers are Intel ICH7R (in AHCI mode) and Intel ICH9R
(also in AHCI mode). We *did not* apply any 4K alignment when making
the partitions. We use ahci.ko. I haven't tested write speeds and all
that, but the disks work fine.
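For reference, "4K alignment" just means the partition's starting byte offset is a multiple of 4096. A minimal check, assuming 512-byte logical sectors and a 4 KB flash page (the page size is an assumption on my part): a slice starting at the traditional MBR offset of sector 63, which matches the 63S geometry in your dmesg but is not verified, would not be aligned, while a start of 64 or 2048 would be.

```python
SECTOR_SIZE = 512   # logical sector size reported by the drive
ALIGNMENT = 4096    # assumed flash page size to align against


def is_aligned(start_sector, sector_size=SECTOR_SIZE, alignment=ALIGNMENT):
    """True if a partition starting at start_sector begins on an alignment boundary."""
    return (start_sector * sector_size) % alignment == 0


if __name__ == "__main__":
    # 63: traditional MBR slice start; 64 and 2048: common aligned starts.
    for start in (63, 64, 2048):
        print(start, is_aligned(start))
```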
You might also try comparing iostat output to gstat output, though gstat
refreshes the screen continually, making this a little difficult.
I would recommend running "gstat -I500ms -f '^ad[0-9]$'" and watching closely.
Change the regex, of course, if you switch to ahci.ko.
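To show what that filter keeps, here is a sketch that applies the same pattern to a list of GEOM provider names, assuming gstat matches -f against provider names as an extended regular expression (for a pattern this simple, Python's re behaves the same way). The '^ada[0-9]$' variant is just a plausible replacement for ahci.ko, and the provider list is made up for illustration.

```python
import re

# The ata-era filter from above, plus a plausible equivalent for ahci.ko names.
ATA_DISKS = re.compile(r"^ad[0-9]$")
AHCI_DISKS = re.compile(r"^ada[0-9]$")

# Hypothetical provider names: whole disks, slices and partitions.
PROVIDERS = ["ad4", "ad4s1", "ad4s1a", "ad5", "ad5a", "ada0", "ada0s1a", "ada1"]


def matching(pattern, names):
    """Return the provider names a gstat -f filter would keep."""
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    print(matching(ATA_DISKS, PROVIDERS))
    print(matching(AHCI_DISKS, PROVIDERS))
```

Only whole disks survive, so slice and partition traffic is not double-counted. Note that [0-9] matches a single digit; a unit number of 10 or higher would need [0-9]+.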
If you want to compare benchmarks, I need to know exactly what to do to
reproduce the issue you're describing. I would prefer the traffic not come
off the network (e.g. use dd or bonnie++ or something) to rule out
problems there.
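Since dd and bonnie++ won't mimic your small-append pattern, here is a minimal synthetic generator that might serve as a local reproducer. The 300-400 byte appends, ~7 KB files and ~500 KB/s rate come from your mail; opening and closing the file on every append is an assumption about how your application behaves, and all names are hypothetical. Run it against a directory on one of the SSDs while watching iostat or gstat.

```python
import os
import random
import tempfile
import time

# Parameters modelled on the report; opening/closing per append is an assumption.
APPEND_MIN, APPEND_MAX = 300, 400
TARGET_FILE_SIZE = 7 * 1024
TARGET_RATE = 500 * 1024  # bytes per second


def run_workload(directory, total_bytes, rate=TARGET_RATE, seed=0):
    """Append small random records to a series of files, throttled to ~rate bytes/s.

    Each append reopens the current file; a new file is started once the current
    one reaches TARGET_FILE_SIZE. Returns (bytes_written, files_created).
    """
    rng = random.Random(seed)
    written = 0
    file_index = 0
    current = os.path.join(directory, f"f{file_index:06d}")
    start = time.monotonic()
    while written < total_bytes:
        chunk = os.urandom(rng.randint(APPEND_MIN, APPEND_MAX))
        with open(current, "ab") as fh:
            fh.write(chunk)
        written += len(chunk)
        if os.path.getsize(current) >= TARGET_FILE_SIZE:
            file_index += 1
            current = os.path.join(directory, f"f{file_index:06d}")
        # Sleep if we are ahead of the target rate.
        ahead = written / rate - (time.monotonic() - start)
        if ahead > 0:
            time.sleep(ahead)
    files = file_index + (1 if os.path.exists(current) else 0)
    return written, files


if __name__ == "__main__":
    # Pass dir=... (e.g. a path on /d2) to put the files on the SSD under test,
    # and raise total_bytes for a longer run.
    with tempfile.TemporaryDirectory() as scratch:
        written, files = run_workload(scratch, total_bytes=1024 * 1024)
        print(f"wrote {written} bytes across {files} files")
```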
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |
More information about the freebsd-fs mailing list