A specific example of a disk i/o problem

Thomas Backman serenity at exscape.org
Mon Oct 5 09:02:53 UTC 2009


(Note: I hope this reply shows up correctly. I subscribed to the  
mailing list after the fact and had to "forge" the subject line.)

Hey everyone,
I'm having serious trouble with the same thing, and just found this  
thread while looking for the correct place to post. Looks like I found  
it. (I wrote most of the post before finding the thread, so some of it  
will seem a bit odd.)

I run 8.0-RC1/amd64 with ZFS on an A64 3200+ with 2GB RAM and an old  
80GB 7200rpm disk.

My problem is that I get completely unacceptable latency on console IO  
(both via SSH and serial console) when the system is performing disk  
IO. The worst case I've noticed yet was when I tried copying a core  
dump from a lzjb compressed ZFS file system to a gzip-9 compressed  
one, to compare the file size/compression ratio. screen(1) took at  
LEAST ten seconds - probably a bit more - I'm not exaggerating here -  
to switch to another window, and an "ls" in an empty directory also  
about 5-10 seconds.
Doing some silly CPU load with two instances of "yes >/dev/null" (on a  
single core, remember) doesn't change anything, the system remains  
very responsive. "cat /dev/zero > /uncompressed_fs/..." however  
produces the extreme slowdown. (On a gzip-1 FS it doesn't, since the  
file ends up extremely small - a kilobyte or so - even after a while,  
thus performing minimal IO).

I'm thinking about switching to FreeBSD on my beefier "production"  
system (dual-core amd64, 4GB RAM, 4x1TB disks, compared to this one,  
single-core, 2GB RAM, 80GB disk), but unless I feel assured this won't  
happen there as well, I'm not so sure anymore. I can do any kind of  
heavy IO/compilation/whatever on that box, currently running Linux,  
and it's always unnoticable. In this case it's impossible *not* to  
notice that your key input is lagging behind 5-10 seconds... I thought  
multiple times that the box must have panicked.
I do realize that the hardware isn't the best, especially the disks,  
but this is far worse than it should be!

Here's some of the testing done in this thread (or at least something  
like it):

[root at chaos ~]# time file /etc/* >/dev/null
real    0m1.725s
user    0m0.993s
sys     0m0.021s
[root at chaos ~]# time file /etc/* >/dev/null

real    0m1.008s
user    0m0.990s
sys     0m0.015s
[root at chaos ~]# time file /etc/* >/dev/null

real    0m1.008s
user    0m0.967s
sys     0m0.038s
[root at chaos ~]# time file /etc/* >/dev/null

real    0m1.015s
user    0m0.998s
sys     0m0.008s

So, pretty much exactly 1 second every time once the cache is warmed  
up. Now, let's try it 10 seconds after starting heavy disk writing...

[root at chaos ~]# cat /dev/zero > /DELETE_ME &
(wait for 10 seconds)
[root at chaos ~]# time file /etc/* >/dev/null

real    0m13.217s
user    0m1.004s
sys     0m0.023s
[root at chaos ~]# time file /etc/* >/dev/null

real    0m24.012s
user    0m1.020s
sys     0m0.008s

... the numbers speak for themselves. FWIW I tried the same on the  
Linux box: "file" took 0.8 seconds the first time, 0.125s subsequent  
times. While running cat, 0.13-0.21 seconds (0.21 being the first,  
subsequent runs took 0.13s). The system remained completely  
responsive, which cannot be said for the FreeBSD one!

Any advice? Info below - please ask if you need more!

Regards,
Thomas

-----------------------------------------------------------------------------

Basic info:

[root at chaos ~]# uname -a
FreeBSD chaos.exscape.org 8.0-RC1 FreeBSD 8.0-RC1 #1 r197613M: Tue Sep  
29 15:04:44 CEST 2009     root at chaos.exscape.org:/usr/obj/usr/src/sys/ 
DTRACE  amd64
(KDB/DDB enabled, WITNESS/INVARIANTS *disabled*)

[root at chaos ~]# mount
tank/root on / (zfs, local, noatime)
devfs on /dev (devfs, local, multilabel)
/dev/ad0s1a on /bootdir (ufs, local, soft-updates)
tank/export on /export (zfs, NFS exported, local, noatime)
tank/tmp on /tmp (zfs, local, noatime)
tank/usr on /usr (zfs, local, noatime)
tank/usr/obj on /usr/obj (zfs, local, noatime)
tank/usr/ports on /usr/ports (zfs, local, noatime)
tank/usr/ports/distfiles on /usr/ports/distfiles (zfs, local, noatime)
tank/usr/src on /usr/src (zfs, local, noatime)
tank/usr/src_r196905 on /usr/src_r196905 (zfs, local, noatime, read- 
only)
tank/var on /var (zfs, local, noatime)
tank/var/crash on /var/crash (zfs, local, noatime)
tank/var/log on /var/log (zfs, local, noatime)
tank/var/tmp on /var/tmp (zfs, local, noatime)

dmesg:

-----------------------------------------------------------------------------
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-RC1 #1 r197613M: Tue Sep 29 15:04:44 CEST 2009
     root at chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3200+ (2009.27-MHz K8-class CPU)
   Origin = "AuthenticAMD"  Id = 0x10ff0  Stepping = 0
    
Features 
= 
0x78bfbff 
< 
FPU 
,VME 
,DE 
,PSE 
,TSC 
,MSR 
,PAE 
,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
   AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
   AMD Features2=0x1<LAHF>
real memory  = 2147483648 (2048 MB)
avail memory = 2052362240 (1957 MB)
ACPI APIC Table: <Nvidia AWRDACPI>
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <Nvidia AWRDACPI> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 7fef0000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <memory> at device 0.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
ohci0: <OHCI (generic) USB controller> mem 0xfe02f000-0xfe02ffff irq  
21 at device 2.0 on pci0
ohci0: [ITHREAD]
usbus0: <OHCI (generic) USB controller> on ohci0
ehci0: <NVIDIA nForce4 USB 2.0 controller> mem 0xfe02e000-0xfe02e0ff  
irq 22 at device 2.1 on pci0
ehci0: [ITHREAD]
usbus1: EHCI version 1.0
usbus1: <NVIDIA nForce4 USB 2.0 controller> on ehci0
atapci0: <nVidia nForce CK804 UDMA133 controller> port  
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfb00-0xfb0f at device 6.0 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
atapci1: <nVidia nForce CK804 SATA300 controller> port  
0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xf600-0xf60f mem  
0xfe02b000-0xfe02bfff irq 23 at device 7.0 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
atapci2: <nVidia nForce CK804 SATA300 controller> port  
0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xf100-0xf10f mem  
0xfe02a000-0xfe02afff irq 21 at device 8.0 on pci0
atapci2: [ITHREAD]
ata4: <ATA channel 0> on atapci2
ata4: [ITHREAD]
ata5: <ATA channel 1> on atapci2
ata5: [ITHREAD]
pcib1: <ACPI PCI-PCI bridge> at device 9.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 0xfcff8000-0xfcffbfff, 
0xfd000000-0xfd7fffff,0xfc000000-0xfc7fffff irq 17 at device 7.0 on pci1
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xdf00-0xdf7f mem  
0xfcfff000-0xfcfff07f irq 18 at device 9.0 on pci1
miibus0: <MII bus> on xl0
xlphy0: <3c905C 10/100 internal PHY> PHY 24 on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:50:da:44:c0:4a
xl0: [ITHREAD]
nfe0: <NVIDIA nForce4 CK804 MCP9 Networking Adapter> port  
0xf000-0xf007 mem 0xfe029000-0xfe029fff irq 22 at device 10.0 on pci0
miibus1: <MII bus> on nfe0
e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 1 on miibus1
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,  
1000baseT-FDX, auto
nfe0: Ethernet address: 00:13:d3:a2:aa:0f
nfe0: [FILTER]
pcib2: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 12.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 13.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci5: <ACPI PCI bus> on pcib5
amdtemp0: <AMD K8 Thermal Sensors> on hostb3
acpi_tz0: <Thermal Zone> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x73 irq 8 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
uart0: console (115200,n,8,1)
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
cpu0: <ACPI CPU> on acpi0
powernow0: <Cool`n'Quiet K8> on cpu0
device_attach: powernow0 attach returned 6
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff, 
0xcc000-0xcc7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on  
isa0
ZFS NOTICE: system has less than 4GB and prefetch enable is not set...  
disabling.
ZFS filesystem version 13
ZFS storage pool version 13
Timecounter "TSC" frequency 2009269513 Hz quality 800
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
ad0: 76318MB <Seagate ST380021A 3.19> at ata0-master UDMA100
ugen0.1: <nVidia> at usbus0
uhub0: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on  
usbus0
ugen1.1: <nVidia> at usbus1
uhub1: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on  
usbus1
ad2: 9768MB <Seagate ST310014A 3.09> at ata1-master UDMA100
GEOM: ad2s1: geometry does not match label (255h,63s != 16h,63s).
Root mount waiting for: usbus1 usbus0
uhub0: 10 ports with 10 removable, self powered
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
uhub1: 10 ports with 10 removable, self powered
Trying to mount root from zfs:tank/root
netsmb_dev: loaded

-----------------------------------------------------------------------------
The 80GB disk is used for everything except swap (aka. dump device)  
for which the 10GB disk is used, exclusively.

loader.conf has nothing of value (just loading a few modules and a  
vfs.root.mountfrom, plus serial console settings).


More information about the freebsd-performance mailing list