Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)

Steven Hartland killing at multiplay.co.uk
Mon Oct 12 21:41:14 UTC 2009


We're not running 8 yet but we do have a 7.x box which its under fairly
high IO load doing mrtg graphs which has similar behaviour. When typing
a command on ssh it will freeze for may seconds. I even went to far as
to write a little C app which just prints out the time to screen and even
that sees the big delay.

Its always been like and I've never managed to get to the bottom of it,
there's something in the IO / disk subsystem which can totally lock up
the system under high IO load.

    Regards
    Steve

----- Original Message ----- 
From: "Thomas Backman" <serenity at exscape.org>
To: "freebsd-stable" <freebsd-stable at freebsd.org>
Sent: Monday, October 12, 2009 8:48 PM
Subject: Extreme console latency during disk IO (8.0-RC1,previous releases also affected according to others)


> I'm copying this over from the freebsd-performance list, as I'm  looking for a few more opinions - not on the problems *I* am 
> having,  but rather to check whether the problem is universal or not, and if  not, find a possible common factor.
> In other words: I want to hear about your experiences, *good or bad*!
>
> Here's the original thread (not from the beginning, though): 
> http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html
>
> Long story short, my version: when the disk is stressed hard enough,  console IO becomes COMPLETELY unbearable. 10+ seconds to 
> switch  between windows in screen(1), running (or even typing) simple  commands, etc. This happens both via SSH and the serial 
> console.
>
> How to reproduce/test:
> 1) time file /etc/* > /dev/null a few times, or something similar that  uses the disk; write down a 
> common/average/median/whatever time.
> 2) cat /dev/zero > /uncompressed_fs/filename # please make *sure* it's  uncompressed, since ZFS with lzjb/gzip enabled will 
> squish this into a  kilobyte-sized file, thus creating virtually *no* IO.
> 3) When cat has been running say 10 seconds, re-time command #1 and do  some interactive stuff - run commands, edit files, etc.
>
> I couldn't actually reproduce the *completely* horrific increase in  latency I posted about below just now (I did update my 
> sources and  rebuild, but I'm pretty sure the delta between ~Sep 29 and Oct 6 had  no major IO changes in 8-STABLE), and the 
> "time file /etc/*" test  "only" jumped by about 3x (compared to 20-60x+ previously), but it's  still bad enough: commands such 
> as "ls" and "w" take 2-3 seconds to  run, as opposed to 0.005s for ls without the added IO... On Linux, the  increase in latency 
> is closer to 4%. A bit better than, oh, 400  times. ;)
>
> Oh, and again: this post is not a complaint; this is a post asking for  your experiences. I know I'm not alone in having these 
> issues - I just  want to know if there are a lot of people that *don't* too, and what  could cause them. I can't possibly switch 
> to FreeBSD in production  with this behaviour - and I've been looking forward to doing so for  quite a while now.
>
> Regards,
> Thomas
>
> PS.
>
> I'll leave my post to the original discussion below. (I don't usually  top post, but I don't consider this a reply, more of a 
> new post with  an addition below.)
>
> On Oct 5, 2009, at 10:45 AM, Thomas Backman wrote:
>
>> Hey everyone,
>> I'm having serious trouble with the same thing, and just found this  thread while looking for the correct place to post. Looks 
>> like I  found it. (I wrote most of the post before finding the thread, so  some of it will seem a bit odd.)
>>
>> I run 8.0-RC1/amd64 with ZFS on an A64 3200+ with 2GB RAM and an old  80GB 7200rpm disk.
>>
>> My problem is that I get completely unacceptable latency on console  IO (both via SSH and serial console) when the system is 
>> performing  disk IO. The worst case I've noticed yet was when I tried copying a  core dump from a lzjb compressed ZFS file 
>> system to a gzip-9  compressed one, to compare the file size/compression ratio. screen (1) took at LEAST ten seconds - probably 
>> a bit more - I'm not  exaggerating here - to switch to another window, and an "ls" in an  empty directory also about 5-10 
>> seconds.
>> Doing some silly CPU load with two instances of "yes >/dev/null" (on  a single core, remember) doesn't change anything, the 
>> system remains  very responsive. "cat /dev/zero > /uncompressed_fs/..." however  produces the extreme slowdown. (On a gzip-1 FS 
>> it doesn't, since the  file ends up extremely small - a kilobyte or so - even after a  while, thus performing minimal IO).
>>
>> I'm thinking about switching to FreeBSD on my beefier "production"  system (dual-core amd64, 4GB RAM, 4x1TB disks, compared to 
>> this one,  single-core, 2GB RAM, 80GB disk), but unless I feel assured this  won't happen there as well, I'm not so sure 
>> anymore. I can do any  kind of heavy IO/compilation/whatever on that box, currently running  Linux, and it's always 
>> unnoticable. In this case it's impossible  *not* to notice that your key input is lagging behind 5-10  seconds... I thought 
>> multiple times that the box must have panicked.
>> I do realize that the hardware isn't the best, especially the disks,  but this is far worse than it should be!
>>
>> Here's some of the testing done in this thread (or at least  something like it):
>>
>> [root at chaos ~]# time file /etc/* >/dev/null
>> real    0m1.725s
>> user    0m0.993s
>> sys     0m0.021s
>> [root at chaos ~]# time file /etc/* >/dev/null
>>
>> real    0m1.008s
>> user    0m0.990s
>> sys     0m0.015s
>> [root at chaos ~]# time file /etc/* >/dev/null
>>
>> real    0m1.008s
>> user    0m0.967s
>> sys     0m0.038s
>> [root at chaos ~]# time file /etc/* >/dev/null
>>
>> real    0m1.015s
>> user    0m0.998s
>> sys     0m0.008s
>>
>> So, pretty much exactly 1 second every time once the cache is warmed  up. Now, let's try it 10 seconds after starting heavy 
>> disk writing...
>>
>> [root at chaos ~]# cat /dev/zero > /DELETE_ME &
>> (wait for 10 seconds)
>> [root at chaos ~]# time file /etc/* >/dev/null
>>
>> real    0m13.217s
>> user    0m1.004s
>> sys     0m0.023s
>> [root at chaos ~]# time file /etc/* >/dev/null
>>
>> real    0m24.012s
>> user    0m1.020s
>> sys     0m0.008s
>>
>> ... the numbers speak for themselves. FWIW I tried the same on the  Linux box: "file" took 0.8 seconds the first time, 0.125s 
>> subsequent  times. While running cat, 0.13-0.21 seconds (0.21 being the first,  subsequent runs took 0.13s). The system 
>> remained completely  responsive, which cannot be said for the FreeBSD one!
>>
>> Any advice? Info below - please ask if you need more!
>>
>> Regards,
>> Thomas
>>
>> -----------------------------------------------------------------------------
>>
>> Basic info:
>>
>> [root at chaos ~]# uname -a
>> FreeBSD chaos.exscape.org 8.0-RC1 FreeBSD 8.0-RC1 #1 r197613M: Tue  Sep 29 15:04:44 CEST 2009 
>> root at chaos.exscape.org:/usr/obj/usr/ src/sys/DTRACE  amd64
>> (KDB/DDB enabled, WITNESS/INVARIANTS *disabled*)
>>
>> [root at chaos ~]# mount
>> tank/root on / (zfs, local, noatime)
>> devfs on /dev (devfs, local, multilabel)
>> /dev/ad0s1a on /bootdir (ufs, local, soft-updates)
>> tank/export on /export (zfs, NFS exported, local, noatime)
>> tank/tmp on /tmp (zfs, local, noatime)
>> tank/usr on /usr (zfs, local, noatime)
>> tank/usr/obj on /usr/obj (zfs, local, noatime)
>> tank/usr/ports on /usr/ports (zfs, local, noatime)
>> tank/usr/ports/distfiles on /usr/ports/distfiles (zfs, local, noatime)
>> tank/usr/src on /usr/src (zfs, local, noatime)
>> tank/usr/src_r196905 on /usr/src_r196905 (zfs, local, noatime, read- only)
>> tank/var on /var (zfs, local, noatime)
>> tank/var/crash on /var/crash (zfs, local, noatime)
>> tank/var/log on /var/log (zfs, local, noatime)
>> tank/var/tmp on /var/tmp (zfs, local, noatime)
>>
>> dmesg:
>>
>> -----------------------------------------------------------------------------
>> Copyright (c) 1992-2009 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,  1994
>> The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 8.0-RC1 #1 r197613M: Tue Sep 29 15:04:44 CEST 2009
>>    root at chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE
>> Timecounter "i8254" frequency 1193182 Hz quality 0
>> CPU: AMD Athlon(tm) 64 Processor 3200+ (2009.27-MHz K8-class CPU)
>>  Origin = "AuthenticAMD"  Id = 0x10ff0  Stepping = 0
>>   Features = 0x78bfbff < FPU ,VME ,DE ,PSE ,TSC ,MSR ,PAE ,MCE 
>> ,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
>>  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>>  AMD Features2=0x1<LAHF>
>> real memory  = 2147483648 (2048 MB)
>> avail memory = 2052362240 (1957 MB)
>> ACPI APIC Table: <Nvidia AWRDACPI>
>> ioapic0 <Version 1.1> irqs 0-23 on motherboard
>> kbd1 at kbdmux0
>> acpi0: <Nvidia AWRDACPI> on motherboard
>> acpi0: [ITHREAD]
>> acpi0: Power Button (fixed)
>> acpi0: reservation of 0, a0000 (3) failed
>> acpi0: reservation of 100000, 7fef0000 (3) failed
>> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
>> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
>> acpi_button0: <Power Button> on acpi0
>> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
>> pci0: <ACPI PCI bus> on pcib0
>> pci0: <memory> at device 0.0 (no driver attached)
>> isab0: <PCI-ISA bridge> at device 1.0 on pci0
>> isa0: <ISA bus> on isab0
>> pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
>> ohci0: <OHCI (generic) USB controller> mem 0xfe02f000-0xfe02ffff irq  21 at device 2.0 on pci0
>> ohci0: [ITHREAD]
>> usbus0: <OHCI (generic) USB controller> on ohci0
>> ehci0: <NVIDIA nForce4 USB 2.0 controller> mem 0xfe02e000-0xfe02e0ff  irq 22 at device 2.1 on pci0
>> ehci0: [ITHREAD]
>> usbus1: EHCI version 1.0
>> usbus1: <NVIDIA nForce4 USB 2.0 controller> on ehci0
>> atapci0: <nVidia nForce CK804 UDMA133 controller> port  0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfb00-0xfb0f at device 6.0 on 
>> pci0
>> ata0: <ATA channel 0> on atapci0
>> ata0: [ITHREAD]
>> ata1: <ATA channel 1> on atapci0
>> ata1: [ITHREAD]
>> atapci1: <nVidia nForce CK804 SATA300 controller> port  0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xf600-0xf60f mem 
>> 0xfe02b000-0xfe02bfff irq 23 at device 7.0 on pci0
>> atapci1: [ITHREAD]
>> ata2: <ATA channel 0> on atapci1
>> ata2: [ITHREAD]
>> ata3: <ATA channel 1> on atapci1
>> ata3: [ITHREAD]
>> atapci2: <nVidia nForce CK804 SATA300 controller> port  0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xf100-0xf10f mem 
>> 0xfe02a000-0xfe02afff irq 21 at device 8.0 on pci0
>> atapci2: [ITHREAD]
>> ata4: <ATA channel 0> on atapci2
>> ata4: [ITHREAD]
>> ata5: <ATA channel 1> on atapci2
>> ata5: [ITHREAD]
>> pcib1: <ACPI PCI-PCI bridge> at device 9.0 on pci0
>> pci1: <ACPI PCI bus> on pcib1
>> vgapci0: <VGA-compatible display> mem 0xfcff8000-0xfcffbfff, 0xfd000000-0xfd7fffff,0xfc000000-0xfc7fffff irq 17 at device 7.0 
>> on  pci1
>> xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xdf00-0xdf7f mem  0xfcfff000-0xfcfff07f irq 18 at device 9.0 on pci1
>> miibus0: <MII bus> on xl0
>> xlphy0: <3c905C 10/100 internal PHY> PHY 24 on miibus0
>> xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>> xl0: Ethernet address: 00:50:da:44:c0:4a
>> xl0: [ITHREAD]
>> nfe0: <NVIDIA nForce4 CK804 MCP9 Networking Adapter> port  0xf000-0xf007 mem 0xfe029000-0xfe029fff irq 22 at device 10.0 on 
>> pci0
>> miibus1: <MII bus> on nfe0
>> e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 1 on miibus1
>> e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,  1000baseT, 1000baseT-FDX, auto
>> nfe0: Ethernet address: 00:13:d3:a2:aa:0f
>> nfe0: [FILTER]
>> pcib2: <ACPI PCI-PCI bridge> at device 11.0 on pci0
>> pci2: <ACPI PCI bus> on pcib2
>> pcib3: <ACPI PCI-PCI bridge> at device 12.0 on pci0
>> pci3: <ACPI PCI bus> on pcib3
>> pcib4: <ACPI PCI-PCI bridge> at device 13.0 on pci0
>> pci4: <ACPI PCI bus> on pcib4
>> pcib5: <ACPI PCI-PCI bridge> at device 14.0 on pci0
>> pci5: <ACPI PCI bus> on pcib5
>> amdtemp0: <AMD K8 Thermal Sensors> on hostb3
>> acpi_tz0: <Thermal Zone> on acpi0
>> atrtc0: <AT realtime clock> port 0x70-0x73 irq 8 on acpi0
>> uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on  acpi0
>> uart0: [FILTER]
>> uart0: console (115200,n,8,1)
>> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
>> atkbd0: <AT Keyboard> irq 1 on atkbdc0
>> kbd0 at atkbd0
>> atkbd0: [GIANT-LOCKED]
>> atkbd0: [ITHREAD]
>> cpu0: <ACPI CPU> on acpi0
>> powernow0: <Cool`n'Quiet K8> on cpu0
>> device_attach: powernow0 attach returned 6
>> orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff, 0xcc000-0xcc7ff on isa0
>> sc0: <System console> at flags 0x100 on isa0
>> sc0: VGA <16 virtual consoles, flags=0x300>
>> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on  isa0
>> ZFS NOTICE: system has less than 4GB and prefetch enable is not  set... disabling.
>> ZFS filesystem version 13
>> ZFS storage pool version 13
>> Timecounter "TSC" frequency 2009269513 Hz quality 800
>> Timecounters tick every 1.000 msec
>> usbus0: 12Mbps Full Speed USB v1.0
>> usbus1: 480Mbps High Speed USB v2.0
>> ad0: 76318MB <Seagate ST380021A 3.19> at ata0-master UDMA100
>> ugen0.1: <nVidia> at usbus0
>> uhub0: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on  usbus0
>> ugen1.1: <nVidia> at usbus1
>> uhub1: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on  usbus1
>> ad2: 9768MB <Seagate ST310014A 3.09> at ata1-master UDMA100
>> GEOM: ad2s1: geometry does not match label (255h,63s != 16h,63s).
>> Root mount waiting for: usbus1 usbus0
>> uhub0: 10 ports with 10 removable, self powered
>> Root mount waiting for: usbus1
>> Root mount waiting for: usbus1
>> Root mount waiting for: usbus1
>> uhub1: 10 ports with 10 removable, self powered
>> Trying to mount root from zfs:tank/root
>> netsmb_dev: loaded
>>
>> -----------------------------------------------------------------------------
>> The 80GB disk is used for everything except swap (aka. dump device)  for which the 10GB disk is used, exclusively.
>>
>> loader.conf has nothing of value (just loading a few modules and a  vfs.root.mountfrom, plus serial console settings).
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
> 


================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster at multiplay.co.uk.



More information about the freebsd-stable mailing list