kern/73740: [Panic] & [nfs(?)] 5-3-R#3 panic when accessing nfs exported ATA drives in Idle or suspend mode.

Heikki Soerum heikkis at matnat.uio.no
Tue Nov 9 10:30:21 PST 2004


>Number:         73740
>Category:       kern
>Synopsis:       [Panic] & [nfs(?)] 5-3-R#3 panic when accessing nfs exported ATA drives in Idle or suspend mode.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 09 18:30:20 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Heikki Soerum
>Release:        5.3-RELEASE #3
>Organization:
>Environment:
FreeBSD a-ko 5.3-RELEASE FreeBSD 5.3-RELEASE #3: Tue Nov  9 11:07:05 CET 2004     root@a-ko:/usr/obj/usr/src/sys/A-KO  i386

>Description:
Problems started to occur after updating from 5.2.1-p9 to 5.3-RELEASE.
I am running a GENERIC kernel renamed to A-KO, with device polling included.
In addition, debugging options were enabled to produce a debugging trace (at least I hoped they would).

The computer is a low-cost file server serving several Linux clients over NFS. The server usually runs unattended 24/7 with the hard drives in IDLE or SUSPEND mode to reduce heat and noise, since the files are only accessed occasionally.
I use sysutils/ataidle to set the IDLE time to 5 minutes and the SUSPEND time to 10 minutes. Before 5.3-RELEASE the only effect was occasional timeout warnings on the local console (see dmesg) when the NFS-exported partitions were idle and an access attempt was made locally or over NFS. This caused no crashes, only a delay until the hard drives had spun up and could answer the read/write request.

But now the NFS client computer, running Gentoo Linux and nfsutils 1.0.6-r4, will sometimes report that one or several of the remote NFS mounts are "to big to be calculated" when running 'df -h'.
After this, any read/write or mount/umount attempt on the FreeBSD server against the affected hard drives causes a fatal kernel panic.
Unmounting the NFS mounts on the remote Linux client also causes the kernel panic on the FreeBSD box, but as long as _no_ I/O attempts are made towards the affected mounts everything appears to be fine.
It is _not_ clear whether this panic is caused by NFS bugs, kernel bugs or both. I have not been able to reproduce the crashes when _not_ exporting the drives over NFS. All NFS mounts are mounted with these options on the client: (rw,soft,intr,bg,nfsvers=3,retry=20,addr=192.168.1.10)

I suspect that there might be a connection to an observed change in behavior from 5.2.1 to 5.3-RELEASE: occasionally on 5.3-R a read/write request on an idle/suspended hard drive will cause an input/output error to be printed on the local console. A wild guess would be that the NFS daemon, or a process attached to it, can't handle the input/output error and causes a buffer overflow, or an integer to wrap around to zero or a negative value. This in turn leads to a panic.

Even a kernel running with debugging and WITNESS enabled will only produce this short panic message:

PANIC message:
------------
panic: vrele: negative ref cnt
uptime: (from a couple of minutes to a few days.)
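
As far as I can tell, the panic string says that a reference was released more often than it was taken. The following is only a toy model in Python to illustrate that kind of bug, not FreeBSD code: the class ToyVnode and the function read_with_buggy_error_path are hypothetical names invented for this sketch, and I do not know whether the real failure follows this pattern.

```python
class ToyVnode:
    """Toy stand-in for a reference-counted object; NOT the FreeBSD vnode."""

    def __init__(self):
        self.usecount = 0

    def ref(self):
        self.usecount += 1

    def rele(self):
        # Releasing when no reference is held is what the panic complains about.
        if self.usecount < 1:
            raise RuntimeError("rele: negative ref cnt")
        self.usecount -= 1


def read_with_buggy_error_path(vn, io_fails):
    """Hypothetical caller that drops its reference twice when the I/O fails."""
    vn.ref()
    try:
        if io_fails:
            vn.rele()  # the error path releases the reference ...
            raise OSError(5, "Input/output error")
        return "data"
    finally:
        vn.rele()  # ... and the common exit path releases it a second time
```

In this model a successful read leaves the count balanced, while a failed read trips the check on the second release. That would fit the observation that the panic only shows up after an input/output error, but it is a guess.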

DMESG:

PS. The READ_DMA timeouts and failures are normal(?) noncritical behaviour on FreeBSD prior to 5.3-R that occurs when attempting to read from an idling or suspended hard drive, and can be ignored. They are only included to show prior behavior. They were present in earlier versions of FreeBSD that didn't panic and are still present.

---------
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RELEASE #3: Tue Nov  9 11:07:05 CET 2004
    root@a-ko:/usr/obj/usr/src/sys/A-KO
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: VIA C3 Samuel 2 (601.37-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x673  Stepping = 3
  Features=0x803035<FPU,DE,TSC,MSR,MTRR,PGE,MMX>
real memory  = 251592704 (239 MB)
avail memory = 236539904 (225 MB)
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <VT9174 AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU (3 Cx states)> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA Generic host to PCI bridge> mem 0xe6000000-0xe6ffffff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
vr0: <VIA VT6105 Rhine III 10/100BaseTX> port 0xc000-0xc0ff mem 0xe8005000-0xe80050ff irq 12 at device 15.0 on pci0
miibus0: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:40:63:c9:da:14
uhci0: <VIA 83C572 USB controller> port 0xc400-0xc41f irq 10 at device 16.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xc800-0xc81f irq 11 at device 16.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <VIA 83C572 USB controller> port 0xcc00-0xcc1f irq 5 at device 16.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <VIA 83C572 USB controller> on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 16.3 (no driver attached)
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 8235 UDMA133 controller> port 0xd000-0xd00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pcm0: <VIA VT8235> port 0xd400-0xd4ff irq 5 at device 17.5 on pci0
pcm0: [GIANT-LOCKED]
pcm0: <VIA Technologies VIA1612A AC97 Codec>
vr1: <VIA VT6102 Rhine II 10/100BaseTX> port 0xd800-0xd8ff mem 0xe8004000-0xe80040ff irq 10 at device 18.0 on pci0
miibus1: <MII bus> on vr1
ukphy1: <Generic IEEE 802.3u media interface> on miibus1
ukphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr1: Ethernet address: 00:40:63:c9:da:13
atapci1: <Promise PDC20268 UDMA100 controller> port 0xec00-0xec0f,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc07 mem 0xe8000000-0xe8003fff irq 11 at device 20.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xd8000-0xda7ff,0xc0000-0xcdfff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
Timecounter "TSC" frequency 601367159 Hz quality 800
Timecounters tick every 10.000 msec
witness_get: witness exhausted
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled
acpi_cpu: throttling enabled, 2 steps (100% to 50.0%), currently 100.0%
ad0: 38204MB <SAMSUNG SP0411N/TW100-11> [77622/16/63] at ata0-master UDMA100
ad1: 38204MB <SAMSUNG SP0411N/TW100-11> [77622/16/63] at ata0-slave UDMA100
ad2: 190782MB <ST3200822A/3.01> [387621/16/63] at ata1-master UDMA100
ad3: 190782MB <ST3200822A/3.01> [387621/16/63] at ata1-slave UDMA100
ad4: 114473MB <ST3120026A/3.06> [232581/16/63] at ata2-master UDMA100
ad5: 114473MB <ST3120026A/3.06> [232581/16/63] at ata2-slave UDMA100
ad6: 190782MB <ST3200822A/3.01> [387621/16/63] at ata3-master UDMA100
ad7: 190782MB <ST3200822A/3.01> [387621/16/63] at ata3-slave UDMA100
Mounting root from ufs:/dev/ad0s1a
ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=12127
ad2: FAILURE - READ_DMA timed out
ad3: TIMEOUT - READ_DMA retrying (2 retries left) LBA=191
ad3: FAILURE - READ_DMA timed out
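
To correlate the ATA timeouts with the panics, the console log can be summarised per drive. This is a minimal sketch in Python that assumes the messages have exactly the "adN: TIMEOUT - ..." / "adN: FAILURE - ..." shape shown above; the function name count_ata_events is made up for this example.

```python
import re
from collections import Counter

# Matches lines such as
#   "ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=12127"
#   "ad2: FAILURE - READ_DMA timed out"
ATA_EVENT = re.compile(r"^(ad\d+): (TIMEOUT|FAILURE) - (\w+)")


def count_ata_events(dmesg_text):
    """Return a Counter keyed by (device, kind, command)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = ATA_EVENT.match(line.strip())
        if m:
            counts[m.groups()] += 1
    return counts


sample = """\
ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=12127
ad2: FAILURE - READ_DMA timed out
ad3: TIMEOUT - READ_DMA retrying (2 retries left) LBA=191
ad3: FAILURE - READ_DMA timed out
"""

# Print one summary line per (device, kind, command) combination.
for (dev, kind, cmd), n in sorted(count_ata_events(sample).items()):
    print(dev, kind, cmd, n)
```

Feeding it the saved dmesg output instead of the sample string would show which drives time out most often before a panic.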

>How-To-Repeat:
0. Boot a GENERIC kernel and run nfsd on 5.3-R#3
1. Run sysutils/ataidle to set the timeout options on the hard drives
2. Mount the local partitions that will be NFS exported
3. Mount the exported NFS partitions on a Linux NFS client
4. Wait until the drives have entered idle or suspend
5. Run 'df -h' or 'ls' or 'mv' or any other read/write operation that exceeds the contents of the client's NFS cache. Occasionally one of these operations will throw an error message or an input/output error if the drive is idle/suspended. This happens fairly often, but not always.
6. If the warning message occurs, any further read/write attempt, locally or remotely, on the affected drive will cause a kernel panic and freeze on the FreeBSD box.
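
Step 5 can be approximated on the Linux client with a small probe that asks each mount for its filesystem statistics, roughly what 'df' does. This is only a sketch: the function name probe_mounts and the mount points /mnt/nfs1 and /mnt/nfs2 are hypothetical and must be replaced with the real ones. Note that, per step 6, touching an affected mount again may panic the server.

```python
import os


def probe_mounts(paths):
    """Call statvfs on each path; return {path: size in bytes or error text}."""
    results = {}
    for path in paths:
        try:
            st = os.statvfs(path)
            # Total size = fragment size * number of blocks.
            results[path] = st.f_frsize * st.f_blocks
        except OSError as exc:
            results[path] = "error: " + os.strerror(exc.errno)
    return results


if __name__ == "__main__":
    # Hypothetical client-side mount points; replace with the real ones.
    for path, outcome in probe_mounts(["/mnt/nfs1", "/mnt/nfs2"]).items():
        print(path, outcome)
```

With the soft mount option the call should return an error after the retries are used up rather than hang indefinitely, but I have not verified how long that takes.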


>Fix:
Unknown. 

Possible workaround: 
Not really, but see my _uneducated_ guess in the description.
>Release-Note:
>Audit-Trail:
>Unformatted:

