misc/125617: ath(4) related panic
Rory Arms
rorya+freebsd.org at TrueStep.com
Mon Jul 14 23:10:03 UTC 2008
>Number: 125617
>Category: misc
>Synopsis: ath(4) related panic
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Mon Jul 14 23:10:02 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Rory Arms
>Release: 7.0-RELEASE
>Organization:
>Environment:
FreeBSD foo.domain.com 7.0-RELEASE FreeBSD 7.0-RELEASE #13: Sat Mar 8 19:01:13 EST 2008 root at foo.domain.com:/mnt/obj/usr/src/sys/TSERVER i386
>Description:
I noticed that fxp1 was producing a lot of errors. At first I noticed it because the NFS clients were dropping a lot of packets, and there were big delays in pinging the servers from the clients as well.
So, I looked at the console and saw several of these errors over and over.
fxp1: SCB timeout: 0x80 0x0 0x50 0x400
In my case, I have ath0 bridged with fxp1 to form one network, so the above errors were mixed in with:
ath0: ath_reset: unable to reset hardware; hal status 03
This is the first time I've noticed this with this release, after over 60 days of uptime. Over that time I had noticed that the wireless clients sometimes weren't being routed correctly through the NAT router (natd(8) + ipfw(4)), even though the fxp1 clients were, but it was an intermittent problem. I assume that issue was related to a bug in if_bridge(4), but that's just a guess. All I know is that it started happening with 7.0.
So, the next thing I decided to do was to bring down the bridge0 interface and see if that would alleviate the issue (again, thinking the Ethernet problems I was seeing were exacerbated by being linked in bridge0, or by a problem with ath0).
A few minutes after I downed the bridge0 interface, the kernel panicked.
I have minidumps turned on, so on the next boot it was able to scavenge the dump. Here's the backtrace, as seen via kgdb(1):
> sudo kgdb /boot/kernel/kernel vmcore.0
Password:
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
Unread portion of the kernel message buffer:
ath0: device timeout
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xc49a770c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc04b569a
stack pointer = 0x28:0xe3ffebc4
frame pointer = 0x28:0xe3ffebf8
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 14 (swi4: clock sio)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 48d12h18m43s
Physical memory: 1015 MB
Dumping 197 MB: 182 166 150 134 118 102 86 70 54 38 22 6
#0 doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xc059fbd6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc059feae in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0xc08190cc in trap_fatal (frame=0xe3ffeb84, eva=3298457356)
at /usr/src/sys/i386/i386/trap.c:899
#4 0xc081933b in trap_pfault (frame=0xe3ffeb84, usermode=0, eva=3298457356)
at /usr/src/sys/i386/i386/trap.c:812
#5 0xc0819d32 in trap (frame=0xe3ffeb84) at /usr/src/sys/i386/i386/trap.c:490
#6 0xc080097b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc04b569a in ath_rxbuf_init (sc=0xc3bdf000, bf=0xc3be9324)
at /usr/src/sys/dev/ath/if_ath.c:3284
#8 0xc04b5919 in ath_startrecv (sc=0xc3bdf000)
at /usr/src/sys/dev/ath/if_ath.c:4928
#9 0xc04bce7c in ath_reset (ifp=0xc3bc8800)
at /usr/src/sys/dev/ath/if_ath.c:1145
#10 0xc04bd3bb in ath_watchdog (ifp=0xc3bc8800)
at /usr/src/sys/dev/ath/if_ath.c:5774
#11 0xc0630871 in if_slowtimo (arg=0x0) at /usr/src/sys/net/if.c:1478
#12 0xc05b2136 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:274
#13 0xc058242b in ithread_loop (arg=0xc3b00230)
at /usr/src/sys/kern/kern_intr.c:1036
#14 0xc057f154 in fork_exit (callout=0xc0582260 <ithread_loop>,
arg=0xc3b00230, frame=0xe3ffed38) at /usr/src/sys/kern/kern_fork.c:781
---Type <return> to continue, or q <return> to quit---
#15 0xc08009f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb) print panicstr
$1 = 0xc08f3e00 "page fault"
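As a cross-check on the trace above, the eva value that kgdb prints in decimal for trap_fatal and trap_pfault can be converted to hex and compared with the "fault virtual address" line in the message buffer. This is only a small sketch; the to_hex helper name is made up for illustration and is not part of kgdb.

```shell
#!/bin/sh
# to_hex is a hypothetical helper: print a decimal address in hex,
# the way the kernel message buffer shows it.
to_hex() {
    printf '0x%x\n' "$1"
}

# eva from frames #3 and #4 of the backtrace
to_hex 3298457356
```

The result should match the fault virtual address of 0xc49a770c reported in the panic message. The instruction pointer (0xc04b569a) likewise matches frame #7, ath_rxbuf_init, so the faulting read happened there.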
While the server was fscking everything, I disconnected the cable and rerouted it, since it was tangled with a lot of other cables and I thought the issue could have been the result of cross-talk. I then restarted the server, and fxp1 has been working normally for about 5 hours since, with not a single new SCB timeout error in the logs.
As always here's the kernel configuration, sans the commented lines:
cpu I686_CPU
ident TSERVER-70
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
options SCHED_4BSD # 4BSD scheduler
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options INET6 # IPv6 communications protocols
options SCTP # Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options NFSCLIENT # Network Filesystem Client
options NFSSERVER # Network Filesystem Server
options MSDOSFS # MSDOS Filesystem
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_PART_GPT # GUID Partition Tables.
options GEOM_LABEL # Provides labelization
options COMPAT_43TTY # BSD 4.3 TTY compat [KEEP THIS!]
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE # ktrace(1) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options ADAPTIVE_GIANT # Giant mutex is adaptive.
options STOP_NMI # Stop CPUS using NMI instead of IPI
options AUDIT # Security event auditing
options SMP # Symmetric MultiProcessor Kernel
device apic # I/O APIC
device cpufreq
options IPDIVERT # divert(4)
options IPFIREWALL #firewall
options IPFIREWALL_VERBOSE #enable logging to syslogd(8)
options IPFIREWALL_DEFAULT_TO_ACCEPT #allow everything by default
options DUMMYNET # dummynet(4)
device pci
device fdc
device ata
device atadisk # ATA disk drives
device ataraid # ATA RAID drives
device atapicd # ATAPI CDROM drives
options ATA_STATIC_ID # Static device numbering
device ahc # AHA2940 and onboard AIC7xxx devices
options AHC_REG_PRETTY_PRINT # Print register bitfields in debug
# output. Adds ~128k to driver.
device scbus # SCSI bus (required for SCSI)
device ch # SCSI media changers
device da # Direct Access (disks)
device sa # Sequential Access (tape etc)
device cd # CD
device pass # Passthrough device (direct SCSI access)
device ses # SCSI Environmental Services (and SAF-TE)
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse
device kbdmux # keyboard multiplexer
device vga # VGA video card driver
device splash # Splash screen and screen saver support
device sc
device agp # support several AGP chipsets
device pmtimer
device cbb # cardbus (yenta) bridge
device pccard # PC Card (16-bit) bus
device cardbus # CardBus (32-bit) bus
device sio # 8250, 16[45]50 based serial ports
device uart # Generic UART driver
device ppc
device ppbus # Parallel port bus (required)
device lpt # Printer
device plip # TCP/IP over parallel
device ppi # Parallel port interface device
device miibus # MII bus support
device fxp # Intel EtherExpress PRO/100B (82557, 82558)
device wlan # 802.11 support
device wlan_wep # 802.11 WEP support
device wlan_ccmp # 802.11 CCMP support
device wlan_tkip # 802.11 TKIP support
device wlan_amrr # AMRR transmit rate control algorithm
device wlan_scan_ap # 802.11 AP mode scanning
device wlan_scan_sta # 802.11 STA mode scanning
device ath # Atheros pci/cardbus NIC's
device ath_hal # Atheros HAL (Hardware Access Layer)
device ath_rate_sample # SampleRate tx rate control for ath
device loop # Network loopback
device random # Entropy device
device ether # Ethernet support
device sl # Kernel SLIP
device ppp # Kernel PPP
device tun # Packet tunnel.
device pty # Pseudo-ttys (telnet etc)
device md # Memory "disks"
device gif # IPv6 and IPv4 tunneling
device faith # IPv6-to-IPv4 relaying (translation)
device firmware # firmware assist module
device bpf # Berkeley packet filter
device uhci # UHCI PCI->USB interface
device ohci # OHCI PCI->USB interface
device usb # USB Bus (required)
device ugen # Generic
device uhid # "Human Interface Devices"
device ukbd # Keyboard
device ulpt # Printer
device umass # Disks/Mass storage - Requires scbus and da
device ums # Mouse
device ural # Ralink Technology RT2500USB wireless NICs
device rum # Ralink Technology RT2501USB wireless NICs
>How-To-Repeat:
Unsure, unless it's something that can always be reproduced by downing the bridge0 interface, which has two members, fxp1 and ath0. Looking at the traceback, the panic seems to have been caused by ath(4), so I'm not sure that the bridge is at fault here; it may be some scenario that ath(4) doesn't handle.
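For anyone who wants to try, the sequence described above could be sketched roughly as follows. This is an untested guess at a reproduction, not a confirmed one: the run and repro_steps helpers and the RUN variable are made up for illustration, the 300 second wait only approximates the "few minutes" mentioned in the description, and the create step is unnecessary if bridge0 already exists. By default the script only prints the commands; set RUN=1 (as root, on a machine you can afford to panic) to execute them.

```shell
#!/bin/sh
# Hypothetical reproduction sketch. Interface names (bridge0, fxp1, ath0)
# are taken from the report; everything else is an assumption.

# Print the command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

repro_steps() {
    run ifconfig bridge0 create                   # skip if bridge0 already exists
    run ifconfig bridge0 addm fxp1 addm ath0 up   # two members, as in the report
    run sleep 300                                 # assumed wait; report says "a few minutes"
    run ifconfig bridge0 down                     # the step that preceded the panic
}

repro_steps
```

Note that on the original machine the panic only followed a period of ath0 reset failures and fxp1 SCB timeouts, so downing the bridge alone may not be enough to trigger it.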
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: