misc/125617: ath(4) related panic

Rory Arms rorya+freebsd.org at TrueStep.com
Mon Jul 14 23:10:03 UTC 2008


>Number:         125617
>Category:       misc
>Synopsis:       ath(4) related panic
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 14 23:10:02 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Rory Arms
>Release:        7.0-RELEASE
>Organization:
>Environment:
FreeBSD foo.domain.com 7.0-RELEASE FreeBSD 7.0-RELEASE #13: Sat Mar  8 19:01:13 EST 2008     root at foo.domain.com:/mnt/obj/usr/src/sys/TSERVER  i386


>Description:
I noticed that fxp1 was producing a lot of errors. I first saw this because the NFS clients were dropping a lot of packets, and there were long delays when pinging the servers from the clients as well.

So I looked at the console and saw this error repeated over and over:

fxp1: SCB timeout: 0x80 0x0 0x50 0x400

In my case, I have ath0 bridged with fxp1 to form one network, so the above errors were mixed in with:

ath0: ath_reset: unable to reset hardware; hal status 03

This is the first time I've noticed this with this release, after over 60 days of uptime. Over that time I had been noticing that the wireless clients sometimes weren't routed correctly through the NAT router (natd(8) + ipfw(4)) even though the fxp1 clients were, but the problem was intermittent. I assume that issue was related to a bug in if_bridge(4), but that's just a guess. All I know is that it started happening with 7.0.

So the next thing I decided to do was to bring down the bridge0 interface and see if that would alleviate the issue (again, thinking the Ethernet problems I was seeing were exacerbated by fxp1 being linked in bridge0, or by a problem with ath0).

A few minutes after I downed the bridge0 interface, the kernel panicked.

I have minidumps turned on, so on the next boot the system was able to scavenge the dump. Here's the backtrace, as seen via kgdb(1):

> sudo kgdb /boot/kernel/kernel vmcore.0
Password:
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
ath0: device timeout


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc49a770c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc04b569a
stack pointer           = 0x28:0xe3ffebc4
frame pointer           = 0x28:0xe3ffebf8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 14 (swi4: clock sio)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 48d12h18m43s
Physical memory: 1015 MB
Dumping 197 MB: 182 166 150 134 118 102 86 70 54 38 22 6

#0  doadump () at pcpu.h:195
195     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc059fbd6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc059feae in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc08190cc in trap_fatal (frame=0xe3ffeb84, eva=3298457356)
    at /usr/src/sys/i386/i386/trap.c:899
#4  0xc081933b in trap_pfault (frame=0xe3ffeb84, usermode=0, eva=3298457356)
    at /usr/src/sys/i386/i386/trap.c:812
#5  0xc0819d32 in trap (frame=0xe3ffeb84) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc080097b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc04b569a in ath_rxbuf_init (sc=0xc3bdf000, bf=0xc3be9324)
    at /usr/src/sys/dev/ath/if_ath.c:3284
#8  0xc04b5919 in ath_startrecv (sc=0xc3bdf000)
    at /usr/src/sys/dev/ath/if_ath.c:4928
#9  0xc04bce7c in ath_reset (ifp=0xc3bc8800)
    at /usr/src/sys/dev/ath/if_ath.c:1145
#10 0xc04bd3bb in ath_watchdog (ifp=0xc3bc8800)
    at /usr/src/sys/dev/ath/if_ath.c:5774
#11 0xc0630871 in if_slowtimo (arg=0x0) at /usr/src/sys/net/if.c:1478
#12 0xc05b2136 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:274
#13 0xc058242b in ithread_loop (arg=0xc3b00230)
    at /usr/src/sys/kern/kern_intr.c:1036
#14 0xc057f154 in fork_exit (callout=0xc0582260 <ithread_loop>, 
    arg=0xc3b00230, frame=0xe3ffed38) at /usr/src/sys/kern/kern_fork.c:781
#15 0xc08009f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb) print panicstr
$1 = 0xc08f3e00 "page fault"
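
As a cross-check on the output above: the eva argument that kgdb prints in decimal for trap_fatal() and trap_pfault() should be the same address as the "fault virtual address" in the trap message. A minimal sketch of that conversion, assuming a shell whose printf handles values above 2^31 (the variable names are only for illustration):

```shell
# eva as printed by kgdb in frames #3 and #4 (decimal)
eva=3298457356
# Convert to hex for comparison with "fault virtual address = 0xc49a770c"
eva_hex=$(printf '0x%x' "$eva")
echo "$eva_hex"
```

If the two values match, the faulting read happened inside ath_rxbuf_init() at the address reported by the trap handler.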

While the server was fscking everything, I disconnected the cable and rerouted it, since it was tangled with a lot of other cables and I thought the issue could have been the result of cross-talk. I restarted the server, and fxp1 has now been working normally for about 5 hours, without a single new SCB timeout error in the logs since the restart.

As always, here's the kernel configuration, sans the commented lines:

cpu             I686_CPU
ident           TSERVER-70


makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

options         SCHED_4BSD              # 4BSD scheduler
options         PREEMPTION              # Enable kernel thread preemption
options         INET                    # InterNETworking
options         INET6                   # IPv6 communications protocols
options         SCTP                    # Stream Control Transmission Protocol
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
options         UFS_GJOURNAL            # Enable gjournal-based UFS journaling
options         NFSCLIENT               # Network Filesystem Client
options         NFSSERVER               # Network Filesystem Server
options         MSDOSFS                 # MSDOS Filesystem
options         CD9660                  # ISO 9660 Filesystem
options         PROCFS                  # Process filesystem (requires PSEUDOFS)
options         PSEUDOFS                # Pseudo-filesystem framework
options         GEOM_PART_GPT           # GUID Partition Tables.
options         GEOM_LABEL              # Provides labelization
options         COMPAT_43TTY            # BSD 4.3 TTY compat [KEEP THIS!]
options         COMPAT_FREEBSD6         # Compatible with FreeBSD6
options         SCSI_DELAY=5000         # Delay (in ms) before probing SCSI
options         KTRACE                  # ktrace(1) support
options         SYSVSHM                 # SYSV-style shared memory
options         SYSVMSG                 # SYSV-style message queues
options         SYSVSEM                 # SYSV-style semaphores
options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev
options         ADAPTIVE_GIANT          # Giant mutex is adaptive.
options         STOP_NMI                # Stop CPUS using NMI instead of IPI
options         AUDIT                   # Security event auditing

options         SMP                     # Symmetric MultiProcessor Kernel
device          apic                    # I/O APIC

device          cpufreq

options         IPDIVERT                # divert(4)

options         IPFIREWALL              #firewall
options         IPFIREWALL_VERBOSE      #enable logging to syslogd(8)
options         IPFIREWALL_DEFAULT_TO_ACCEPT    #allow everything by default

options         DUMMYNET                # dummynet(4)



device          pci

device          fdc

device          ata
device          atadisk         # ATA disk drives
device          ataraid         # ATA RAID drives
device          atapicd         # ATAPI CDROM drives
options         ATA_STATIC_ID   # Static device numbering

device          ahc             # AHA2940 and onboard AIC7xxx devices
options         AHC_REG_PRETTY_PRINT    # Print register bitfields in debug
                                        # output.  Adds ~128k to driver.
                                        # output.  Adds ~215k to driver.



device          scbus           # SCSI bus (required for SCSI)
device          ch              # SCSI media changers
device          da              # Direct Access (disks)
device          sa              # Sequential Access (tape etc)
device          cd              # CD
device          pass            # Passthrough device (direct SCSI access)
device          ses             # SCSI Environmental Services (and SAF-TE)



device          atkbdc          # AT keyboard controller
device          atkbd           # AT keyboard
device          psm             # PS/2 mouse

device          kbdmux          # keyboard multiplexer

device          vga             # VGA video card driver

device          splash          # Splash screen and screen saver support

device          sc

device          agp             # support several AGP chipsets

device          pmtimer

device          cbb             # cardbus (yenta) bridge
device          pccard          # PC Card (16-bit) bus
device          cardbus         # CardBus (32-bit) bus

device          sio             # 8250, 16[45]50 based serial ports
device          uart            # Generic UART driver

device          ppc
device          ppbus           # Parallel port bus (required)
device          lpt             # Printer
device          plip            # TCP/IP over parallel
device          ppi             # Parallel port interface device



device          miibus          # MII bus support
device          fxp             # Intel EtherExpress PRO/100B (82557, 82558)


device          wlan            # 802.11 support
device          wlan_wep        # 802.11 WEP support
device          wlan_ccmp       # 802.11 CCMP support
device          wlan_tkip       # 802.11 TKIP support
device          wlan_amrr       # AMRR transmit rate control algorithm
device          wlan_scan_ap    # 802.11 AP mode scanning
device          wlan_scan_sta   # 802.11 STA mode scanning
device          ath             # Atheros pci/cardbus NIC's
device          ath_hal         # Atheros HAL (Hardware Access Layer)
device          ath_rate_sample # SampleRate tx rate control for ath

device          loop            # Network loopback
device          random          # Entropy device
device          ether           # Ethernet support
device          sl              # Kernel SLIP
device          ppp             # Kernel PPP
device          tun             # Packet tunnel.
device          pty             # Pseudo-ttys (telnet etc)
device          md              # Memory "disks"
device          gif             # IPv6 and IPv4 tunneling
device          faith           # IPv6-to-IPv4 relaying (translation)
device          firmware        # firmware assist module

device          bpf             # Berkeley packet filter

device          uhci            # UHCI PCI->USB interface
device          ohci            # OHCI PCI->USB interface
device          usb             # USB Bus (required)
device          ugen            # Generic
device          uhid            # "Human Interface Devices"
device          ukbd            # Keyboard
device          ulpt            # Printer
device          umass           # Disks/Mass storage - Requires scbus and da
device          ums             # Mouse
device          ural            # Ralink Technology RT2500USB wireless NICs
device          rum             # Ralink Technology RT2501USB wireless NICs


>How-To-Repeat:
Unsure, unless it can always be reproduced by downing the bridge0 interface, which has two members, fxp1 and ath0. Judging by the backtrace, the panic seems to have been triggered in ath(4), so I'm not sure the bridge is at fault here; it may be some scenario that ath(4) doesn't handle.
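
The steps described above might be sketched roughly as follows. This is an untested assumption, not a confirmed reproduction: the actual bridge setup on this machine is not shown in this report, the function name repro_steps is hypothetical, and the commands are only printed (not executed) so that running the snippet changes nothing.

```shell
# Hypothetical reproduction sketch; prints the commands instead of running them.
repro_steps() {
    # Assumed setup: a bridge with the two members named in this report
    echo "ifconfig bridge0 create"
    echo "ifconfig bridge0 addm fxp1 addm ath0 up"
    # In this report the panic came a few minutes after this step,
    # while ath0 was already logging reset failures
    echo "ifconfig bridge0 down"
}
repro_steps
```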
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

