Fatal trap 12/TIMEOUT - READ_DMA (was Re: Stuck in geli)

Wayne Sierke ws at au.dyndns.ws
Wed Aug 6 09:01:08 UTC 2008


On Tue, 2008-08-05 at 20:30 -0700, Jeremy Chadwick wrote:
> This looks like the issue I've been tracking for months now.  I'm sorry
> the document isn't complete; it's an issue of time...
> 
> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting


> My experience with disk timeouts on FreeBSD is that the OS does not
> handle it well at all, regardless of geli(4) being used or not.  The
> entire system can deadlock, and in some cases panic (which for me is
> the more common result).
> 
Recently I returned to my desktop system to find that it had rebooted itself. Examining the crash dump showed the following:

        # kgdb /boot/kernel/kernel /var/crash/vmcore.3
        [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
        GNU gdb 6.1.1 [FreeBSD]
        Copyright 2004 Free Software Foundation, Inc.
        GDB is free software, covered by the GNU General Public License, and you are
        welcome to change it and/or distribute copies of it under certain conditions.
        Type "show copying" to see the conditions.
        There is absolutely no warranty for GDB.  Type "show warranty" for details.
        This GDB was configured as "i386-marcel-freebsd".
        There is no member named pathname.
        Error while mapping shared library sections:
        rtc.ko: No such file or directory.
        Reading symbols from /boot/kernel/vesa.ko...Reading symbols from /boot/kernel/vesa.ko.symbols...done.
        done.
        Loaded symbols for /boot/kernel/vesa.ko
        Reading symbols from /boot/kernel/linux.ko...Reading symbols from /boot/kernel/linux.ko.symbols...done.
        done.
        Loaded symbols for /boot/kernel/linux.ko
        Reading symbols from /boot/kernel/snd_ich.ko...Reading symbols from /boot/kernel/snd_ich.ko.symbols...done.
        done.
        Loaded symbols for /boot/kernel/snd_ich.ko
        Reading symbols from /boot/kernel/sound.ko...Reading symbols from /boot/kernel/sound.ko.symbols...done.
        done.
        Loaded symbols for /boot/kernel/sound.ko
        Reading symbols from /boot/modules/nvidia.ko...done.
        Loaded symbols for /boot/modules/nvidia.ko
        Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
        done.
        Loaded symbols for /boot/kernel/acpi.ko
        Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
        done.
        Loaded symbols for /boot/kernel/linprocfs.ko
        Reading symbols from /boot/kernel/green_saver.ko...Reading symbols from /boot/kernel/green_saver.ko.symbols...done.
        done.
        Loaded symbols for /boot/kernel/green_saver.ko
        Error while reading shared library symbols:
        rtc.ko: No such file or directory.
        Unread portion of the kernel message buffer:
        ad1: TIMEOUT - READ_DMA retrying (1 retry left) LBA=67332091
        
        
        Fatal trap 12: page fault while in kernel mode
        cpuid = 0; apic id = 00
        fault virtual address   = 0x188
        fault code              = supervisor read, page not present
        instruction pointer     = 0x20:0xc075ce24
        stack pointer           = 0x28:0xe52f1c04
        frame pointer           = 0x28:0xe52f1c1c
        code segment            = base 0x0, limit 0xfffff, type 0x1b
                                = DPL 0, pres 1, def32 1, gran 1
        processor eflags        = interrupt enabled, resume, IOPL = 0
        current process         = 18 (swi6: task queue)
        trap number             = 12
        panic: page fault
        cpuid = 0
        Uptime: 1d11h41m37s
        Physical memory: 1519 MB
        Dumping 214 MB: 199 183 167 151 135 119 103 87 71 55 39 23 7
        
        #0  doadump () at pcpu.h:195
        195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
        (kgdb) bt
        #0  doadump () at pcpu.h:195
        #1  0xc076a137 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
        #2  0xc076a3f9 in panic (fmt=Variable "fmt" is not available.
        ) at /usr/src/sys/kern/kern_shutdown.c:572
        #3  0xc0a71aec in trap_fatal (frame=0xe52f1bc4, eva=392) at /usr/src/sys/i386/i386/trap.c:899
        #4  0xc0a71d70 in trap_pfault (frame=0xe52f1bc4, usermode=0, eva=392) at /usr/src/sys/i386/i386/trap.c:812
        #5  0xc0a7271c in trap (frame=0xe52f1bc4) at /usr/src/sys/i386/i386/trap.c:490
        #6  0xc0a584ab in calltrap () at /usr/src/sys/i386/i386/exception.s:139
        #7  0xc075ce24 in _mtx_lock_sleep (m=0xc5abedcc, tid=3302165152, opts=0, file=0x0, line=0)
            at /usr/src/sys/kern/kern_mutex.c:335
        #8  0xc07693b6 in _sema_post (sema=0xc5abedcc, file=0x0, line=0) at /usr/src/sys/kern/kern_sema.c:79
        #9  0xc050bd30 in ata_completed (context=0xc5abed80, dummy=1) at /usr/src/sys/dev/ata/ata-queue.c:481
        #10 0xc079ce85 in taskqueue_run (queue=0xc4d21680) at /usr/src/sys/kern/subr_taskqueue.c:255
        #11 0xc079d193 in taskqueue_swi_run (dummy=0x0) at /usr/src/sys/kern/subr_taskqueue.c:297
        #12 0xc074acfb in ithread_loop (arg=0xc4d2a940) at /usr/src/sys/kern/kern_intr.c:1036
        #13 0xc0747ad9 in fork_exit (callout=0xc074ab50 <ithread_loop>, arg=0xc4d2a940, frame=0xe52f1d38)
            at /usr/src/sys/kern/kern_fork.c:783
        #14 0xc0a58520 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
        (kgdb)

This is the first time I've examined one of these DMA TIMEOUT events.
I've probably seen a handful of them over the last few years, perhaps 2
to 4 per year. Unfortunately, this system also occasionally faults from
hangs that I assume, without having confirmed it, to be X-related. In
any case, I rarely notice a core being left behind.

        # atacontrol info ata0
        Master:  ad0 <ST380024A/3.33> ATA/ATAPI revision 6
        Slave:   ad1 <ST380011A/8.01> ATA/ATAPI revision 6
        

From another system I have here (a VIA EPIA running 6.3-PRERELEASE), I
can find only one TIMEOUT instance in the last 3 years, and it had no
discernible consequences. In fact, that system has been rock-solid. It's
a mail server running courier-imap, apache-1.3, samba, mysql and assorted
other software, but it is not heavily loaded.

        Jan  2 03:30:20 lillith-iv kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3196031
        
        # atacontrol info ata0
        Master:  ad0 <WDC WD800JB-00ETA0/77.07W77> ATA/ATAPI revision 6
        Slave:   ad1 <WDC WD1200JB-00FUA0/15.05R15> ATA/ATAPI revision 6
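
A minimal Python sketch for scanning syslog output for these messages, so that TIMEOUT events can be counted or correlated across machines, is below. The regular expression is an assumption modelled only on the two TIMEOUT lines quoted in this post; other ata(4) message variants will not match. The names `find_timeouts` and `AtaTimeout` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed message shape, based only on lines such as:
#   ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3196031
# Lines with other wording are silently skipped.
TIMEOUT_RE = re.compile(
    r"(?P<dev>ad\d+): TIMEOUT - (?P<cmd>\w+) retrying "
    r"\((?P<left>\d+) retr(?:y|ies) left\) LBA=(?P<lba>\d+)"
)


class AtaTimeout(NamedTuple):
    device: str        # e.g. "ad1"
    command: str       # e.g. "READ_DMA"
    retries_left: int  # value reported in "(N retry left)"
    lba: int           # logical block address that timed out


def find_timeouts(lines: Iterable[str]) -> List[AtaTimeout]:
    """Return one record per matching TIMEOUT message in the given lines."""
    found = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            found.append(
                AtaTimeout(m["dev"], m["cmd"], int(m["left"]), int(m["lba"]))
            )
    return found


if __name__ == "__main__":
    sample = [
        "Jan  2 03:30:20 lillith-iv kernel: ad0: TIMEOUT - READ_DMA "
        "retrying (1 retry left) LBA=3196031",
        "Jan  2 03:31:00 lillith-iv kernel: unrelated message",
    ]
    for event in find_timeouts(sample):
        print(event)
```

To scan a live log, pass an open file object, e.g. `with open("/var/log/messages") as f: print(find_timeouts(f))`; rotated logs that have been compressed would need to be decompressed first.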
        

Anyway, I don't know whether there's any significant or useful
information in that vmcore. Perhaps someone could let me know?


Wayne




More information about the freebsd-stable mailing list