ZFS + nullfs + Linuxulator = panic?

Mon Feb 20 21:14:14 UTC 2012

On Feb 17, 2012, at 8:27 PM, Konstantin Belousov wrote:

> On Thu, Feb 16, 2012 at 12:07:46PM -0500, Paul Mather wrote:
>> On Feb 16, 2012, at 10:49 AM, Konstantin Belousov wrote:
>> 
>>> On Thu, Feb 16, 2012 at 10:09:27AM -0500, Paul Mather wrote:
>>>> On Feb 14, 2012, at 7:47 PM, Konstantin Belousov wrote:
>>>> 
>>>>> On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
>>>>>> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last built 2012-02-08).  It will panic during the daily periodic scripts that run at 3am.  Here is the most recent panic message:
>>>>>> 
>>>>>> Fatal trap 9: general protection fault while in kernel mode
>>>>>> cpuid = 0; apic id = 00
>>>>>> instruction pointer     = 0x20:0xffffffff8069d266
>>>>>> stack pointer           = 0x28:0xffffff8094b90390
>>>>>> frame pointer           = 0x28:0xffffff8094b903a0
>>>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>>                      = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>>> processor eflags        = resume, IOPL = 0
>>>>>> current process         = 72566 (ps)
>>>>>> trap number             = 9
>>>>>> panic: general protection fault
>>>>>> cpuid = 0
>>>>>> KDB: stack backtrace:
>>>>>> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
>>>>>> #1 0xffffffff805facd3 at panic+0x183
>>>>>> #2 0xffffffff808e6c20 at trap_fatal+0x290
>>>>>> #3 0xffffffff808e715a at trap+0x10a
>>>>>> #4 0xffffffff808cec64 at calltrap+0x8
>>>>>> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
>>>>>> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
>>>>>> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
>>>>>> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
>>>>>> #9 0xffffffff8060473f at sysctl_root+0x14f
>>>>>> #10 0xffffffff80604a2a at userland_sysctl+0x14a
>>>>>> #11 0xffffffff80604f1a at __sysctl+0xaa
>>>>>> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
>>>>>> #13 0xffffffff808cef5c at Xfast_syscall+0xfc
>>>>> 
>>>>> Please look up the line number for the fill_kinfo_thread+0x54.
>>>> 
>>>> 
>>>> Is there a way for me to do this from the above information? As
>>>> I said in the original message, I failed to get a crash dump after
>>>> reboot (because, it turns out, I hadn't set up my gmirror swap device
>>>> properly). Alas, with the latest panic, it appears to have hung[1]
>>>> during the "Dumping" phase, so it looks like I won't get a saved crash
>>>> dump this time, either. :-(
>>> 
>>> Load the kernel.debug into kgdb, and from there do
>>> "list *fill_kinfo_thread+0x54".
>> 
>> 
>> gromit# kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>> (kgdb) list *fill_kinfo_thread+0x54
>> 0xffffffff805ee034 is in fill_kinfo_thread (/usr/src/sys/kern/kern_proc.c:854).
>> 849             thread_lock(td);
>> 850             if (td->td_wmesg != NULL)
>> 851                     strlcpy(kp->ki_wmesg, td->td_wmesg, sizeof(kp->ki_wmesg));
>> 852             else
>> 853                     bzero(kp->ki_wmesg, sizeof(kp->ki_wmesg));
>> 854             strlcpy(kp->ki_ocomm, td->td_name, sizeof(kp->ki_ocomm));
>> 855             if (TD_ON_LOCK(td)) {
>> 856                     kp->ki_kiflag |= KI_LOCKBLOCK;
>> 857                     strlcpy(kp->ki_lockname, td->td_lockname,
>> 858                         sizeof(kp->ki_lockname));
>> (kgdb) 
> 
> This is indeed strange. It can only occur if the td pointer is damaged.
> 
> Please try to get a core and at least print the contents of *td in this case.
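
For what it's worth, my understanding of why a damaged td would show up as trap 9 rather than an ordinary page fault (trap 12) is that on amd64 a general protection fault on a memory access usually means the address was non-canonical. That is an assumption on my part, not something established in this thread, and only the contents of *td from a core would confirm it. The sketch below shows the check I have in mind; is_canonical() is a hypothetical helper name, and it assumes 48-bit virtual addressing.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the FreeBSD kernel.
 *
 * Assumption: 48-bit virtual addressing (4-level paging).  Under that
 * scheme an address is canonical when bits 63..47 are either all zeros
 * (lower half) or all ones (upper half, where the kernel lives).
 */
bool
is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits numbered 63..47 */

	return (top == 0 || top == 0x1ffff);
}
```

By this check the instruction pointer (0xffffffff8069d266) and stack pointer (0xffffff8094b90390) from the panic are both canonical, as expected, whereas a scribbled-over pointer such as 0xdeadc0dedeadc0de is not.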

Hopefully, I will be able to get a crash dump tonight.  I disabled the Tivoli "dsmc schedule" job over the weekend because I don't have ready physical access to the machine and would rather it not be down for extended periods.  (As I reported previously, the system seems to get stuck while saving the crash dump.  If this persists, it might be better to have the system drop into the debugger on panic instead of hoping forlornly for a successful crash dump.)

Cheers,

Paul.