ZFS + nullfs + Linuxulator = panic?

Paul Mather paul at gromit.dlib.vt.edu
Tue Feb 21 16:58:54 UTC 2012


On Feb 17, 2012, at 8:27 PM, Konstantin Belousov wrote:

> On Thu, Feb 16, 2012 at 12:07:46PM -0500, Paul Mather wrote:
>> On Feb 16, 2012, at 10:49 AM, Konstantin Belousov wrote:
>> 
>>> On Thu, Feb 16, 2012 at 10:09:27AM -0500, Paul Mather wrote:
>>>> On Feb 14, 2012, at 7:47 PM, Konstantin Belousov wrote:
>>>> 
>>>>> On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
>>>>>> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last built 2012-02-08).  It will panic during the daily periodic scripts that run at 3am.  Here is the most recent panic message:
>>>>>> 
>>>>>> Fatal trap 9: general protection fault while in kernel mode
>>>>>> cpuid = 0; apic id = 00
>>>>>> instruction pointer     = 0x20:0xffffffff8069d266
>>>>>> stack pointer           = 0x28:0xffffff8094b90390
>>>>>> frame pointer           = 0x28:0xffffff8094b903a0
>>>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>>                      = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>>> processor eflags        = resume, IOPL = 0
>>>>>> current process         = 72566 (ps)
>>>>>> trap number             = 9
>>>>>> panic: general protection fault
>>>>>> cpuid = 0
>>>>>> KDB: stack backtrace:
>>>>>> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
>>>>>> #1 0xffffffff805facd3 at panic+0x183
>>>>>> #2 0xffffffff808e6c20 at trap_fatal+0x290
>>>>>> #3 0xffffffff808e715a at trap+0x10a
>>>>>> #4 0xffffffff808cec64 at calltrap+0x8
>>>>>> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
>>>>>> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
>>>>>> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
>>>>>> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
>>>>>> #9 0xffffffff8060473f at sysctl_root+0x14f
>>>>>> #10 0xffffffff80604a2a at userland_sysctl+0x14a
>>>>>> #11 0xffffffff80604f1a at __sysctl+0xaa
>>>>>> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
>>>>>> #13 0xffffffff808cef5c at Xfast_syscall+0xfc
>>>>> 
>>>>> Please look up the line number for the fill_kinfo_thread+0x54.
>>>> 
>>>> 
>>>> Is there a way for me to do this from the above information? As
>>>> I said in the original message, I failed to get a crash dump after
>>>> reboot (because, it turns out, I hadn't set up my gmirror swap device
>>>> properly). Alas, with the latest panic, it appears to have hung[1]
>>>> during the "Dumping" phase, so it looks like I won't get a saved crash
>>>> dump this time, either. :-(
>>> 
>>> Load the kernel.debug into kgdb, and from there do
>>> "list *fill_kinfo_thread+0x54".
>> 
>> 
>> gromit# kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>> (kgdb) list *fill_kinfo_thread+0x54
>> 0xffffffff805ee034 is in fill_kinfo_thread (/usr/src/sys/kern/kern_proc.c:854).
>> 849             thread_lock(td);
>> 850             if (td->td_wmesg != NULL)
>> 851                     strlcpy(kp->ki_wmesg, td->td_wmesg, sizeof(kp->ki_wmesg));
>> 852             else
>> 853                     bzero(kp->ki_wmesg, sizeof(kp->ki_wmesg));
>> 854             strlcpy(kp->ki_ocomm, td->td_name, sizeof(kp->ki_ocomm));
>> 855             if (TD_ON_LOCK(td)) {
>> 856                     kp->ki_kiflag |= KI_LOCKBLOCK;
>> 857                     strlcpy(kp->ki_lockname, td->td_lockname,
>> 858                         sizeof(kp->ki_lockname));
>> (kgdb) 
> 
> This is indeed strange. It can only occur if td pointer is damaged.
> 
> Please, try to get a core and at least print the content of *td in this case.


Alas, I was unable to obtain a crash dump (or even a panic) last night, but I have learned more about the circumstances that lead to this panic.

In attempting to find a workaround for this panic (because having nightly backups instead of panics is a good thing :) I discovered two successful approaches.  In the original situation, which causes a reliable panic, I have a daemonised Tivoli "dsmc schedule" job running.  It communicates with the Tivoli TSM server to determine when it should run its scheduled backup.  A Tivoli client config file (/compat/linux/opt/tivoli/tsm/client/ba/bin/dsm.sys) defines a preschedulecmd and a postschedulecmd: /usr/local/bin/make_zfs_backup_snapshot and /usr/local/bin/remove_zfs_backup_snapshot, respectively.  The preschedulecmd script (run prior to the actual backup) basically makes ZFS snapshots of all filesets and nullfs-mounts them under /backup.  It then creates /compat/linux/etc/mtab listing these nullfs filesystems as ext2 file systems, so that the Tivoli backup client knows about them and will back them up.  The postschedulecmd script (run after the actual backup) unmounts all the nullfs-mounted filesystems corresponding to the ZFS backup snapshots and then destroys the snapshots.
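The pre-schedule step described above could be sketched roughly as below.  This is a hypothetical dry-run reconstruction, not the actual script: the dataset names (tank/home, tank/var) and the snapshot name "backup" are assumptions, and RUN=echo makes the script only print the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of a make_zfs_backup_snapshot-style script.
# Set RUN= (empty) to actually execute the zfs/mount commands.
RUN=echo

for fs in tank/home tank/var; do            # hypothetical dataset names
    snap="${fs}@backup"
    mnt="/backup/$(basename "$fs")"
    # Take a read-only point-in-time snapshot of the dataset.
    $RUN zfs snapshot "$snap"
    # ZFS exposes snapshots under <mountpoint>/.zfs/snapshot/<name>;
    # nullfs re-mounts that directory at a stable path under /backup
    # for the backup client to traverse.
    $RUN mount -t nullfs "/$fs/.zfs/snapshot/backup" "$mnt"
done
```

The corresponding remove script would walk the same list in reverse, umount each nullfs mount, and `zfs destroy` each snapshot.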

The first workaround that doesn't lead to a panic is this: Do not run "dsmc schedule".  Instead, via cron, run this simple script:

#!/bin/sh
/usr/local/bin/make_zfs_backup_snapshot
/compat/linux/opt/tivoli/tsm/client/ba/bin/dsmc incremental
/usr/local/bin/remove_zfs_backup_snapshot


(Note: the pre- and post-schedule scripts are run outside of dsmc.)  The script runs at 2 am nightly (around the time that "dsmc schedule" usually performs the scheduled backup) and completes before the regular 3 am "periodic daily" job that normally triggers the panic.  This workaround avoids the panic.
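For reference, the first workaround amounts to a single /etc/crontab entry (the script name here is hypothetical; the 2 am time is from above):

```shell
# minute hour mday month wday user command
0       2    *    *     *    root /usr/local/bin/nightly_tsm_backup.sh
```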

The second workaround that doesn't lead to a panic is this: In the /usr/local/bin/make_zfs_backup_snapshot and /usr/local/bin/remove_zfs_backup_snapshot scripts, replace "#!/bin/sh" with "#!/rescue/sh" to force the scripts to run the FreeBSD-branded sh rather than the Linux-branded /compat/linux/bin/sh:

gromit# brandelf /rescue/sh
File '/rescue/sh' is of brand 'FreeBSD' (9).
gromit# brandelf /compat/linux/bin/sh
File '/compat/linux/bin/sh' is of brand 'Linux' (3).

(Because the scripts are run by the Linux "dsmc" binary, I am presuming they execute /compat/linux/bin/sh when "#!/bin/sh" is read at the start of the pre/postschedulecmd scripts, because for Linux-branded processes the "/bin/sh" installed as part of linux_base-f10-10_4 shadows the FreeBSD /bin/sh.)
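Applying the second workaround is a one-line change per script.  A minimal demonstration on a scratch file (the real targets would be the two /usr/local/bin scripts above; the temp file here is only for illustration):

```shell
# Rewrite "#!/bin/sh" to "#!/rescue/sh" on line 1 only, written
# portably (avoiding BSD vs. GNU `sed -i` differences) by going
# through a temporary output file.
f=$(mktemp)
printf '#!/bin/sh\necho backup\n' > "$f"
sed '1s|^#!/bin/sh$|#!/rescue/sh|' "$f" > "$f.new" && mv "$f.new" "$f"
head -1 "$f"
```

After the rewrite, `head -1` shows `#!/rescue/sh`, so the kernel will hand the script to the statically linked FreeBSD-branded shell regardless of which branded binary invokes it.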

So, it seems that the circumstances leading to a reliable panic are: running the daemonised Linux "dsmc schedule", which invokes scripts that run under the Linux /bin/sh (/compat/linux/bin/sh).

I'll revert the pre/postschedulecmd script "#!" paths back to /bin/sh to try to obtain a crash dump, but hopefully the workarounds described above offer some insight into where the cause of this panic lies.

Cheers,

Paul.
