ZFS Kernel Panic on 10.0-RELEASE

Steven Hartland killing at multiplay.co.uk
Mon Jun 2 09:12:20 UTC 2014


----- Original Message ----- 
From: "Mike Carlson" <mike at bayphoto.com>

> On 5/30/2014 1:10 PM, Mike Carlson wrote:
> > On 5/30/2014 12:48 PM, Jordan Hubbard wrote:
> >> On May 30, 2014, at 12:04 PM, Mike Carlson <mike at bayphoto.com> wrote:
> >>
> >>> Over the weekend, we had upgraded one of our servers from 9.1-RELEASE to 10.0-RELEASE, and then the zpool was upgraded (from version 28 to 5000)
> >>>
> >>> Tuesday afternoon, the server suddenly rebooted (kernel panic), and as soon as it tried to remount all of its ZFS volumes, 
> >>> it panic'd again.
> >> What’s the panic text?  That’s pretty crucial in figuring out whether this is recoverable (e.g. if it’s spacemap corruption 
> >> related, probably not).
> >>
> >> - Jordan
> >>
> >>
> >>
> > I had linked the pictures I took of the console, but here is my manual reproduction:
> >
> >    Fatal trap 12: page fault while in kernel mode
> >    cpuid = 7; apic id = 07
> >    fault virtual address    = 0x4a0
> >    fault code               = supervisor read data, page not present
> >    instruction pointer      = 0x20:0xffffffff81a7f39f
> >    stack pointer            = 0x28:0xfffffe1834789570
> >    frame pointer            = 0x28:0xfffffe18347895b0
> >    code segment             = base 0x0, limit 0xfffff, type 0x1b
> >                              = DPL 0, pres 1, long 1, def32 0, gran 1
> >    processor eflags         = interrupt enabled, resume, IOPL = 0
> >    current process          = 1849 (txg_thread_enter)
> >    trap number              = 12
> >    panic: page fault
> >    cpuid = 7
> >    KDB: stack backtrace:
> >    #0 0xffffffff808e7dd0 at kdb_backtrace+0x60
> >    #1 0xffffffff808af8b5 at panic+0x155
> >    #2 0xffffffff80c8e629 at trap_fatal+0x3a2
> >    #3 0xffffffff80c8e969 at trap_pfault+0x2c9
> >    #4 0xffffffff80c8e0f6 at trap+0x5e6
> >    #5 0xffffffff80c75392 at calltrap+0x8
> >    #6 0xffffffff81a53b5a at dsl_dataset_block_kill+0x3a
> >    #7 0xffffffff81a50967 at dnode_sync+0x237
> >    #8 0xffffffff81a48fcb at dmu_objset_sync_dnodes+0x2b
> >    #9 0xffffffff81a48e4d at dmu_objset_sync+0x1ed
> >    #10 0xffffffff81a5d29a at dsl_pool_sync+0xca
> >    #11 0xffffffff81a78a4e at spa_sync+0x52e
> >    #12 0xffffffff81a81925 at txg_sync_thread+0x375
> >    #13 0xffffffff8088198a at fork_exit+0x9a
> >    #14 0xffffffff80c758ce at fork_trampoline+0xe
> >    uptime: 46s
> >    Automatic reboot in 15 seconds - press a key on the console to abort
> >
> This just happened again on another server. We upgraded two servers on the same morning, and now both of them exhibit the same corrupted ZFS volume and panic behavior.
>
> Out of all the volumes, one of them is causing the panic, and the panic message is nearly identical.
>
> I have 4 snapshots over the last 24 hours, so hopefully a snapshot from noon today can be sent to a new volume (zfs send | zfs recv).
>
> I guess I can now rule out a hardware issue; this is clearly a problem related to the upgrade (freebsd-update was used). I first thought the first system had a bad upgrade, perhaps a mix-and-match of 9.2 binaries running on a 10 kernel, but I used the 'freebsd-update IDS' command to verify the integrity of the install, and it looked good; the only differences were config files in /etc that we manage.
>
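
On the snapshot recovery you mention, something along these lines should work, assuming you have a second pool to receive into (the pool, dataset, and snapshot names below are placeholders):

    # zpool export tank
    # zpool import -o readonly=on tank
    # zfs send tank/data@noon-today | zfs recv backuppool/data-recovered

Importing the damaged pool read-only first may let you get the data off without re-triggering the panic, since the trace above is in the txg sync (write) path.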

Do you have a kernel crash dump from this?
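
If not, it's worth enabling dumps before the next one. A minimal setup, assuming your dump device (normally swap) is large enough to hold a dump:

    # echo 'dumpdev="AUTO"' >> /etc/rc.conf
    # service dumpon start

After the next panic and reboot, savecore(8) should leave a vmcore in /var/crash, and you can pull a full backtrace with:

    # kgdb /boot/kernel/kernel /var/crash/vmcore.0
    (kgdb) bt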

Also, can you confirm whether you're on amd64 or just i386?
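
(If you're not sure, "uname -m" on the box will tell you.)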

    Regards
    Steve 


