kern/176636: Periodical crashes with 9.1-R

Rasmus Skaarup freebsd at gal.dk
Sun Mar 10 11:50:04 UTC 2013


The following reply was made to PR kern/176636; it has been noted by GNATS.

From: Rasmus Skaarup <freebsd at gal.dk>
To: Andriy Gapon <avg at FreeBSD.org>
Cc: bug-followup at FreeBSD.org
Subject: Re: kern/176636: Periodical crashes with 9.1-R
Date: Sun, 10 Mar 2013 12:45:30 +0100

 I am still deciding whether to do this or not; the users were already
 impacted for a couple of days, and I wouldn't become very popular if I
 introduced the same errors again.
 
 I did a 'zfs scrub' after migrating to a healthy system, but apparently
 that didn't fix the corruption.
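 (The usual way to run and then check a scrub is something like the sketch
 below; 'tank' is a placeholder for the actual pool name:
 
    # start a scrub, then check for checksum or metadata errors it found
    zpool scrub tank
    zpool status -v tank
 
 As far as I understand, a scrub can only detect blocks whose on-disk
 checksums no longer match; metadata that was corrupted in memory and then
 written out with a valid checksum will pass a scrub unnoticed.)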
 
 Br
 Rasmus Skaarup
 
 On 10/03/2013, at 12.01, Andriy Gapon <avg at FreeBSD.org> wrote:
 
 > on 07/03/2013 07:00 Rasmus Skaarup said the following:
 >>
 >> This is the only kind of panic I get - after your patch:
 >>
 >> Fatal trap 12: page fault while in kernel mode
 >> cpuid = 1; apic id = 01
 >> fault virtual address   = 0x60
 >> fault code              = supervisor read data, page not present
 >> instruction pointer     = 0x20:0xffffffff8162e4f0
 >> stack pointer           = 0x28:0xffffff81624726e0
 >> frame pointer           = 0x28:0xffffff81624727d0
 >> code segment            = base 0x0, limit 0xfffff, type 0x1b
 >>                         = DPL 0, pres 1, long 1, def32 0, gran 1
 >> processor eflags        = interrupt enabled, resume, IOPL = 0
 >> current process         = 26068 (zpool)
 >> trap number             = 12
 >> panic: page fault
 >> cpuid = 1
 >> KDB: stack backtrace:
 >> #0 0xffffffff809208a6 at kdb_backtrace+0x66
 >> #1 0xffffffff808ea8be at panic+0x1ce
 >> #2 0xffffffff80bd8240 at trap_fatal+0x290
 >> #3 0xffffffff80bd857d at trap_pfault+0x1ed
 >> #4 0xffffffff80bd8b9e at trap+0x3ce
 >> #5 0xffffffff80bc315f at calltrap+0x8
 >> #6 0xffffffff81673975 at sa_handle_get_from_db+0x95
 >> #7 0xffffffff81673a38 at sa_handle_get+0x48
 >> #8 0xffffffff8169f516 at zfs_grab_sa_handle+0x96
 >> #9 0xffffffff8169faca at zfs_obj_to_path+0x6a
 >> #10 0xffffffff816b8c75 at zfs_ioc_obj_to_path+0x75
 >> #11 0xffffffff816bad46 at zfsdev_ioctl+0xe6
 >> #12 0xffffffff807db28b at devfs_ioctl_f+0x7b
 >> #13 0xffffffff80932325 at kern_ioctl+0x115
 >> #14 0xffffffff8093255d at sys_ioctl+0xfd
 >> #15 0xffffffff80bd7ae6 at amd64_syscall+0x546
 >> #16 0xffffffff80bc3447 at Xfast_syscall+0xf7
 >
 > It is possible that while the memory corruption was occurring (either
 > because of the bug for which I sent you the patch or for some other
 > reason), some bad / corrupted ZFS metadata was written to stable
 > storage.  Now that corrupted data could be causing further panics.
 > It would be interesting to re-create a pool from scratch and see how
 > that behaves.  If you do that, please use the patch.
 >
 > --
 > Andriy Gapon
 >
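 (For what it's worth, re-creating the pool would presumably go roughly
 like the sketch below; the pool name 'tank', the disks 'da0'/'da1' and
 the backup file path are placeholders:
 
    # with the patched kernel running:
    # 1) snapshot everything and send it somewhere safe
    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate > /backup/tank.zfs
    # 2) destroy and re-create the pool, then restore the datasets
    zpool destroy tank
    zpool create tank mirror da0 da1
    zfs receive -dF tank < /backup/tank.zfs
 
 Since the panic is triggered while looking up objects on the existing
 pool (zfs_obj_to_path in the backtrace above), restoring from a backup
 taken before the corruption is probably the safer route than replaying
 a stream taken from the damaged pool.)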
 

