svn commit: r320156 - in head: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzfs/common sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contri...

O. Hartmann ohartmann at walstatt.org
Wed Jun 21 07:59:24 UTC 2017


On Tue, 20 Jun 2017 17:25:53 -0400,
"Kenneth D. Merry" <ken at FreeBSD.ORG> wrote:

> On Tue, Jun 20, 2017 at 23:37:10 +0300, Andriy Gapon wrote:
> > On 20/06/2017 23:29, Ken Merry wrote:  
> > > I don't know for sure that this commit is the cause, but it (and r320153) are the
> > > only ZFS commits between a version of head from June 14th that boots off a ZFS
> > > mirror, and one that panics.

r320153 runs well and is stable here, but with r320156 the kernels on all of my ZFS
machines panic immediately (they have ZFS built into the kernel, not loaded as a module).

For now I have gone back to r320153. I'm sorry that I have no debugging information; the
boxes are currently built without debugging options.

> > > 
> > > Here's the stack trace:
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 22; 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 9; apic id = 09
> > > fault virtual address   = 0x0
> > > fault code              = supervisor read data, page not present
> > > instruction pointer     = 0x20:0xffffffff81e47f21
> > > stack pointer           = 0x28:0xfffffe08b37f8810
> > > frame pointer           = 0x28:0xfffffe08b37f8860
> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > current process         = 0 (zio_free_issue_0_3)
> > > [ thread pid 0 tid 100478 ]
> > > Stopped at      0xffffffff81e47f21 = zio_vdev_io_start+0x1f1:   testb   $0x1,(%rax)
> > > db> bt
> > > Tracing pid 0 tid 100478 td 0xfffff80193156000
> > > zio_vdev_io_start() at 0xffffffff81e47f21 = zio_vdev_io_start+0x1f1/frame 0xfffffe08b37f8860
> > > zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f88b0
> > > zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b37f88e0
> > > vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b37f8930
> > > zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b37f8990
> > > zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f89e0
> > > taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b37f8a40
> > > taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b37f8a70
> > > fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b37f8ab0
> > > fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b37f8ab0
> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > > db>   
> > > 
> > > (kgdb) list *(zio_vdev_io_start+0x1f1)
> > > 0xd9f21 is in zio_vdev_io_start
> > > (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:350).
> > > 345
> > > 346             /*
> > > 347              * Ensure that anyone expecting this zio to contain a linear ABD isn't
> > > 348              * going to get a nasty surprise when they try to access the data.
> > > 349              */
> > > 350             IMPLY(abd_is_linear(zio->io_abd), abd_is_linear(data));
> > > 351
> > > 352             zt->zt_orig_abd = zio->io_abd;
> > > 353             zt->zt_orig_size = zio->io_size;
> > > 354             zt->zt_bufsize = bufsize;
> > > 
> > > I'll try rebooting and see if the problem goes away.  If not, I'll roll back
> > > the ABD change and see if the problem goes away.
> > 
> > Judging from the thread that panicked, the problem may have to do with our TRIM
> > support.  Unfortunately, I didn't have a chance to test the change on a system
> > with working TRIM, so I missed it.
> > I will look into this further, but it's almost obvious that the problem is
> > caused by zio->io_abd being NULL for a zio of type ZIO_TYPE_FREE.
> 
> FWIW, avg sent me a patch for this particular problem (by checking for NULL
> before dereferencing the pointer), and although it got me past the above
> problem, I hit another related panic:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 6; 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 14; apic id = 22
> fault virtual address   = 0x4
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff81d92a2d
> stack pointer           = 0x0:0xfffffe08b36e0710
> frame pointer           = 0x0:0xfffffe08b36e0730
> code segment            = base 0x0, limit 0xfffff, type 0x1b
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 11; apic id = 0b
> fault virtual address   = 0x4
> Fatal trap 12: page fault while in kernel mode
> cpuid = 8; apic id = 08
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (zio_free_issue_4_1)
> [ thread pid 0 tid 100799 ]
> Stopped at      0xffffffff81d92a2d = abd_verify+0xd:    movl    0x4(%r14),%eax
> db> bt  
> Tracing pid 0 tid 100799 td 0xfffff801931b8560
> abd_verify() at 0xffffffff81d92a2d = abd_verify+0xd/frame 0xfffffe08b36e0730
> abd_put() at 0xffffffff81d92eff = abd_put+0xf/frame 0xfffffe08b36e0750
> vdev_raidz_map_free() at 0xffffffff81e26312 = vdev_raidz_map_free+0x82/frame 0xfffffe08b36e0780
> zio_vdev_io_assess() at 0xffffffff81e48646 = zio_vdev_io_assess+0x116/frame 0xfffffe08b36e07b0
> zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e0800
> zio_vdev_io_start() at 0xffffffff81e48184 = zio_vdev_io_start+0x454/frame 0xfffffe08b36e0860
> zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e08b0
> zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b36e08e0
> vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b36e0930
> zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b36e0990
> zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e09e0
> taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b36e0a40
> taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b36e0a70
> fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b36e0ab0
> fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b36e0ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> db>   
> 
> (kgdb) list *(abd_verify+0xd)
> 
> 0x24a2d is in abd_verify
> (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:231).
> 226     }
> 227
> 228     static inline void
> 229     abd_verify(abd_t *abd)
> 230     {
> 231             ASSERT3U(abd->abd_size, >, 0);
> 232             ASSERT3U(abd->abd_size, <=, SPA_MAXBLOCKSIZE);
> 233             ASSERT3U(abd->abd_flags, ==, abd->abd_flags & (ABD_FLAG_LINEAR |
> 234                 ABD_FLAG_OWNER | ABD_FLAG_META));
> 235             IMPLY(abd->abd_parent != NULL, !(abd->abd_flags & ABD_FLAG_OWNER));
> (kgdb) list *(abd_put+0xf)
> 0x24eff is in abd_put
> (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:514).
> 509      */
> 510     void
> 511     abd_put(abd_t *abd)
> 512     {
> 513             abd_verify(abd);
> 514             ASSERT(!(abd->abd_flags & ABD_FLAG_OWNER));
> 515
> 516             if (abd->abd_parent != NULL) {
> 517                     (void) refcount_remove_many(&abd->abd_parent->abd_children,
> 518                         abd->abd_size, abd);
> (kgdb) list *(vdev_raidz_map_free+0x82)
> 0xb8312 is in vdev_raidz_map_free
> (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c:281).
> 276                             zio_buf_free(rm->rm_col[c].rc_gdata,
> 277                                 rm->rm_col[c].rc_size);
> 278             }
> 279
> 280             size = 0;
> 281             for (c = rm->rm_firstdatacol; c < rm->rm_cols; c++) {
> 282                     abd_put(rm->rm_col[c].rc_abd);
> 283                     size += rm->rm_col[c].rc_size;
> 284             }
> 285
> (kgdb) list *(zio_vdev_io_assess+0x116)
> 0xda646 is in zio_vdev_io_assess
> (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3315).
> 3310            if (vd == NULL && !(zio->io_flags & ZIO_FLAG_CONFIG_WRITER))
> 3311                    spa_config_exit(zio->io_spa, SCL_ZIO, zio);
> 3312
> 3313            if (zio->io_vsd != NULL) {
> 3314                    zio->io_vsd_ops->vsd_free(zio);
> 3315                    zio->io_vsd = NULL;
> 3316            }
> 3317
> 3318            if (zio_injection_enabled && zio->io_error == 0)
> 3319                    zio->io_error = zio_handle_fault_injection(zio, EIO);
> (kgdb) 
> 
> So, I disabled trim by setting vfs.zfs.trim.enabled=0 in the loader, and I
> can boot now.
> 
> Ken

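Both quoted traces show the same pattern: a free (TRIM) zio carries no data
buffer, and the code dereferences the buffer pointer anyway. The fault
addresses fit that reading: 0x0 in the first trace and 0x4 in the second are
what a field read through a NULL pointer would produce. Below is a minimal,
standalone sketch of the "check for NULL before dereferencing" idea mentioned
above. It is NOT the patch avg sent and not the real ZFS code: toy_abd_t,
toy_zio_t, toy_linear_check() and toy_map_free() are hypothetical stand-ins
invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for abd_t; not the real ZFS definition. */
typedef struct toy_abd {
	bool	linear;
	size_t	size;
} toy_abd_t;

/* Hypothetical stand-in for a zio; io_abd is NULL for a free/TRIM request. */
typedef struct toy_zio {
	toy_abd_t *io_abd;
} toy_zio_t;

/* Like the quoted abd_is_linear() call, this must never be handed NULL. */
bool
toy_abd_is_linear(const toy_abd_t *abd)
{
	assert(abd != NULL);
	return (abd->linear);
}

/*
 * Guarded form of the "linear original implies linear data" check:
 * when either buffer is absent there is nothing to verify, so the
 * check passes without touching the pointer.
 */
bool
toy_linear_check(const toy_zio_t *zio, const toy_abd_t *data)
{
	if (zio->io_abd == NULL || data == NULL)
		return (true);
	return (!toy_abd_is_linear(zio->io_abd) || toy_abd_is_linear(data));
}

/*
 * Guarded form of a per-column release loop: columns without a buffer
 * are skipped instead of being passed on. Returns the number released.
 */
int
toy_map_free(toy_abd_t **cols, int ncols)
{
	int released = 0;

	for (int c = 0; c < ncols; c++) {
		if (cols[c] == NULL)
			continue;	/* free/TRIM column: no buffer attached */
		cols[c] = NULL;		/* stands in for releasing the buffer */
		released++;
	}
	return (released);
}
```

With these stand-ins, toy_linear_check() on a zio whose io_abd is NULL returns
true without dereferencing anything, and toy_map_free() on a three-column
array with one NULL entry reports two released columns. Whether the real fix
should guard at these points or avoid building such zios in the first place
is a question for the ZFS developers.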


-- 
O. Hartmann

I object to the use or transmission of my data for
advertising purposes or for market or opinion research (§ 28 Abs. 4 BDSG).