kern/145339: [zfs] deadlock after detaching block device from raidz pool

Alex Bakhtin alex.bakhtin at gmail.com
Mon May 3 23:30:04 UTC 2010


The following reply was made to PR kern/145339; it has been noted by GNATS.

From: Alex Bakhtin <alex.bakhtin at gmail.com>
To: Andriy Gapon <avg at icyb.net.ua>
Cc: bug-followup at freebsd.org, Pawel Jakub Dawidek <pjd at freebsd.org>
Subject: Re: kern/145339: [zfs] deadlock after detaching block device from 
	raidz pool
Date: Tue, 4 May 2010 03:23:35 +0400

 Andriy,
 
      Upgraded to today's stable. Reproduced the problem. On GENERIC
 the system just hangs with the following output:
 
 ==========================
 ad12: FAILURE - WRITE_DMA48
 status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=0
 LBA=2312588250
 
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 01
 fault virtual address   = 0x48
 fault code              = supervisor write data, page not present
 instruction pointer     = 0x20:0xffffffff80593e95
 stack pointer           = 0x28:0xffffff8000065ba0
 frame pointer           = 0x28:0xffffff8000065bb0
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 3 (g_up)
 trap number             = 12
 panic: page fault
 cpuid = 1
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x0
 fault code              = supervisor read data, page not present
 instruction pointer     = 0x20:0xffffffff80545a28
 stack pointer           = 0x28:0xffffff80eada2a40
 frame pointer           = 0x28:0xffffff80eada2a90
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 0 (spa_zio)
 trap number             = 12
 ==========================
 
 
 With GENERIC + DDB/KDB enabled I got the following (it seems that
 the first time I detached the device there was no active transaction;
 I can try to reproduce this):
 ==========================
 ad12: FAILURE - device detached
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 01
 fault virtual address   = 0x48
 fault code              = supervisor write data, page not present
 
 instruction pointer     = 0x20:0xffffffff805a0345
 Fatal double fault
 stack pointer           = 0x28:0xffffff800006aba0
 rip = 0xffffffff808085ad
 frame pointer           = 0x28:0xffffff800006abb0
 rsp = 0xffffff80ead87000
 code segment            = base 0x0, limit 0xfffff, type 0x1b
 rbp = 0xffffff80ead87070
                         = DPL 0, pres 1, long 1, def32 0, gran 1
 cpuid = 0; processor eflags     = apic id = 00
 interrupt enabled, panic: double fault
 resume, cpuid = 0
 IOPL = 0
 KDB: enter: panic
 c[thread pid 0 tid 100113 ]
 Stopped at      kdb_enter+0x3d: movq    $0,0x69cee0(%rip)
 ==========================
 
 And another one:
 ==========================
 ad12: FAILURE - WRITE_DMA
 status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=0
 LBA=111033498
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 01
 fault virtual address   = 0x48
 fault code              = supervisor write data, page not present
 instruction pointer     = 0x20:0xffffffff805a0345
 stack pointer           = 0x28:0xffffff800006aba0
 frame pointer           = 0x28:0xffffff800006abb0
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 3 (g_up)
 [thread pid 3 tid 100011 ]
 Stopped at      _mtx_lock_flags+0x15:   lock cmpxchgq   %rsi,0x18(%rdi)
 db:0:kdb.enter.default> capture on
 ==========================
 
 And with your patch the system doesn't detect that the device is
 detached and seems to be deadlocked (it doesn't respond to the power
 button):
 
 ==========================
 acpi0: suspend request ignored (not ready yet)
 acpi0: request to enter state S5 failed (err 6)
 acpi0: suspend request ignored (not ready yet)
 acpi0: request to enter state S5 failed (err 6)
 
 ==========================
 
     So, I can still easily reproduce this problem on 8-STABLE. Your
 simple patch helps to avoid the page fault but deadlocks the system.
 Are you sure that you can just return at this point? Perhaps it makes
 sense to set some error flag before returning?
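 
 To illustrate the concern, here is a minimal userland sketch. It is not
 the real vdev_geom.c code: the fake_* types and functions are
 hypothetical stand-ins, the choice of ENXIO is an assumption, and the
 model assumes the completion handler is the only place where the upper
 layer learns that the I/O has finished.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real ZFS/GEOM structures. */
struct fake_zio {
	int   io_error;	/* error reported to the upper layer */
	bool  io_done;	/* true once completion has been delivered */
	void *vdev_tsd;	/* per-vdev context; NULL after detach */
};

struct fake_bio {
	struct fake_zio *bio_caller1;
	int  bio_error;
	long bio_resid;
};

/* Common tail: translate the bio status and deliver the completion. */
static void
fake_finish(struct fake_zio *zio, const struct fake_bio *bp)
{
	if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
		zio->io_error = EIO;
	zio->io_done = true;
}

/* Variant A: same shape as the proposed patch, a bare return. */
static void
fake_intr_return_only(struct fake_bio *bp)
{
	struct fake_zio *zio = bp->bio_caller1;

	if (zio->vdev_tsd == NULL)
		return;	/* completion is never delivered */
	fake_finish(zio, bp);
}

/* Variant B: record an error and still deliver the completion. */
static void
fake_intr_set_error(struct fake_bio *bp)
{
	struct fake_zio *zio = bp->bio_caller1;

	if (zio->vdev_tsd == NULL) {
		zio->io_error = ENXIO;	/* assumed "device gone" code */
		zio->io_done = true;
		return;
	}
	fake_finish(zio, bp);
}
```

 In this model, variant A leaves io_done false forever when the context
 is NULL, so anything waiting on that I/O would never wake up, while
 variant B lets the waiter see an error and continue. Whether the real
 code behaves the same way needs to be checked against the actual
 source.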
 
 Alex Bakhtin
 
 2010/4/23 Andriy Gapon <avg at icyb.net.ua>:
 >
 > Can you try this patch?
 >
 > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
 > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
 > @@ -603,6 +603,9 @@ vdev_geom_io_intr(struct bio *bp)
 >         zio = bp->bio_caller1;
 >         ctx = zio->io_vd->vdev_tsd;
 >
 > +       if (ctx == NULL)
 > +               return;
 > +
 >         if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
 >                 zio->io_error = EIO;
 >
 >
 > --
 > Andriy Gapon
 >

