I have a DDB session open to a crashed ZFS server
Dennis Glatting
dg17 at penx.com
Tue Oct 16 15:16:46 UTC 2012
On Tue, 2012-10-16 at 08:44 -0400, John Baldwin wrote:
> On Monday, October 15, 2012 12:03:39 pm Dennis Glatting wrote:
> > FreeBSD/amd64 (mc) (ttyu0)
> >
> > login: NMI ... going to debugger
> > [ thread pid 11 tid 100003 ]
>
> You got an NMI, not a crash. What happens if you just continue ('c' command)
> from DDB?
>
I hit the NMI button to get into DDB because of the "crash," which was
the wrong word for it.
The problem I am having is with ZFS: the file systems go dormant under
load within 24 hours. Specifically, the processes are still alive but
stuck in disk wait, and there is no disk I/O. I have had this problem on
four machines for months.
The network and console still work but if I enter a command requiring a
pull from disk, nothing comes back.
> I have heard of machines sending spurious NMIs in the past. If that is what
> you are seeing, there is a sysctl to disable dropping into DDB due to an NMI:
>
> machdep.kdb_on_nmi: 1
>
> If you keep getting NMIs, try setting that to 0.
>
The DDB session is still open but I don't see why this system is stuck.
I have been looking at locked processes (two below) but I'm not familiar
with the code. Maybe a deadlock? Maybe a missed interrupt? Maybe an
unsupported controller? I dunno.
0xfffffe0b989803f0: tag zfs, type VDIR
usecount 0, writecount 0, refcount 2 mountedhere 0
flags (VI_DOINGINACT|VI(0x200))
v_object 0xfffffe0b87af7488 ref 0 pages 0
lock type zfs: EXCL by thread 0xfffffe09b48fe900 (pid 70646)
db> show thread 0xfffffe09b48fe900
Thread 104609 at 0xfffffe09b48fe900:
proc (pid 70646): 0xfffffe0af7e79940
name: find
stack: 0xffffffa3329b2000-0xffffffa3329b5fff
flags: 0x4 pflags: 0
state: INHIBITED: {SLEEPING}
wmesg: tx->tx_quiesce_done_cv) wchan: 0xfffffe0059b60240
priority: 120
container lock: sleepq chain (0xffffffff8126d498)
db> sh proc 70646
Process 70646 (find) at 0xfffffe0af7e79940:
state: NORMAL
uid: 0 gids: 0, 5
parent: pid 70645 at 0xfffffe006e4974a0
ABI: FreeBSD ELF64
arguments: find
threads: 1
104609 D tx->tx_q 0xfffffe0059b60240 find
db> tr 104609
Tracing pid 70646 tid 104609 td 0xfffffe09b48fe900
sched_switch() at sched_switch+0x28b
mi_switch() at mi_switch+0xdf
sleepq_wait() at sleepq_wait+0x3a
_cv_wait() at _cv_wait+0x164
txg_wait_open() at txg_wait_open+0x85
dmu_tx_assign() at dmu_tx_assign+0x38
zfs_inactive() at zfs_inactive+0x8e
zfs_freebsd_inactive() at zfs_freebsd_inactive+0xd
VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0x5d
vinactive() at vinactive+0xef
vputx() at vputx+0x244
sys_fchdir() at sys_fchdir+0x3f0
amd64_syscall() at amd64_syscall+0x334
Xfast_syscall() at Xfast_syscall+0xfb
--- syscall (13, FreeBSD ELF64, sys_fchdir), rip = 0x80088396c, rsp = 0x7fffffffd998, rbp = 0x7fffffffda40 ---
0xfffffe005c8855e8: tag syncer, type VNON
usecount 1, writecount 0, refcount 2 mountedhere 0
flags (VI(0x200))
lock type syncer: EXCL by thread 0xfffffe0039147480 (pid 20)
db> sh thread 0xfffffe0039147480
Thread 100242 at 0xfffffe0039147480:
proc (pid 20): 0xfffffe003f4bc940
name: syncer
stack: 0xffffffa2fc73b000-0xffffffa2fc73efff
flags: 0x4 pflags: 0x240800
state: INHIBITED: {SLEEPING}
wmesg: zio->io_cv) wchan: 0xfffffe01187c5320
priority: 116
container lock: sleepq chain (0xffffffff8126e140)
db> sh proc 20
Process 20 (syncer) at 0xfffffe003f4bc940:
state: NORMAL
uid: 0 gids: 0
parent: pid 0 at 0xffffffff812cbdd8
ABI: null
threads: 1
100242 D zio->io_ 0xfffffe01187c5320 [syncer]
db> tr 100242
Tracing pid 20 tid 100242 td 0xfffffe0039147480
sched_switch() at sched_switch+0x28b
mi_switch() at mi_switch+0xdf
sleepq_wait() at sleepq_wait+0x3a
_cv_wait() at _cv_wait+0x164
zio_wait() at zio_wait+0x5b
zil_commit() at zil_commit+0x833
zfs_sync() at zfs_sync+0xaa
sync_fsync() at sync_fsync+0x168
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x5d
sync_vnode() at sync_vnode+0x1b0
sched_sync() at sched_sync+0x29f
fork_exit() at fork_exit+0x9a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffa2fc73eb30, rbp = 0 ---
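A minimal sketch, not part of the original session: assuming a DDB capture like the one above has been saved as text, the Python below skips the frames that only show a thread going to sleep and reports the first remaining function in each `tr` backtrace. The names blocking_frames and SLEEP_FRAMES and the trimmed SAMPLE text are illustrative assumptions; only the frame lines themselves are taken from the traces above.

```python
import re

# Frames that only show the thread being put to sleep (assumed list,
# taken from the top of both backtraces); skipped when looking for the
# function that actually asked to wait.
SLEEP_FRAMES = {"sched_switch", "mi_switch", "sleepq_wait", "_cv_wait"}

TRACE_RE = re.compile(r"^Tracing pid (\d+) tid (\d+)")
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+")


def blocking_frames(ddb_text):
    """Map tid -> first non-sleep frame of each 'tr' backtrace in ddb_text."""
    result = {}
    tid = None
    for line in ddb_text.splitlines():
        line = line.strip()
        header = TRACE_RE.match(line)
        if header:
            tid = int(header.group(2))
            continue
        frame = FRAME_RE.match(line)
        if frame and tid is not None and tid not in result:
            if frame.group(1) not in SLEEP_FRAMES:
                result[tid] = frame.group(1)
    return result


# Trimmed copy of the two backtraces from this message.
SAMPLE = """\
Tracing pid 70646 tid 104609 td 0xfffffe09b48fe900
sched_switch() at sched_switch+0x28b
mi_switch() at mi_switch+0xdf
sleepq_wait() at sleepq_wait+0x3a
_cv_wait() at _cv_wait+0x164
txg_wait_open() at txg_wait_open+0x85
dmu_tx_assign() at dmu_tx_assign+0x38
Tracing pid 20 tid 100242 td 0xfffffe0039147480
sched_switch() at sched_switch+0x28b
mi_switch() at mi_switch+0xdf
sleepq_wait() at sleepq_wait+0x3a
_cv_wait() at _cv_wait+0x164
zio_wait() at zio_wait+0x5b
zil_commit() at zil_commit+0x833
"""

for tid, func in blocking_frames(SAMPLE).items():
    print(tid, func)
```

Run against the two traces above, it reports txg_wait_open for the find thread (tid 104609) and zio_wait for the syncer thread (tid 100242). That only restates what the backtraces show; it is not a diagnosis.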
More information about the freebsd-fs mailing list