Process enters unkillable state and somewhat wedges zfs

Daniel Andersen dea at caida.org
Fri Oct 17 21:26:10 UTC 2014


My apologies for the delay in responding to this.  I guess all the traces triggered my spam filters.  :(

While we rebooted the machine mentioned below just yesterday, we have now had an identical event on a completely
different machine.  Yes, we are making heavy use of nullfs and NFS as a means of presenting identical directory
structures across multiple machines.
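
For reference, the pattern on these machines looks roughly like the following (the dataset and mount
point names here are made up purely for illustration):

  # zfs create tank/exports/project
  # mount -t nullfs /tank/exports/project /exports/project

and the nullfs mount point is then exported over NFS so every machine sees the same tree.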

I just ran 'ktrace -p <pid>' on the second machine, with <pid> matching the process that is stuck in a permanent
100% R state (again, it is sudo).  ktrace.out came out as an empty file.  I did run it on a few random processes
to make sure ktrace actually works on the system, and it does.
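
Concretely, the sequence was roughly (annotations in parentheses):

  # ktrace -p <pid>     (attach; events are appended to ktrace.out)
  # ktrace -C           (detach / clear all tracing when done)
  # kdump               (decode ktrace.out -- empty for the wedged sudo,
                         normal syscall output for the control processes)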

For the record, if I didn't say this before:  This is on a FreeBSD 10.0/amd64 machine.

Reading the email thread you linked, it could be the same thing.  I'm not entirely sure what counts as a true
deadlock, and the thread doesn't mention whether one process was stuck in the R state yet still unkillable.

Now that it is confirmed that this hits us on two different machines, I'm considering rearranging things to use
less nullfs in an attempt to reduce our downtime.

Dan

On 09/27/2014 05:46 AM, John Baldwin wrote:
> On Friday, September 26, 2014 02:47:58 PM Daniel Andersen wrote:
>> Okay, we were finally able to collect a trace on this.  I took two, just in
>> case: I basically did a 'tr <pid>', continued, went back into DDB, and did
>> another.
>>
>> Tracing pid 89483 tid 101168 td 0xfffff8048e301920
>> cpustop_handler() at cpustop_handler+0x28/frame 0xfffffe2f395e1ce0
>> ipi_nmi_handler() at ipi_nmi_handler+0x3f/frame 0xfffffe2f395e1d00
>> trap() at trap+0x42/frame 0xfffffe2f395e1f20
>> nmi_calltrap() at nmi_calltrap+0x8/frame 0xfffffe2f395e1f20
>> --- trap 0x13, rip = 0xffffffff808aada0, rsp = 0xfffffe2f395e1fe0, rbp = 0xfffffe2ffca2bf60 ---
>> __lockmgr_args() at __lockmgr_args+0x30/frame 0xfffffe2ffca2bf60
>> vop_stdlock() at vop_stdlock+0x3c/frame 0xfffffe2ffca2bf80
>> VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9d/frame 0xfffffe2ffca2bfb0
>> _vn_lock() at _vn_lock+0x43/frame 0xfffffe2ffca2c010
>> zfs_lookup() at zfs_lookup+0x3a1/frame 0xfffffe2ffca2c0a0
>> zfs_freebsd_lookup() at zfs_freebsd_lookup+0x6d/frame 0xfffffe2ffca2c1e0
>> VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x92/frame 0xfffffe2ffca2c210
>> vfs_cache_lookup() at vfs_cache_lookup+0xcf/frame 0xfffffe2ffca2c260
>> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x92/frame 0xfffffe2ffca2c290
>> null_lookup() at null_lookup+0x8b/frame 0xfffffe2ffca2c300
>> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x92/frame 0xfffffe2ffca2c330
>> lookup() at lookup+0x58b/frame 0xfffffe2ffca2c3b0
>> namei() at namei+0x504/frame 0xfffffe2ffca2c480
>> vn_open_cred() at vn_open_cred+0x232/frame 0xfffffe2ffca2c5d0
>> vop_stdvptocnp() at vop_stdvptocnp+0x17d/frame 0xfffffe2ffca2c910
>> null_vptocnp() at null_vptocnp+0x2b/frame 0xfffffe2ffca2c970
>> VOP_VPTOCNP_APV() at VOP_VPTOCNP_APV+0x98/frame 0xfffffe2ffca2c9a0
>> vn_vptocnp_locked() at vn_vptocnp_locked+0x113/frame 0xfffffe2ffca2ca20
>> vn_fullpath1() at vn_fullpath1+0x1b2/frame 0xfffffe2ffca2ca80
>> kern___getcwd() at kern___getcwd+0x115/frame 0xfffffe2ffca2cae0
>> amd64_syscall() at amd64_syscall+0x357/frame 0xfffffe2ffca2cbf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe2ffca2cbf0
>> --- syscall (326, FreeBSD ELF64, sys___getcwd), rip = 0x2000cd8cda, rsp = 0x7fffffffd358, rbp = 0x7fffffffd440 ---
>>
>>
>>
>> Tracing pid 89483 tid 101168 td 0xfffff8048e301920
>> cpustop_handler() at cpustop_handler+0x28/frame 0xfffffe2f395e1ce0
>> ipi_nmi_handler() at ipi_nmi_handler+0x3f/frame 0xfffffe2f395e1d00
>> trap() at trap+0x42/frame 0xfffffe2f395e1f20
>> nmi_calltrap() at nmi_calltrap+0x8/frame 0xfffffe2f395e1f20
>> --- trap 0x13, rip = 0xffffffff80e6110e, rsp = 0xfffffe2f395e1fe0, rbp = 0xfffffe2ffca2bfb0 ---
>> VOP_LOCK1_APV() at VOP_LOCK1_APV+0xae/frame 0xfffffe2ffca2bfb0
>> _vn_lock() at _vn_lock+0x43/frame 0xfffffe2ffca2c010
>> zfs_lookup() at zfs_lookup+0x3a1/frame 0xfffffe2ffca2c0a0
>> zfs_freebsd_lookup() at zfs_freebsd_lookup+0x6d/frame 0xfffffe2ffca2c1e0
>> VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x92/frame 0xfffffe2ffca2c210
>> vfs_cache_lookup() at vfs_cache_lookup+0xcf/frame 0xfffffe2ffca2c260
>> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x92/frame 0xfffffe2ffca2c290
>> null_lookup() at null_lookup+0x8b/frame 0xfffffe2ffca2c300
>> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x92/frame 0xfffffe2ffca2c330
>> lookup() at lookup+0x58b/frame 0xfffffe2ffca2c3b0
>> namei() at namei+0x504/frame 0xfffffe2ffca2c480
>> vn_open_cred() at vn_open_cred+0x232/frame 0xfffffe2ffca2c5d0
>> vop_stdvptocnp() at vop_stdvptocnp+0x17d/frame 0xfffffe2ffca2c910
>> null_vptocnp() at null_vptocnp+0x2b/frame 0xfffffe2ffca2c970
>> VOP_VPTOCNP_APV() at VOP_VPTOCNP_APV+0x98/frame 0xfffffe2ffca2c9a0
>> vn_vptocnp_locked() at vn_vptocnp_locked+0x113/frame 0xfffffe2ffca2ca20
>> vn_fullpath1() at vn_fullpath1+0x1b2/frame 0xfffffe2ffca2ca80
>> kern___getcwd() at kern___getcwd+0x115/frame 0xfffffe2ffca2cae0
>> amd64_syscall() at amd64_syscall+0x357/frame 0xfffffe2ffca2cbf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe2ffca2cbf0
>> --- syscall (326, FreeBSD ELF64, sys___getcwd), rip = 0x2000cd8cda, rsp = 0x7fffffffd358, rbp = 0x7fffffffd440 ---
> 
> Hmm, if you are still in this state, can you do a ktrace of this running
> process and see if it is logging any syscall events (just want to make sure it
> is stuck in the kernel rather than constantly calling system calls due to a
> loop in userland)?
> 
> Also, do you have a nullfs mount of a zfs path?  (It looks that way from your
> stack trace).  I did find this thread:
> 
> https://www.mail-archive.com/freebsd-current@freebsd.org/msg147678.html
> 
> Andriy (cc'd) probably has better insight into this than I.
> 
>> On 08/29/2014 11:24 AM, John Baldwin wrote:
>>> On Wednesday, August 27, 2014 02:24:47 PM Daniel Andersen wrote:
>>>> On 08/27/2014 01:39 PM, John Baldwin wrote:
>>>>> These are all blocked in "zfs" then.  (For future reference, the 'mwchan'
>>>>> field that you see as 'STATE' in top or via 'ps O mwchan' is more detailed
>>>>> than the 'D' state.)
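>>>>>
>>>>> For example (illustrative, using the same example pid as below):
>>>>>
>>>>> # ps -O mwchan -p 42335
>>>>>
>>>>> should show an MWCHAN column reading "zfs" for these blocked processes,
>>>>> rather than just a bare 'D'.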
>>>>>
>>>>> To diagnose this further, you would need to see which thread holds the
>>>>> ZFS vnode lock these threads need.  I have some gdb scripts you can use to
>>>>> do that at www.freebsd.org/~jhb/gdb/.  You would want to download 'gdb6*'
>>>>> files from there and then do this as root:
>>>>>
>>>>> # cd /path/to/gdb/files
>>>>> # kgdb
>>>>> (kgdb) source gdb6
>>>>> (kgdb) sleepchain 42335
>>>>>
>>>>> Where '42335' is the pid of some process stuck in "zfs".
>>>>
>>>> I will keep this in mind the next time the machine wedges.  Another data
>>>> point: the second procstat output I sent was the most recent.  All the
>>>> processes listed there were captured after the fact.  The process that
>>>> started the entire problem (this time) was sudo, and it only has this one
>>>> entry in procstat:
>>>>
>>>> 38003 102797 sudo             -                <running>
>>>>
>>>> Of note, this does not appear to be blocked on zfs in any way.  'ps' showed
>>>> it in the 'R' state instead of 'D' (I will be sure to use mwchan in the
>>>> future).  It also appeared to be pegging an entire CPU core at 100% usage,
>>>> and it was only single-threaded.
>>>
>>> Well, if it is spinning in some sort of loop in the kernel while holding a
>>> ZFS vnode lock that could be blocking all the other threads.  In that case,
>>> you don't need to do what I asked for above.  Instead, we need to find out
>>> what that thread is doing.  There are two ways of doing this.  One is to
>>> force a panic via 'sysctl debug.kdb.panic=1' and then use kgdb on the
>>> crashdump to determine what the running thread is doing.  Another option
>>> is to break into the DDB debugger on the console (note that you will need
>>> to build a custom kernel with DDB if you are on stable) and request a
>>> stack trace of the running process via 'tr <pid>'.  Ideally you can do this
>>> over a serial console so you can just cut and paste the output of the trace
>>> into a mail.  Over a video console you can either transcribe it by hand or
>>> take photos.
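>>>
>>> As a rough sketch of both routes (the kernel config options and crashdump
>>> path are from memory, and the pid is just an example from earlier in this
>>> thread), the DDB route looks like:
>>>
>>> options KDB     (kernel config, needed for the DDB route)
>>> options DDB
>>>
>>> # sysctl debug.kdb.enter=1      (break into DDB on the console)
>>> db> tr 38003
>>> db> c
>>>
>>> Or, for the crashdump route:
>>>
>>> # sysctl debug.kdb.panic=1
>>> # kgdb /boot/kernel/kernel /var/crash/vmcore.0     (after reboot)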
> 


