Process in T state does not want to die.....
Willem Jan Withagen
wjw at digiware.nl
Fri Nov 29 09:08:45 UTC 2019
On 28-11-2019 22:46, Konstantin Belousov wrote:
> On Thu, Nov 28, 2019 at 09:52:50PM +0100, Willem Jan Withagen wrote:
>> # ps -o pid,lwp,flags,flags2,state,tracer,command -p 3532
>> PID LWP F F2 STAT TRACER COMMAND
>> 3532 103955 11080081 00000000 TsJ 0 ceph-osd -i 5
>>
>> # procstat -kk 3532
>> PID TID COMM TDNAME KSTACK
.......
>> 3532 104829 ceph-osd filestore_sync mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3532 104830 ceph-osd journal_write mi_switch+0xe2
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93
>> sys_writev+0x6e amd64_syscall+0x364 fast_syscall_common+0x101
>> 3532 104831 ceph-osd fn_jrn_objstore mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3532 104832 ceph-osd tp_fstore_op mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3532 104833 ceph-osd tp_fstore_op mi_switch+0xe2
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 vn_open_cred+0xc8
>> zfs_setextattr+0x216 VOP_SETEXTATTR_APV+0x7c extattr_set_vp+0x11d
>> sys_extattr_set_fd+0xee amd64_syscall+0x364 fast_syscall_common+0x101
> This is an example of the cause of your problem.
>
> The thread is executing ZFS code, probably the zfs_setextattr() VOP doing
> something with the extended attributes. There, it recurses into VFS to
> open a file, and vn_open_cred() waits in bwillwrite() for the buffer space
> pressure to ease, because vn_open_cred() is assumed to be called from the
> top level, not from inside VFS/fs code.
>
> Until this thread finishes its operation and safely returns to the
> kernel/user boundary, the process cannot exit.
>
> There are two problems. One is this call to bwillwrite(), and it is easy
> to get rid of it; see the patch at the end of the message. But I wonder
> why you have so many dirty buffers and why the situation does not resolve
> itself. Note that ZFS does not use the buffer cache, so you must have some
> other very active fs, using the buffer cache, that is somehow blocked on
> writes.
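The throttle described in the quoted reply can be illustrated with a small userspace model. This is a sketch under stated assumptions, not FreeBSD kernel code: `struct dirty_model`, `model_willwrite()`, `model_flush()` and `demo_with_flusher()` are hypothetical names, `hidirty` only plays the role of a high dirty-buffer watermark, and the wakeup rule (release waiters as soon as the count drops below that watermark) is simplified and may differ from the kernel's. The point it shows is that a writer that must wait for dirty buffers to drain stays blocked for as long as whatever is supposed to write them out cannot make progress. The kernel sleep in the trace has no timeout; the model adds one only so the stuck case can be observed.

```c
/*
 * Hypothetical userspace model of a dirty-buffer write throttle in the
 * spirit of FreeBSD's bwillwrite(). Not kernel code; all names are made up.
 * Build with: cc -pthread (these functions are called from your own main).
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

struct dirty_model {
	pthread_mutex_t lock;
	pthread_cond_t  drained;   /* signalled when buffers are written out */
	int numdirty;              /* current number of dirty buffers */
	int hidirty;               /* high watermark: writers wait at/above it */
};

static void
model_init(struct dirty_model *m, int numdirty, int hidirty)
{
	assert(hidirty > 0);
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->drained, NULL);
	m->numdirty = numdirty;
	m->hidirty = hidirty;
}

/*
 * Sleep while the dirty count is at or above the watermark. Unlike the
 * kernel sleep in the trace, give up after timeout_ms so the "nothing
 * drains" case can be observed. Returns 0 or ETIMEDOUT.
 */
static int
model_willwrite(struct dirty_model *m, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&m->lock);
	while (m->numdirty >= m->hidirty && rc == 0)
		rc = pthread_cond_timedwait(&m->drained, &m->lock, &deadline);
	if (m->numdirty < m->hidirty)
		rc = 0;            /* drained, even if the wakeup raced the deadline */
	pthread_mutex_unlock(&m->lock);
	return rc;
}

/* The "flusher": write out n buffers and wake any throttled writers. */
static void
model_flush(struct dirty_model *m, int n)
{
	pthread_mutex_lock(&m->lock);
	m->numdirty = (m->numdirty > n) ? m->numdirty - n : 0;
	pthread_cond_broadcast(&m->drained);
	pthread_mutex_unlock(&m->lock);
}

static void *
flusher_thread(void *arg)
{
	struct dirty_model *m = arg;
	struct timespec delay = { 0, 20 * 1000000L };   /* 20 ms */

	nanosleep(&delay, NULL);
	model_flush(m, m->hidirty);   /* hidirty is constant after init */
	return NULL;
}

/* A throttled write while a working flusher drains the buffers. */
static int
demo_with_flusher(struct dirty_model *m, long timeout_ms)
{
	pthread_t t;
	int rc;

	if (pthread_create(&t, NULL, flusher_thread, m) != 0)
		return -1;
	rc = model_willwrite(m, timeout_ms);
	pthread_join(t, NULL);
	return rc;
}
```

For example, with 8 dirty buffers and a watermark of 8, model_willwrite() returns ETIMEDOUT when nothing ever calls model_flush(), while demo_with_flusher() on the same setup returns 0 once the flusher thread drains the buffers. With only 2 dirty buffers the call returns immediately, which is consistent with the observation that small test sizes do not trigger the hang.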
OK,
Thanks for the analysis. I'll try the patch.

I think the use of the buffer cache comes from the bonnie++ test that is
hammering the UFS filesystem that is mounted on a ceph rbd-ggate device.
rbd-ggate uses geom-gate to offer a disk device that is backed by an
rbd image in the ceph cluster. Some of the nodes in the cluster run on
the same machine as the test, so there is a lot of ZFS activity as well.
This server's memory is probably a bit small for the load thrown at it,
but at the moment I do not have beefier hardware.

Bonnie is actually the only way thus far to trigger this type of problem.
This would probably also explain why the problem does not occur when
using small test sizes in bonnie: the memory pressure never gets critical.
--WjW
More information about the freebsd-hackers mailing list