ZFS txg implementation flaw
Slawa Olhovchenkov
slw at zxy.spb.ru
Mon Oct 28 22:24:54 UTC 2013
On Mon, Oct 28, 2013 at 02:56:17PM -0700, Xin Li wrote:
> >>> Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'
> >>>
> >>> After some (2-3) seconds:
> >>>
> >>>               kernel`vnode_destroy_vobject+0xb9
> >>>               zfs.ko`zfs_freebsd_reclaim+0x2e
> >>>               kernel`VOP_RECLAIM_APV+0x78
> >>>               kernel`vgonel+0x134
> >>>               kernel`vnlru_free+0x362
> >>>               kernel`vnlru_proc+0x61e
> >>>               kernel`fork_exit+0x11f
> >>>               kernel`0xffffffff80cdbfde
> >>>              2490
> >
> > 0xffffffff80cdbfd0 <fork_trampoline>:     mov    %r12,%rdi
> > 0xffffffff80cdbfd3 <fork_trampoline+3>:   mov    %rbx,%rsi
> > 0xffffffff80cdbfd6 <fork_trampoline+6>:   mov    %rsp,%rdx
> > 0xffffffff80cdbfd9 <fork_trampoline+9>:   callq  0xffffffff808db560 <fork_exit>
> > 0xffffffff80cdbfde <fork_trampoline+14>:  jmpq   0xffffffff80cdca80 <doreti>
> > 0xffffffff80cdbfe3 <fork_trampoline+19>:  nopw   0x0(%rax,%rax,1)
> > 0xffffffff80cdbfe9 <fork_trampoline+25>:  nopl   0x0(%rax)
> >
> >
> >>> I don't have user processes creating threads, nor any fork/exit activity.
> >>
> >> This has nothing to do with fork/exit, but it does suggest that you
> >> are running out of vnodes. What does sysctl -a | grep vnode say?
> >
> > kern.maxvnodes: 1095872
> > kern.minvnodes: 273968
> > vm.stats.vm.v_vnodepgsout: 0
> > vm.stats.vm.v_vnodepgsin: 62399
> > vm.stats.vm.v_vnodeout: 0
> > vm.stats.vm.v_vnodein: 10680
> > vfs.freevnodes: 275107
> > vfs.wantfreevnodes: 273968
> > vfs.numvnodes: 316321
> > debug.sizeof.vnode: 504
>
> Try setting vfs.wantfreevnodes to 547936 (double it).
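The suggested tuning can be applied at runtime; a minimal sketch (the value comes from the thread, and persisting it via /etc/sysctl.conf is an assumption about the setup, not something stated in the thread):

```shell
# Double the free-vnode target at runtime (547936 = 2 * 273968, per the thread).
sysctl vfs.wantfreevnodes=547936

# To make it persistent across reboots, the usual FreeBSD mechanism is:
# echo 'vfs.wantfreevnodes=547936' >> /etc/sysctl.conf
```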
Now fork_trampoline is gone, but I still see prcfr (and zfod/totfr
too). Traffic is currently at half its peak, so I can't yet check the
impact on IRQ handling.
kern.maxvnodes: 1095872
kern.minvnodes: 547936
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 63134
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10836
vfs.freevnodes: 481873
vfs.wantfreevnodes: 547936
vfs.numvnodes: 517331
debug.sizeof.vnode: 504
Now dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'
kernel`vm_object_deallocate+0x520
kernel`vm_map_entry_deallocate+0x4c
kernel`vm_map_process_deferred+0x3d
kernel`sys_munmap+0x16c
kernel`amd64_syscall+0x5ea
kernel`0xffffffff80cdbd97
56
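One way to test the nginx theory below is to attribute the munmap activity to a process. A hedged sketch using the standard DTrace syscall provider (that nginx is the caller is the thread's guess, not confirmed):

```shell
# Count munmap(2) calls per process name; if nginx dominates, the
# vm_object_terminate activity is likely its allocator returning memory.
dtrace -n 'syscall::munmap:entry { @calls[execname] = count(); }'
```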
I think this is nginx memory management (allocation/deallocation). Can
I tune malloc so it does not return freed pages to the system?
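Assuming FreeBSD's libc malloc (jemalloc), dirty-page purging is tunable; the option name below is from jemalloc 3.x and may differ on other versions, so treat this as a sketch rather than a verified answer:

```shell
# jemalloc's lg_dirty_mult option controls how aggressively dirty pages
# are returned to the kernel; -1 disables purging entirely (at the cost
# of higher resident memory). Set via the environment when starting nginx:
MALLOC_CONF="lg_dirty_mult:-1" /usr/local/sbin/nginx
```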
More information about the freebsd-current
mailing list