ZFS txg implementation flaw

Slawa Olhovchenkov slw at zxy.spb.ru
Mon Oct 28 22:24:54 UTC 2013


On Mon, Oct 28, 2013 at 02:56:17PM -0700, Xin Li wrote:

> >>> Semi-indirect.
> >>> dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'
> >>> 
> >>> After some (2-3) seconds
> >>> 
> >>> kernel`vnode_destroy_vobject+0xb9
> >>> zfs.ko`zfs_freebsd_reclaim+0x2e
> >>> kernel`VOP_RECLAIM_APV+0x78
> >>> kernel`vgonel+0x134
> >>> kernel`vnlru_free+0x362
> >>> kernel`vnlru_proc+0x61e
> >>> kernel`fork_exit+0x11f
> >>> kernel`0xffffffff80cdbfde
> >>> 2490
> > 
> > 0xffffffff80cdbfd0 <fork_trampoline>:   mov    %r12,%rdi
> > 0xffffffff80cdbfd3 <fork_trampoline+3>: mov    %rbx,%rsi
> > 0xffffffff80cdbfd6 <fork_trampoline+6>: mov    %rsp,%rdx
> > 0xffffffff80cdbfd9 <fork_trampoline+9>: callq  0xffffffff808db560 <fork_exit>
> > 0xffffffff80cdbfde <fork_trampoline+14>:        jmpq   0xffffffff80cdca80 <doreti>
> > 0xffffffff80cdbfe3 <fork_trampoline+19>:        nopw   0x0(%rax,%rax,1)
> > 0xffffffff80cdbfe9 <fork_trampoline+25>:        nopl   0x0(%rax)
> > 
> > 
> >>> I don't have user process created threads nor do fork/exit.
> >> 
> >> This has nothing to do with fork/exit but does suggest that you
> >> are running out of vnodes.  What does sysctl -a | grep vnode say?
> > 
> > kern.maxvnodes: 1095872
> > kern.minvnodes: 273968
> > vm.stats.vm.v_vnodepgsout: 0
> > vm.stats.vm.v_vnodepgsin: 62399
> > vm.stats.vm.v_vnodeout: 0
> > vm.stats.vm.v_vnodein: 10680
> > vfs.freevnodes: 275107
> > vfs.wantfreevnodes: 273968
> > vfs.numvnodes: 316321
> > debug.sizeof.vnode: 504
> 
> Try setting vfs.wantfreevnodes to 547936 (double it).

Now fork_trampoline is gone, but I still see prcfr (and zfod/totfr
too). Traffic is currently at half of peak, so I can't check the
impact on IRQ handling yet.

kern.maxvnodes: 1095872
kern.minvnodes: 547936
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 63134
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10836
vfs.freevnodes: 481873
vfs.wantfreevnodes: 547936
vfs.numvnodes: 517331
debug.sizeof.vnode: 504
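
For reference, the new value is simply the earlier vfs.wantfreevnodes
(273968) doubled. A minimal sketch, assuming a POSIX shell; the variable
names old and new are only for illustration, and the sysctl line is left
as a comment because it needs root on the FreeBSD host:

```shell
# Previous vfs.wantfreevnodes value, taken from the earlier sysctl output.
old=273968
# Doubled value, as suggested.
new=$((old * 2))
echo "vfs.wantfreevnodes=${new}"
# On the host itself (as root) the value would then be applied with:
#   sysctl vfs.wantfreevnodes="${new}"
```

kern.minvnodes reports the same number afterwards, as the output above
shows.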

Now dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

              kernel`vm_object_deallocate+0x520
              kernel`vm_map_entry_deallocate+0x4c
              kernel`vm_map_process_deferred+0x3d
              kernel`sys_munmap+0x16c
              kernel`amd64_syscall+0x5ea
              kernel`0xffffffff80cdbd97
               56

I think this is nginx memory management (allocation/deallocation). Can
I tune malloc so that it does not free pages back to the system?

