FreeBSD 10.0-BETA1 #8 r256765M spend too much time in locks

Vitalij Satanivskij satan at ukr.net
Thu Oct 24 16:52:21 UTC 2013


It's me again :) 

Some little more information 

Using simple script like - 
#!/bin/sh

time=`date "+%Y-%m-%d_%H:%M"`
sysctl debug.lock.prof.reset=1
sysctl debug.lock.prof.enable=1
sleep 1
sysctl debug.lock.prof.enable=0
sysctl debug.lock.prof.stats > locks_stats_$time.log

and run it exactly when zfs flush data to pool 

eg. 

disk1       7,53T  8,78T    114     24  2,38M  1,36M
disk1       7,53T  8,78T     16  4,18K   307K  60,9M
disk1       7,53T  8,78T    146  7,18K  2,73M  48,2M
disk1       7,53T  8,78T    372     60  7,71M  2,99M
disk1       7,53T  8,78T    119     18  4,36M   940K
disk1       7,53T  8,78T    115     33  2,56M  1,53M
disk1       7,53T  8,78T     35    942   929K  11,1M
disk1       7,53T  8,78T     58  5,64K   971K  89,4M - here 
disk1       7,53T  8,78T    271  2,63K  8,10M  10,9M
disk1       7,53T  8,78T    218     21  11,7M  1,16M
disk1       7,53T  8,78T     86     15  2,66M   786K
disk1       7,53T  8,78T     92     15  2,29M   841K
disk1       7,53T  8,78T     68     27  1,80M  5,98M


get more information 

first  system part of load grows up to 60-70% even when overal load not big 

second using script 

#!/bin/sh

if [ -z "$1" ]
then
    echo "Usage: lock.sh file.txt"
    exit 1
fi

count=`awk 'BEGIN {NR>2}; {sum += $4}; END{print sum}' $1`
awk -v col=$count 'BEGIN {NR>2}; {foo = ($4/col)* 100; if (foo >= 0.01) { printf "%f\t %s", foo, $10; for(i = 11;i <= NF; ++i) printf "%s ", $i; printf "\n" }}' $1 | sort -nr > res_$1.txt

found next top's 

26,812861        /usr/src/sys/kern/subr_vmem.c:1128(sleep mutex:kmem arena)
21,292701        /usr/src/sys/kern/subr_vmem.c:1230(sleep mutex:kmem arena)
13,661495        /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3541(sx:l2arc_buflist_mtx)
13,099382        /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1597(sx:buf_hash_table.ht_locks[i].ht_lock)
4,028993         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2009(sx:arc_eviction_mtx)
2,972297         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mru->arcs_locks[i].arcs_lock)
2,558227         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:862(sx:buf_hash_table.ht_locks[i].ht_lock)
2,328091         /usr/src/sys/kern/vfs_subr.c:2101(lockmgr:zfs)
2,294326         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:396(sx:vq->vq_lock)
1,418128         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mfu->arcs_locks[i].arcs_lock)
1,407655         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mru->arcs_locks[i].arcs_lock)
1,176428         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c:41(sx:os->os_obj_lock)
0,936418         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mfu->arcs_locks[i].arcs_lock)
0,848378         /usr/src/sys/vm/vm_kern.c:400(rw:kmem vm object)
0,845051         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1381(sx:zfsvfs->z_hold_mtx[i])
0,808122         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:129(sx:db->db_mtx)
0,680536         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1136(sx:zfsvfs->z_hold_mtx[i])
0,616060         /usr/src/sys/vm/vm_kern.c:344(rw:kmem vm object)
0,609296         /usr/src/sys/kern/vfs_lookup.c:707(lockmgr:zfs)


Another run

36,917638        /usr/src/sys/kern/subr_vmem.c:1230(sleep mutex:kmem arena)
32,872199        /usr/src/sys/kern/subr_vmem.c:1128(sleep mutex:kmem arena)
5,242466         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3541(sx:l2arc_buflist_mtx)
4,885420         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:891(sx:buf_hash_table.ht_locks[i].ht_lock)
4,626170         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1597(sx:buf_hash_table.ht_locks[i].ht_lock)
2,125377         /usr/src/sys/kern/vfs_lookup.c:707(lockmgr:zfs)
1,831535         /usr/src/sys/vm/vm_kern.c:400(rw:kmem vm object)
1,202869         /usr/src/sys/vm/vm_kern.c:344(rw:kmem vm object)
1,187653         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2009(sx:arc_eviction_mtx)
1,183960         /usr/src/sys/vm/uma_core.c:2097(sleep mutex:zio_cache)
1,015616         /usr/src/sys/vm/uma_core.c:2605(sleep mutex:zio_cache)
0,679882         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1136(sx:zfsvfs->z_hold_mtx[i])
0,577229         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:129(sx:db->db_mtx)
0,555283         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mfu->arcs_locks[i].arcs_lock)
0,545886         /usr/src/sys/vm/uma_core.c:2259(sleep mutex:zio_cache)
0,510988         /usr/src/sys/vm/uma_core.c:2605(sleep mutex:zio_link_cache)
0,503242         /usr/src/sys/vm/uma_core.c:2710(sleep mutex:zio_cache)


Buzy on drives not grows over 20% 

Maybe somebody have any sugestion about what can be going on or at last what to check/tune ?


Vitalij Satanivskij wrote:
VS> Forget about profiling log 
VS> 
VS> 
VS> Vitalij Satanivskij wrote:
VS> VS> 
VS> VS> 
VS> VS> EEE  forget to notice - old system have none compression on main zfs 
VS> VS> 
VS> VS> 
VS> VS> 
VS> VS> Vitalij Satanivskij wrote:
VS> VS> VS> Hello.
VS> VS> VS> 
VS> VS> VS> After upgrading system from old current (r245701) to fresh current r255173 (than switch to stable/10 r256765M) 
VS> VS> VS> found some strange system behavior:
VS> VS> VS> 
VS> VS> VS> Diff  from r256765M anf r256765 is 
VS> VS> VS> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
VS> VS> VS> ===================================================================
VS> VS> VS> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c        (revision 256765)
VS> VS> VS> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c        (working copy)
VS> VS> VS> @@ -5147,7 +5147,7 @@
VS> VS> VS>         len = l2hdr->b_asize;
VS> VS> VS>         cdata = zio_data_buf_alloc(len);
VS> VS> VS>         csize = zio_compress_data(ZIO_COMPRESS_LZ4, l2hdr->b_tmp_cdata,
VS> VS> VS> -           cdata, l2hdr->b_asize, (size_t)SPA_MINBLOCKSIZE);
VS> VS> VS> +           cdata, l2hdr->b_asize, (size_t)(1ULL << l2hdr->b_dev->l2ad_vdev->vdev_ashift));
VS> VS> VS>  
VS> VS> VS>         if (csize == 0) {
VS> VS> VS>                 /* zero block, indicate that there's nothing to write */
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> But same situation was befor patch.
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> System load to high 
VS> VS> VS> CPU:  6.8% user,  0.0% nice, 57.3% system,  0.8% interrupt, 35.1% idle
VS> VS> VS> 
VS> VS> VS> hotkernel (dtrace script) says 
VS> VS> VS> 
VS> VS> VS> kernel`__mtx_unlock_flags                                 292   0.3%
VS> VS> VS> kernel`__mtx_lock_flags                                   315   0.3%
VS> VS> VS> zfs.ko`lzjb_compress                                      349   0.3%
VS> VS> VS> kernel`__rw_wlock_hard                                    686   0.7%
VS> VS> VS> kernel`spinlock_exit                                     1050   1.0%
VS> VS> VS> kernel`vmem_xalloc                                       7516   7.3%
VS> VS> VS> kernel`_sx_xlock_hard                                    8664   8.5%
VS> VS> VS> kernel`acpi_cpu_c1                                       9737   9.5%
VS> VS> VS> kernel`cpu_idle                                         20189  19.7%
VS> VS> VS> kernel`__mtx_lock_sleep                                 45952  44.9%
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> Trying to  understand where is a problem I'm build kernel with lock profiling. 
VS> VS> VS> 
VS> VS> VS> and get some data (for one minute )
VS> VS> VS> 
VS> VS> VS> (file attached) 
VS> VS> VS> 
VS> VS> VS> using agregation find most lock's is in 
VS> VS> VS> 
VS> VS> VS> 14,159818        /usr/src/sys/kern/subr_vmem.c:1128(sleep mutex:kmem arena) 
VS> VS> VS> 9,553523         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1597(sx:buf_hash_table.ht_locks[i].ht_lock) 
VS> VS> VS> 8,386943         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3541(sx:l2arc_buflist_mtx) 
VS> VS> VS> 8,110206         /usr/src/sys/kern/subr_vmem.c:1230(sleep mutex:kmem arena) 
VS> VS> VS> 5,909429         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mru->arcs_locks[i].arcs_lock) 
VS> VS> VS> 5,452206         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mfu->arcs_locks[i].arcs_lock) 
VS> VS> VS> 5,050224         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:303(sx:tx->tx_cpu[c].tc_open_lock) 
VS> VS> VS> 4,232819         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:891(sx:buf_hash_table.ht_locks[i].ht_lock) 
VS> VS> VS> 4,211348         /usr/src/sys/kern/vfs_subr.c:2101(lockmgr:zfs) 
VS> VS> VS> 4,011656         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:862(sx:buf_hash_table.ht_locks[i].ht_lock) 
VS> VS> VS> 3,823698         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2009(sx:arc_eviction_mtx) 
VS> VS> VS> 2,697344         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:126(sx:h->hash_mutexes[i]) 
VS> VS> VS> 2,343242         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mfu->arcs_locks[i].arcs_lock) 
VS> VS> VS> 1,752713         /usr/src/sys/kern/vfs_lookup.c:707(lockmgr:zfs) 
VS> VS> VS> 1,580856         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1136(sx:zfsvfs->z_hold_mtx[i]) 
VS> VS> VS> 1,534242         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1291(sx:arc_mfu->arcs_locks[i].arcs_lock) 
VS> VS> VS> 1,331583         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:129(sx:db->db_mtx) 
VS> VS> VS> 1,105058         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:427(sx:vq->vq_lock) 
VS> VS> VS> 1,080855         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:396(sx:vq->vq_lock) 
VS> VS> VS> 0,858568         /usr/src/sys/kern/vfs_cache.c:488(rw:Name Cache) 
VS> VS> VS> 0,831652         /usr/src/sys/vm/vm_kern.c:344(rw:kmem vm object) 
VS> VS> VS> 0,815439         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1845(sx:buf_hash_table.ht_locks[i].ht_lock) 
VS> VS> VS> 0,812613         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mru->arcs_locks[i].arcs_lock) 
VS> VS> VS> 0,794274         /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1529(lockmgr:zfs) 
VS> VS> VS> 0,669845         /usr/src/sys/vm/uma_core.c:2097(sleep mutex:zio_cache)
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> Short system description 
VS> VS> VS> CPU E5-1650
VS> VS> VS> MEM 128Gb ddr3-1600
VS> VS> VS> 
VS> VS> VS> Storage subsystem 
VS> VS> VS> 
VS> VS> VS> 36x1Tb WD RE4 drives on LSI SAS2308 Controler
VS> VS> VS> 3x180Gb Intel ssd 530 series as l2 cache 
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> POOL is  18 mirrors, two drives in ich mirror  and 3 cache devices 
VS> VS> VS> 
VS> VS> VS> eg. 
VS> VS> VS> ....
VS> VS> VS>           mirror-14     ONLINE       0     0     0
VS> VS> VS>             gpt/disk28  ONLINE       0     0     0
VS> VS> VS>             gpt/disk29  ONLINE       0     0     0
VS> VS> VS>           mirror-15     ONLINE       0     0     0
VS> VS> VS>             gpt/disk30  ONLINE       0     0     0
VS> VS> VS>             gpt/disk31  ONLINE       0     0     0
VS> VS> VS>           mirror-16     ONLINE       0     0     0
VS> VS> VS>             gpt/disk32  ONLINE       0     0     0
VS> VS> VS>             gpt/disk33  ONLINE       0     0     0
VS> VS> VS>           mirror-17     ONLINE       0     0     0
VS> VS> VS>             gpt/disk34  ONLINE       0     0     0
VS> VS> VS>             gpt/disk35  ONLINE       0     0     0
VS> VS> VS>         cache
VS> VS> VS>           ada1          ONLINE       0     0     0
VS> VS> VS>           ada2          ONLINE       0     0     0
VS> VS> VS>           ada3          ONLINE       0     0     0
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> POOL have two ZFS 
VS> VS> VS> 
VS> VS> VS> main with options (diffs from default) - 
VS> VS> VS> compression           lz4
VS> VS> VS> secondarycache        all
VS> VS> VS> sync                  disabled
VS> VS> VS> 
VS> VS> VS> Data size for it around 6Tb (compresed) eg 
VS> VS> VS> disk1  refcompressratio      1.56x                                          -
VS> VS> VS> disk1  written               5,99T                                          -
VS> VS> VS> disk1  logicalused           10,8T                                          -
VS> VS> VS> disk1  logicalreferenced     9,32T                                          -
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> and another  with options 
VS> VS> VS> recordsize            4K, before it was 32k, but internal software use data blocks mostly 4k so we try to change (without real data realocate ) 
VS> VS> VS> compresion -s off 
VS> VS> VS> sync                  disabled
VS> VS> VS> secondarycache        all
VS> VS> VS> 
VS> VS> VS> DATA size on it around 1.4Tb
VS> VS> VS> 
VS> VS> VS> ARC configured to use max 80Gb 
VS> VS> VS> 
VS> VS> VS> top most time looks like this 
VS> VS> VS> 
VS> VS> VS> Mem: 14G Active, 13G Inact, 94G Wired, 497M Cache, 3297M Buf, 2214M Free
VS> VS> VS> ARC: 80G Total, 2010M MFU, 70G MRU, 49M Anon, 3822M Header, 4288M Other
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> LA's - around 10-20 depend on day time.
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> zpool iostat disk1 1
VS> VS> VS>                capacity     operations    bandwidth
VS> VS> VS> pool        alloc   free   read  write   read  write
VS> VS> VS> ----------  -----  -----  -----  -----  -----  -----
VS> VS> VS> disk1       7,45T  8,86T    546  1,49K  16,4M  13,4M
VS> VS> VS> disk1       7,45T  8,86T    272  3,91K  11,7M  27,4M
VS> VS> VS> disk1       7,45T  8,86T    344  2,94K  7,26M  25,2M
VS> VS> VS> disk1       7,45T  8,86T    224  2,02K  9,91M  21,8M
VS> VS> VS> disk1       7,45T  8,86T    244  2,35K  8,23M  18,4M
VS> VS> VS> disk1       7,45T  8,86T    245  2,54K  6,44M  23,4M
VS> VS> VS> disk1       7,45T  8,86T    114  2,94K  3,28M  13,3M
VS> VS> VS> disk1       7,45T  8,86T    288  4,43K  6,09M  33,5M
VS> VS> VS> disk1       7,45T  8,86T    157  1,26K  2,98M  15,7M
VS> VS> VS> disk1       7,45T  8,86T     94    842  3,07M  13,6M
VS> VS> VS> disk1       7,45T  8,86T    327  1,71K  9,63M  8,21M
VS> VS> VS> disk1       7,45T  8,86T    237  1,81K  5,73M  29,3M
VS> VS> VS> disk1       7,45T  8,86T    247  3,47K  5,17M  29,6M
VS> VS> VS> disk1       7,45T  8,86T    165  2,37K  3,22M  16,7M
VS> VS> VS> disk1       7,45T  8,86T    155  3,23K  3,27M  23,9M
VS> VS> VS> 
VS> VS> VS> Strange as timeout seted up to 10sec's.
VS> VS> VS> 
VS> VS> VS> What intresting - after reboot system work fine for some time, at last while ARC not become  80G size. 
VS> VS> VS> Low limit for arc is 40gb, strange but old system can take memory from arc eg like this 
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> Mem: 32G Active, 8797M Inact, 78G Wired, 2370M Cache, 209M Buf, 3977M Free
VS> VS> VS> ARC: 43G Total, 2204M MFU, 28G MRU, 135M Anon, 7898M Header, 5301M Other
VS> VS> VS> 
VS> VS> VS> On new ARC getting to it max allowed size. 
VS> VS> VS> 
VS> VS> VS> So for now question is, what it can be, what we can try to improve system perfomance and so on?
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> 
VS> VS> VS> _______________________________________________
VS> VS> VS> freebsd-hackers at freebsd.org mailing list
VS> VS> VS> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
VS> VS> VS> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
VS> VS> _______________________________________________
VS> VS> freebsd-hackers at freebsd.org mailing list
VS> VS> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
VS> VS> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"


VS> _______________________________________________
VS> freebsd-hackers at freebsd.org mailing list
VS> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
VS> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"



More information about the freebsd-hackers mailing list