ZFS heavy I/O writing | zfskern txg_thread_enter

Steven Hartland killing at multiplay.co.uk
Fri Feb 19 11:37:35 UTC 2016


Please try 10.3-BETA2; FreeBSD 9.1 is so old that it's impossible to
comment on its performance.
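In the meantime, independent of the upgrade, it would be worth pinning down 
exactly when the storm starts: run gstat in batch mode overnight and keep 
only the snapshots where the disk is saturated. A minimal sketch, assuming 
the field layout shown in the gstat listing below (%busy is the 
second-to-last field, device name the last); the here-document stands in 
for live `gstat -b` output:

```shell
# Keep only snapshots where ada0 is saturated (>90% busy) so the
# onset of the write storm lands in a log. The sample lines stand in
# for live `gstat -b` output; field positions are assumed from the
# gstat listing quoted below.
awk '$NF == "ada0" && $(NF-1) > 90 { print "BUSY:", $(NF-1), $NF }' <<'EOF'
    13    135     21    641  256.7    108   6410   41.4  128.8 ada0
     0      0      0      0    0.0      0      0    0.0    0.0 ada0p1
EOF
```

Live, something like `gstat -bI 10s` piped through that filter, with a 
timestamp prepended and started before midnight, would show whether the 
burst lines up with the daily periodic(8) run (3:01 a.m. by default in 
/etc/crontab).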

On 19/02/2016 11:07, Niccolò Corvini wrote:
> Hi, first time here!
> We are having a problem with a server running FreeBSD 9.1 with ZFS on a
> single SATA drive. For a few days now, the system has become really slow
> in the morning due to very heavy I/O writes. We investigated and think it
> may start at night, possibly correlated with the standard daily cron run,
> but we are not sure. After a few hours the situation returns to normal.
> Any help is much appreciated.
> The machine is an Intel Xeon E5-2620 with 36 GB of RAM; the HDD is 2 TB
> and is half full.
> gstat output:
>
>   L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     13    135     21    641  256.7    108   6410   41.4  128.8| ada0
>      0      0      0      0    0.0      0      0    0.0    0.0| ada0p1
>     13    135     21    641  256.7    108   6410   41.7  128.8| ada0p2
>      0      0      0      0    0.0      0      0    0.0    0.0| cd0
>      0      0      0      0    0.0      0      0    0.0    0.0| gptid/3c0de011-4f37-11e5-8217-3085a91c3292
>      0      0      0      0    0.0      0      0    0.0    0.0| zvol/zroot/swap
>     13    135     21    641  256.7    108   6410   41.7  128.9| gpt/disk1
>
> Using top -m io shows that the culprit is [zfskern{txg_thread_enter}].
> top -m io output:
>
>   PID JID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
>     3   0 root         14      1      0     37      0     37  30.33% [zfskern{txg_thread_enter}]
> 49866 215   7070       26      2      0      5      0      5   4.10% postgres: stats collector process    (postgres)
> 99901   5     70       42      0      0      4      0      4   3.28% postgres: promeditec promeditec.osr.test 192.168.0.246(278
> 24820 199 www          10      0      7      0      0      7   5.74% [jsvc{jsvc}]
> 33869 212     88       19      2      0      2      0      2   1.64% [mysqld{mysqld}]
> 93400   0 root         13      0     10      0      0     10   8.20% [find]
> 89407 215   7070       10      0      0      1      0      1   0.82% postgres: alfresco alfconservazione.dotcom.ts.it 192.168.0
> 15776   5     70       11      0      0      4      0      4   3.28% postgres: stats collector process    (postgres)
> 33869 212     88       10      0      0      3      0      3   2.46% [mysqld{mysqld}]
> 33869 212     88        2      0      0     11      0     11   9.02% [mysqld{mysqld}]
> 18685 198 root          5      0      0      2      0      2   1.64% /usr/sbin/syslogd -s
> 15852 214     70        4      1      0      1      0      1   0.82% postgres: alfresco alfcomunets.dotcom.ts.it 192.168.0.212(
> 98335 120 root         11      0     29      0      0     29  23.77% find /var/log -name messages.* -mtime -2
> 16128 214     70        8      0      0      1      0      1   0.82% postgres: alfresco alfaxErre8 192.168.0.208(50558)  (postg
>  1116 198 root         10      0      0      1      0      1   0.82% sendmail: ./u1J9k90d001112 local: client DATA status (send
>  1120 198 root          7      0      0      4      0      4   3.28% mail.local -l
>
> Using procstat -kk on the zfskern PID shows:
>
>   PID    TID COMM             TDNAME           KSTACK
>     3 100129 zfskern          arc_reclaim_thre mi_switch sleepq_timedwait _cv_timedwait arc_reclaim_thread fork_exit fork_trampoline
>     3 100130 zfskern          l2arc_feed_threa mi_switch sleepq_timedwait _cv_timedwait l2arc_feed_thread fork_exit fork_trampoline
>     3 100504 zfskern          txg_thread_enter mi_switch sleepq_wait _cv_wait txg_thread_wait txg_quiesce_thread fork_exit fork_trampoline
>     3 100505 zfskern          txg_thread_enter mi_switch sleepq_wait _cv_wait zio_wait dsl_pool_sync spa_sync txg_sync_thread fork_exit fork_trampoline
>     3 100506 zfskern          zvol zroot/swap  mi_switch sleepq_wait _sleep zvol_geom_worker fork_exit fork_trampoline
>
> systat -vmstat output:
>
>      7 users    Load  0.50  0.62  1.46                  Feb 19 11:46
>
> CPU:  4.4% Sys   0.0% Intr   0.3% User   0.0% Nice  95.4% Idle
>
> Mem: 20445132 wire  37317552 act  4948708 inact  884804 cache  1370012 free
>
> Disks  ada0   cd0 pass0 pass1
> KB/t   2.55  0.00  0.00  0.00
> tps     223     0     0     0
> MB/s   0.56  0.00  0.00  0.00
> %busy    94     0     0     0
>
> (the remaining columns -- paging, name cache, and per-CPU interrupt
> counters -- were mangled by line wrapping and are omitted)
>
> zpool status:
>    pool: zroot
>   state: ONLINE
>    scan: scrub repaired 0 in 3h46m with 0 errors on Wed Nov  4 21:54:44 2015
> config:
>
>          NAME         STATE     READ WRITE CKSUM
>          zroot        ONLINE       0     0     0
>            gpt/disk1  ONLINE       0     0     0
>
> errors: No known data errors
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
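For what it's worth, the procstat output itself is healthy: txg_sync_thread 
sitting in zio_wait just means the sync thread is waiting for the disk to 
absorb each transaction group, so the question is what is dirtying that 
much data, not why ZFS is writing it. If the bursts still need smoothing on 
a single disk once the upgrade question is settled, the 9.x-era write 
tunables can be experimented with. An illustrative /etc/sysctl.conf 
fragment follows; the values are examples only, and note that 
vfs.zfs.write_limit_override was removed by the write-throttle rewrite in 
10.x:

```
# Illustrative only, not a recommendation:
# flush transaction groups at most this many seconds apart, so each
# sync writes less at once (5 is already the 9.x default)
vfs.zfs.txg.timeout=5
# hard cap on dirty data accepted per txg, in bytes (0 = automatic);
# a 9.x-era knob, gone in 10.x
vfs.zfs.write_limit_override=1073741824
```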


