slowdown of zfs (tx->tx)

Tue Jan 8 20:47:40 UTC 2013

On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
<fbsd-mas-0 at ml.turing-complete.org> wrote:
>       NAME                      STATE     READ WRITE CKSUM
>         pool1                     DEGRADED     0     0     0
>           raidz2-0                DEGRADED     0     0     0
>             ada5                  ONLINE       0     0     0
>             ada8                  ONLINE       0     0     0
>             ada2                  ONLINE       0     0     0
>             ada3                  ONLINE       0     0     0
>             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
>             ada6                  ONLINE       0     0     0
>             ada0                  ONLINE       0     0     1
>             ada7                  ONLINE       0     0     0
>             ada4                  ONLINE       0     0     3

You seem to have some checksum errors, which suggests hardware trouble.
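
If you want to pull the suspicious rows out of 'zpool status' mechanically (say, from a cron job), here is a rough Python sketch. It assumes the raw, unquoted output of 'zpool status' with the NAME/STATE/READ/WRITE/CKSUM column layout shown above; the function name is made up, and it only handles plain integer counters.

```python
def flag_suspect_vdevs(status_text):
    """Return (name, state, read, write, cksum) for every config row of
    'zpool status' output that is not ONLINE or has non-zero error counters.

    Assumes plain integer counters; rows where ZFS abbreviates large counts
    (e.g. "1.2K") are skipped rather than misparsed.
    """
    suspects = []
    for line in status_text.splitlines():
        fields = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [note...]
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, state, *counters = fields[:5]
        if not all(c.isdigit() for c in counters):
            continue  # not a config row (or an abbreviated counter)
        read, write, cksum = (int(c) for c in counters)
        if state != "ONLINE" or read or write or cksum:
            suspects.append((name, state, read, write, cksum))
    return suspects
```

Fed the status output quoted above, it would flag pool1 and raidz2-0 (DEGRADED), the UNAVAIL disk, and ada0 and ada4 for their checksum errors.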

For starters, check the SMART info for all drives and see whether any of
them have reallocated sectors.
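
With nine disks, checking them by hand gets tedious. A rough Python sketch that picks the reallocated sector count (and, as my own addition, the pending sector count) out of 'smartctl -A /dev/adaN' output might look like this. It assumes smartmontools' usual attribute table with RAW_VALUE in the tenth column; the helper name is made up.

```python
# Attribute names are from smartctl's attribute table; the choice of which
# ones to watch is an assumption, not something from this thread.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def bad_sector_counts(smartctl_attrs, watched=WATCHED):
    """Map each watched attribute name to its RAW_VALUE from 'smartctl -A' output."""
    counts = {}
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        # Attribute rows: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
        #                 UPDATED WHEN_FAILED RAW_VALUE (RAW_VALUE is column 10)
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            if fields[9].isdigit():
                counts[fields[1]] = int(fields[9])
    return counts
```

A non-zero count on one of the disks that also shows checksum errors (ada0, ada4) would point in the same direction.
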
Use gstat during your workload to see whether any of the drives takes much
longer than the others to service its requests.
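
Since a raidz2 vdev spreads each block across its member disks, one slow drive tends to hold up the rest. A minimal Python sketch of the "much longer than the others" test, assuming you copy a per-request latency for each drive off gstat (for example its ms/w column) into a dict; the function name and the 3x cut-off are arbitrary choices of mine.

```python
from statistics import median


def slow_drives(latency_ms, factor=3.0):
    """Return the drives whose latency exceeds `factor` times the median
    latency of all drives, sorted by name.

    `latency_ms` maps a device name to a per-request latency read off gstat;
    `factor` is an arbitrary cut-off, not a ZFS or gstat setting.
    """
    if not latency_ms:
        return []
    typical = median(latency_ms.values())
    return sorted(dev for dev, ms in latency_ms.items() if ms > factor * typical)
```

With made-up numbers, slow_drives({"ada0": 4.1, "ada2": 3.8, "ada3": 4.4, "ada4": 38.0}) returns ["ada4"]. The median is used rather than the mean so that a single very slow drive does not drag the baseline up and hide itself.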

> There is almost no disk activity during this time.

What kind of disk activity *is* there? Sleeping on 'tx->tx...' usually
means that ZFS is trying to commit data to disk. Normally that happens
once every few seconds (the default is 10, if I remember correctly). It
may happen more often if you do a lot of synchronous writes. I believe
there was an iostat-like DTrace script that would show the synchronous
write rate, but I can't seem to find it.

> sync is disabled for the whole pool.

If that's the case (assuming you're talking about the sync=disabled ZFS
property), then synchronous writes are probably not the cause of the
slowdown. My guess would be either a failing HDD or something funky with
the cabling or the SATA controller.

--Artem