slowdown of zfs (tx->tx)
Artem Belevich
art at freebsd.org
Thu Jan 10 01:15:10 UTC 2013
On Wed, Jan 9, 2013 at 8:26 AM, Nicolas Rachinsky
<fbsd-mas-0 at ml.turing-complete.org> wrote:
> * Artem Belevich <art at freebsd.org> [2013-01-08 12:47 -0800]:
>> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
>> <fbsd-mas-0 at ml.turing-complete.org> wrote:
>> >     NAME                      STATE     READ WRITE CKSUM
>> >     pool1                     DEGRADED     0     0     0
>> >       raidz2-0                DEGRADED     0     0     0
>> >         ada5                  ONLINE       0     0     0
>> >         ada8                  ONLINE       0     0     0
>> >         ada2                  ONLINE       0     0     0
>> >         ada3                  ONLINE       0     0     0
>> >         11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
>> >         ada6                  ONLINE       0     0     0
>> >         ada0                  ONLINE       0     0     1
>> >         ada7                  ONLINE       0     0     0
>> >         ada4                  ONLINE       0     0     3
>>
>> You seem to have some checksum errors, which suggests hardware trouble.
>
> I somehow missed these. Is there any way to learn when these checksum
> errors happen?
Not on FreeBSD (yet) as far as I can tell. Not explicitly, anyways.
Check /var/log/messages for any indications of SATA errors. There's a
good chance that there was a timeout at some point.
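
A rough sketch of that log check, in case it helps. It assumes the
kernel names the disk as "adaN" and mentions a timeout or an error on
the same line; the exact wording of the messages varies by controller
and driver, so treat the pattern as a starting point rather than a
complete filter. The function name is made up for this example.

```python
import re

# Assumption: the disk appears as "adaN" and the same line mentions a
# timeout or an error. Adjust the pattern to what your logs contain.
PATTERN = re.compile(r"\b(ada\d+)\b.*\b(timeout|error)", re.IGNORECASE)


def find_disk_errors(lines):
    """Return (device, line) pairs for lines that look like disk errors."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group(1), line.rstrip("\n")))
    return hits
```

Passing it an open file works, e.g.
find_disk_errors(open("/var/log/messages", errors="replace")).
The timestamps on the matching lines give at least a rough idea of
when a drive misbehaved.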
>> For starters, check smart info for all drives and see if they have any
>> relocated sectors.
>
> There are some disks with relocated sectors, but for both ada0 and
> ada4 Reallocated_Sector_Ct is 0.
Are there any UDMA errors? Those would suggest trouble with cabling.
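
For checking several drives at once, here is a small sketch that pulls
the raw values out of the attribute table printed by
"smartctl -A /dev/adaN". It assumes the usual ten-column table layout
and that the drive reports the counter as UDMA_CRC_Error_Count
(attribute 199); some vendors use a different name, in which case the
lookup returns None. The function names are invented for this example.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` output.

    Only rows that start with a numeric ID and have an integer in the
    RAW_VALUE column (the tenth field) are kept.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value is not a plain integer, skip it
    return attrs


def udma_crc_errors(text):
    """Return the raw UDMA CRC error count, or None if not reported."""
    return parse_smart_attributes(text).get("UDMA_CRC_Error_Count")
```

A non-zero count that keeps growing between runs is the thing to
watch for.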
>> Use gstat during your workload to see if any of the drives takes much
>> longer than others to handle its job.
>
> There is one disk sticking out a bit.
In a raid-z pool the number of transactions per second is determined
by the slowest disk. Check the ms/w column. Look for numbers
substantially higher than a typical seek time (10..20ms is OK, 100 is not).
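
If you capture gstat output to a file (batch mode, "gstat -b"), a
sketch like the following can pick out the slow writers. It assumes
the output has a header row containing "ms/w" and "Name" and that the
provider name is the last field on each data row. The 100 ms default
is just the "not OK" figure from above, not a tuned value, and the
function name is made up.

```python
def slow_writers(gstat_text, limit_ms=100.0):
    """Return {provider: ms/w} for rows whose ms/w is >= limit_ms."""
    col = None  # index of the ms/w column, taken from the header row
    slow = {}
    for line in gstat_text.splitlines():
        fields = line.split()
        if "ms/w" in fields and "Name" in fields:
            col = fields.index("ms/w")
            continue
        if col is None or len(fields) <= col:
            continue  # not a data row (e.g. the "dT:" line)
        try:
            ms_w = float(fields[col])
        except ValueError:
            continue
        if ms_w >= limit_ms:
            slow[fields[-1]] = ms_w
    return slow
```

Since the whole vdev waits for the slowest member, a single drive
showing up here is enough to drag the pool down.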
>
>> > There is almost no disk activity during this time.
>>
>> What kind of disk activity *is* there?
>
> What would be interesting?
Drives 'sticking out' being busy longer than their peers in the pool.
Excessive ms/r or ms/w in gstat. Unexpected reads or writes.
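
To make "sticking out" a bit more concrete, one possible approach is
to compare each drive's %busy with the median of its peers. This is
only a sketch: the factor of 2 is an arbitrary assumption, the
function name is invented, and the busy figures have to be collected
by you from gstat.

```python
from statistics import median


def sticking_out(busy_by_disk, factor=2.0):
    """Return the disks whose %busy exceeds factor * median of all disks."""
    if not busy_by_disk:
        return []
    typical = median(busy_by_disk.values())
    if typical <= 0:
        return []  # pool is idle, nothing meaningful to compare against
    return sorted(
        disk for disk, busy in busy_by_disk.items() if busy > factor * typical
    )
```

For example, with hypothetical readings of 20, 22, 19, 95 and 21 for
five drives, only the drive at 95 is reported.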
--Artem