NFS reads vs. writes
Mikhail T.
mi+thun at aldan.algebra.com
Sun Jan 3 18:46:03 UTC 2016
On 03.01.2016 10:32, Tom Curry wrote:
> What does disk activity (gstat or iostat) look like when this is going on?
I use systat for such observations. Here is a typical snapshot of
machine a while it reads its own /a and writes over NFS to b:/b:
 3.6% Sys   0.0% Intr  15.6% User   0.0% Nice  80.8% Idle

Disks   md0  ada0  ada1  ada2  ada3   da0   da1
KB/t   0.00   119 26.42 19.10 21.72  0.00  0.00
tps       0    47    72    64    69     0     0
MB/s   0.00  5.42  1.87  1.19  1.45  0.00  0.00
%busy     0     4    19    11    11     0     0
(remainder of the systat vmstat screen omitted)
Here ada0 is the SSD hosting both the read-cache and ZIL devices;
ada{1,2,3} are the three disks comprising a RAID5-style (raidz) zpool.
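(For reference, the screens above and below come from systat's vmstat
display; gstat or iostat would show the same per-disk picture.
Illustrative invocations, the refresh intervals are arbitrary:

    systat -vmstat 2   # CPU, memory, and per-disk KB/t, tps, MB/s, %busy
    gstat              # per-GEOM-provider view of the same disk activity
    iostat -x 2        # extended per-device statistics
)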
Meanwhile on the b-side the following is going on:
 4.2% Sys   0.0% Intr   0.0% User   0.0% Nice  95.8% Idle

Disks   md0  ada0   da0   da1   da2   da3   da4
KB/t   0.00  6.50  0.00 80.21 16.00 79.59 68.42
tps       0   594     0    53     2    55    39
MB/s   0.00  3.77  0.00  4.18  0.03  4.29  2.63
%busy     0    95     0    10     1    14     8
(remainder of the systat vmstat screen omitted)
Here too, ada0 hosts the log device and appears to be the bottleneck:
it is 95% busy while none of the pool disks exceeds 14%. There is no
read cache on b, and the zpool consists of da1, da3, and da4 simply
striped together (no redundancy).
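To isolate the log device from the data disks, per-vdev statistics
help; a sketch, assuming the pool on b is simply named b:

    zpool status b        # confirm the layout: striped da1/da3/da4 plus log on ada0
    zpool iostat -v b 2   # per-vdev ops and bandwidth; the slog shows up under 'logs'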
When, instead of /pushing/ data out of a, I begin /pulling/ it (a
different file from the same directory) from b, things change
drastically. a looks like this:
Disks   md0  ada0  ada1  ada2  ada3   da0   da1
KB/t   0.00 83.00 64.00 64.00 64.00  0.00  0.00
tps       0    27   469   456   472     0     0
MB/s   0.00  2.16 29.32 28.49 29.50  0.00  0.00
%busy     0     1    13    13    13     0     0
and b like this:
Disks   md0  ada0   da0   da1   da2   da3   da4
KB/t   0.00 15.46  0.00   114  0.00   116   112
tps       0    45     0   189     0   192   160
MB/s   0.00  0.68  0.00 20.98  0.00 21.74 17.45
%busy     0    81     0    19     0    37    28
ada0 is no longer the bottleneck and the copy is over almost instantly.
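For the record, the push and pull cases can be reproduced with plain
copies; a minimal sketch, assuming b:/b is NFS-mounted on a at /mnt/b
and a:/a on b at /mnt/a (mount points and file name are illustrative):

    # push -- run on a: local read, NFS write (the slow case above)
    dd if=/a/bigfile of=/mnt/b/bigfile bs=1m
    # pull -- run on b: NFS read, local write (the fast case)
    dd if=/mnt/a/bigfile of=/b/bigfile bs=1m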
> What is the average latency between the two machines?
ping-ing b from a:
round-trip min/avg/max/stddev = 0.137/0.156/0.178/0.015 ms
ping-ing a from b:
round-trip min/avg/max/stddev = 0.114/0.169/0.220/0.036 ms
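As a rough sanity check (assuming one network round trip per
synchronous 64 KB write): an RTT of about 0.16 ms allows roughly 6000
round trips per second, i.e. about 380 MB/s, so the wire latency alone
should not cap throughput at the few MB/s seen above.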
On 03.01.2016 11:09, Bob Friesenhahn wrote:
> The most likely issue is a latency problem with synchronous writes on
> 'b'. The main pool disks seem to be working ok. Make sure that the
> SSD you are using for slog is working fine. Maybe it is abnormally slow.
Why would the same ZFS pool -- with the same slog -- be faster when
written to locally than when written to over NFS?
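One way to test that theory: NFS writes arrive as synchronous (COMMIT)
requests and must wait for the ZIL on the slog, whereas a local copy is
mostly asynchronous and bypasses it. A diagnostic sketch, assuming the
receiving pool is named b (not for production use -- disabling sync
risks losing recent writes on power failure):

    zfs get sync b           # the default policy is 'standard'
    zfs set sync=disabled b  # take the slog out of the NFS write path
    # ... rerun the NFS push test; if it is now fast, slog latency is the culprit ...
    zfs set sync=standard b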
-mi