NFS FHA issue and possible change to the algorithm
Rick Macklem
rmacklem at uoguelph.ca
Fri Oct 23 12:04:40 UTC 2015
Hi,
An off-list discussion occurred in which a site running an NFS server found that
they needed to disable File Handle Affinity (FHA) to get good performance.
Here is a re-post of some of that (with Josh's permission).
First, what was observed w.r.t. the machine:
Josh Paetzel wrote:
>>>> It's all good.
>>>>
>>>> It's a 96GB RAM machine and I have 2 million nmbclusters, so 8GB RAM,
>>>> and we've tried 1024 NFS threads.
>>>>
>>>> It might be running out of network memory, but we can't really afford to
>>>> give it any more; for this use case, disabling FHA might end up being the
>>>> way to go.
>>>>
I wrote:
>>> Just to fill mav@ in, the person who reported a serious performance problem
>>> to Josh was able to fix it by disabling FHA.
Josh Paetzel wrote:
>>
>> There's about 300 virtual machines that mount root from a read only NFS
>> share.
>>
>> There's also another few hundred users that mount their home directories
>> over NFS. When things went sideways it is always the virtual machines
>> that get unusable. 45 seconds to log in via ssh, 15 minutes to boot,
>> stuff like that.
>>
>> [root at head2] ~# nfsstat -s 1
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 4117 17 0 124 689 4 680 0
>> 4750 31 5 121 815 3 950 1
>> 4168 16 0 109 659 9 672 0
>> 4416 24 0 112 771 3 748 0
>> 5038 86 0 76 728 4 825 0
>> 5602 21 0 76 740 3 702 6
>>
>> [root at head2] ~# arcstat.py 1
>> time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
>> 18:25:36 21 0 0 0 0 0 0 0 0 65G 65G
>> 18:25:37 1.8K 23 1 23 1 0 0 7 0 65G 65G
>> 18:25:38 1.9K 88 4 32 1 56 32 3 0 65G 65G
>> 18:25:39 2.2K 67 3 62 2 5 5 2 0 65G 65G
>> 18:25:40 2.7K 132 4 39 1 93 17 8 0 65G 65G
>>
>> last pid: 7800;  load averages: 1.44, 1.65, 1.68  up 0+19:22:29  18:26:16
>> 69 processes: 1 running, 68 sleeping
>> CPU: 0.1% user, 0.0% nice, 1.8% system, 0.9% interrupt, 97.3% idle
>> Mem: 297M Active, 180M Inact, 74G Wired, 140K Cache, 565M Buf, 19G Free
>> ARC: 66G Total, 39G MFU, 24G MRU, 53M Anon, 448M Header, 1951M Other
>> Swap: 28G Total, 28G Free
>>
>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
>> 9915 root 37 52 0 9900K 2060K rpcsvc 16 16.7H 24.02% nfsd
>> 6402 root 1 52 0 85352K 20696K select 8 47:17 3.08% python2.7
>> 43178 root 1 20 0 70524K 30752K select 7 31:04 0.59% rsync
>> 7363 root 1 20 0 49512K 6456K CPU16 16 0:00 0.59% top
>> 37968 root 1 20 0 70524K 31432K select 7 16:53 0.00% rsync
>> 37969 root 1 20 0 55752K 11052K select 1 9:11 0.00% ssh
>> 13516 root 12 20 0 176M 41152K uwait 23 4:14 0.00% collectd
>> 31375 root 12 20 0 176M 42432K uwa
>>
>> This is a quick peek at the system at the end of the day, so load has
>> dropped off considerably; however, the main takeaway is that it has plenty of
>> free RAM and the ZFS ARC hit percentage is > 99%.
>>
I wrote:
>>> I took a look at it and I wonder if it is time to consider changing the
>>> algorithm somewhat?
>>>
>>> The main thing that I wonder about is doing FHA for all the RPCs other than
>>> Read and Write.
>>>
>>> In particular, Getattr is often the most frequent RPC and doing FHA for it
>>> seems like wasted overhead to me. Normally, separate Getattr RPCs wouldn't
>>> be done for FHs that are being Read/Written, since the Read/Write reply has
>>> updated attributes in it.
>>>
Although the load is mostly Getattr RPCs and I think the above statement is correct,
I don't know whether the overhead of doing FHA for all the Getattr RPCs explains the
observed performance problem.
I don't see how doing FHA for RPCs like Getattr will improve their performance.
Note that when the FHA algorithm was originally implemented, there was no shared vnode
lock and, as such, all RPCs on a given FH/vnode would have been serialized by the vnode
lock anyhow. Now, with shared vnode locks, this isn't the case for frequently performed
RPCs like Getattr, Read (and Write for ZFS), Lookup and Access. I have always felt that
doing FHA for RPCs other than Read and Write didn't make much sense, but I don't
have any evidence that it causes a significant performance penalty.
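
To illustrate what FHA actually does: the file handle in each incoming RPC is hashed
and the request is steered to the nfsd thread associated with that hash, so all RPCs on
one file end up on one thread (which is what keeps sequential Reads/Writes together for
read-ahead and write clustering). Here is a minimal conceptual sketch of that idea; it
is illustrative only, not the actual FreeBSD FHA code, and every name and constant in
it is made up:

/*
 * Conceptual sketch of File Handle Affinity; illustrative only, not the
 * FreeBSD implementation.  Every name and constant here is made up.
 */
#include <stddef.h>
#include <stdint.h>

#define FHA_NTHREADS 64         /* number of nfsd service threads (example value) */

/* Trivial FNV-1a hash over the bytes of the file handle. */
static uint32_t
fha_hash_fh(const unsigned char *fh, size_t fhlen)
{
        uint32_t h = 2166136261u;
        size_t i;

        for (i = 0; i < fhlen; i++) {
                h ^= fh[i];
                h *= 16777619u;
        }
        return (h);
}

/*
 * Pick the service thread for an incoming RPC.  Because the choice depends
 * only on the file handle, every RPC on the same file lands on the same
 * thread, which keeps sequential Reads/Writes ordered on one thread.  For
 * RPCs like Getattr, which only need a shared vnode lock on current
 * kernels, this serialization buys nothing.
 */
static int
fha_pick_thread(const unsigned char *fh, size_t fhlen)
{

        return ((int)(fha_hash_fh(fh, fhlen) % FHA_NTHREADS));
}
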
Anyhow, the attached simple patch limits FHA to Read and Write RPCs.
The simple testing I've done shows it to be about performance neutral (0-1% improvement),
but I only have small hardware, no ZFS, and no easy way to emulate a load of mostly
Getattr RPCs. As such, unless others can determine whether this patch (or some other one)
helps with this problem, I don't think committing it makes much sense.
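
To make the proposal concrete, the gist of such a change is just a procedure check in
front of the affinity lookup, so that anything other than Read and Write is handed to
whichever thread is free. The following is only a sketch of the idea, not the attached
nfsfha.patch; fha_assign_thread(), FHA_ANY_THREAD and the enum values are invented for
illustration, and fha_pick_thread() is the hash-based chooser from the sketch above:

#include <stddef.h>

/* Illustrative procedure identifiers; not the real NFS procedure numbers. */
enum nfs_proc { PROC_GETATTR, PROC_LOOKUP, PROC_ACCESS, PROC_READ, PROC_WRITE };

#define FHA_ANY_THREAD (-1)     /* let the RPC pool hand the request to any idle thread */

/* Hash-based thread chooser from the earlier sketch. */
static int fha_pick_thread(const unsigned char *fh, size_t fhlen);

/*
 * Only apply affinity to Read and Write, where keeping all I/O on one file
 * on one thread helps sequential-I/O detection and write clustering.
 * Everything else bypasses the hash and can be serviced by any thread.
 */
static int
fha_assign_thread(enum nfs_proc proc, const unsigned char *fh, size_t fhlen)
{

        if (proc != PROC_READ && proc != PROC_WRITE)
                return (FHA_ANY_THREAD);
        return (fha_pick_thread(fh, fhlen));
}

Returning "any thread" for the Getattr-style RPCs means an idle nfsd can service them
immediately instead of their queueing behind Reads and Writes that happen to hash to
the same thread.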
If anyone can test this, or has comments on it or suggestions for other possible
changes to the FHA algorithm, please do so.
Thanks, rick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nfsfha.patch
Type: text/x-patch
Size: 1882 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20151023/2f816d84/attachment.bin>