Re: git: 151abc80cde7 - main - if_vlan: avoid hash table thrashing when adding and removing entries

From: Kristof Provost <kp_at_FreeBSD.org>
Date: Tue, 26 Jul 2022 09:03:47 UTC
On 25 Jul 2022, at 18:20, John Baldwin wrote:
> On 7/22/22 4:19 PM, Kristof Provost wrote:
>> The branch main has been updated by kp:
>>
>> URL: 
>> https://cgit.FreeBSD.org/src/commit/?id=151abc80cde778bc18b91c334d07fbd52bbb38fb
>>
>> commit 151abc80cde778bc18b91c334d07fbd52bbb38fb
>> Author:     Kristof Provost <kp@FreeBSD.org>
>> AuthorDate: 2022-07-22 17:17:04 +0000
>> Commit:     Kristof Provost <kp@FreeBSD.org>
>> CommitDate: 2022-07-22 17:18:41 +0000
>>
>>      if_vlan: avoid hash table thrashing when adding and removing 
>> entries
>>          vlan_remhash() uses incorrect value for b.
>>          When using the default value for VLAN_DEF_HWIDTH (4), the 
>> VLAN hash-list table
>>      expands from 16 chains to 32 chains as the 129th entry is added. 
>> trunk->hwidth
>>      becomes 5. Say a few more entries are added and there are now 
>> 135 entries.
>>      trunk-hwidth will still be 5. If an entry is removed, 
>> vlan_remhash() will
>>      calculate a value of 32 for b. refcnt will be decremented to 
>> 134. The if
>>      comparison at line 473 will return true and vlan_growhash() will 
>> be called. The
>>      VLAN hash-list table will be compressed from 32 chains wide to 
>> 16 chains wide.
>>      hwidth will become 4. This is an error, and it can be seen when 
>> a new VLAN is
>>      added. The table will again be expanded. If an entry is then 
>> removed, again
>>      the table is contracted.
>>          If the number of VLANS stays in the range of 128-512, each 
>> time an insert
>>      follows a remove, the table will expand. Each time a remove 
>> follows an
>>      insert, the table will be contracted.
>>          The fix is simple. The line 473 should test that the number 
>> of entries has
>>      decreased such that the table should be contracted using what 
>> would be the new
>>      value of hwidth. line 467 should be:
>>                  b = 1 << (trunk->hwidth - 1);
>>          PR:             265382
>>      Reviewed by:    kp
>>      MFC after:      2 weeks
>>      Sponsored by:   NetApp, Inc.
>
> This does mean that there are some odd edge cases (e.g. if you remove 
> the 513th entry
> only to turn around and re-add it) that can still thrash even with 
> this fixed.  Not
> sure if those are worth worrying about though?  If they were, perhaps 
> you could
> imagine some sort of "dampener" where you have to remove some number N 
> of entries
> to actually rehash to a smaller table.  But that is probably rather 
> complex to
> implement and probably not worth it?
>
Good point, although I think I agree it’s probably not worth worrying 
too much about this.

There’s a tangentially related issue I also want to look at one of 
these days, namely that the current `trunk->refcnt > (b * b) / 2` check 
results in far too many entries in each hash row. For example, until 128 
vlan interfaces we stick with width = 4, so end up with an average of 8 
entries in each row.
That means that for every inbound packet we scan through a linked list 
of 8 entries to find the correct vlan interface. We should tweak the 
check to reduce this, as all it’ll cost is a tiny bit of memory. (Or 
we should just default VLAN_ARRAY on, paying a bit more memory for no 
linked-list scanning).

Kristof