Odd performance problems with many vnet jails on a bridge and (possibly) ipfw
Date: Fri, 05 Jan 2024 12:22:01 UTC
Hi all,

we have a single host with our largest number of jails - about 250 active. All jails are bridged to the external interfaces (lagg + vlan) and also to a private bridge that is not connected to any hardware or network. The jails run a CMS, and the database server is centralised.

Today the customer called in and told me that he could finally pin down a problem they had noticed for quite some time but were never able to really diagnose. RTT across the private bridge - which, again, is not connected to any infrastructure, and with 250 hosts is not exactly an overpopulated broadcast domain - is way too high and fluctuates heavily:

root@vpro0742:~ # ping dbhost
PING6(56=40+8+8 bytes) fd31:4159:96::742 --> fd31:4159:96::472
16 bytes from fd31:4159:96::472, icmp_seq=0 hlim=64 time=140.344 ms
16 bytes from fd31:4159:96::472, icmp_seq=1 hlim=64 time=166.879 ms
16 bytes from fd31:4159:96::472, icmp_seq=2 hlim=64 time=158.025 ms
16 bytes from fd31:4159:96::472, icmp_seq=3 hlim=64 time=111.139 ms
16 bytes from fd31:4159:96::472, icmp_seq=4 hlim=64 time=124.974 ms
16 bytes from fd31:4159:96::472, icmp_seq=5 hlim=64 time=137.521 ms
^C
--- dbhost ping6 statistics ---
6 packets transmitted, 6 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 111.139/139.813/166.879/18.762 ms

Things look similar on the external bridge, but since other factors are at play there (larger broadcast domain, real infrastructure, etc.), I'll concentrate on the private bridge first.

Right after the host system boots, performance is as expected. The more jails are brought online, the worse it seems to get.

I have found various discussions concerning pf NAT and LRO and similar problems with hardware offloading. Needless to say, all of that is disabled on the external network interfaces (igb), and the private bridge of course does not have any hardware offloading to begin with. Also, we do not use pf anywhere, but we do use ipfw in every single jail for our inbound SNI proxy connections. So as a quick test I switched two of the jails to firewall_type="open" in rc.conf, but that did not change much, although the numbers seem to be slightly lower:

root@vpro0742:~ # ping dbhost
PING6(56=40+8+8 bytes) fd31:4159:96::742 --> fd31:4159:96::472
16 bytes from fd31:4159:96::472, icmp_seq=0 hlim=64 time=67.275 ms
16 bytes from fd31:4159:96::472, icmp_seq=1 hlim=64 time=69.677 ms
16 bytes from fd31:4159:96::472, icmp_seq=2 hlim=64 time=65.679 ms
16 bytes from fd31:4159:96::472, icmp_seq=3 hlim=64 time=75.663 ms
16 bytes from fd31:4159:96::472, icmp_seq=4 hlim=64 time=66.416 ms
16 bytes from fd31:4159:96::472, icmp_seq=5 hlim=64 time=92.396 ms
16 bytes from fd31:4159:96::472, icmp_seq=6 hlim=64 time=103.968 ms
16 bytes from fd31:4159:96::472, icmp_seq=7 hlim=64 time=84.832 ms
16 bytes from fd31:4159:96::472, icmp_seq=8 hlim=64 time=71.411 ms
16 bytes from fd31:4159:96::472, icmp_seq=9 hlim=64 time=73.255 ms
16 bytes from fd31:4159:96::472, icmp_seq=10 hlim=64 time=50.535 ms
16 bytes from fd31:4159:96::472, icmp_seq=11 hlim=64 time=51.770 ms

The rules in action look like this:

---------
add 2000 fwd ::1,57 tcp from 2a00:b580:8000:12:deaf:beef:dead:beef to me6 443 in
add 2100 fwd ::1,87 tcp from 2a00:b580:8000:12:deaf:beef:dead:beef to me6 80 in
---------

The IPv6 address is our SNI proxy. The web servers have special vhosts listening on ports 57 and 87, respectively, with the PROXY protocol enabled for connections that arrive via IPv4 through the proxy, while 80 and 443 are regular vhosts listening on the public IPv6 address.

Any ideas would be greatly appreciated.
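For reference, the per-jail wiring looks roughly like the following - a simplified sketch with illustrative names (bridge1 for the private bridge, vpro0742 for one jail), not our literal provisioning scripts:

---------
# private bridge, created once on the host; no physical member port
ifconfig bridge1 create
ifconfig bridge1 up

# per jail: one epair; the host end becomes a bridge member,
# the jail end is moved into the jail's vnet and configured there
ifconfig epair742 create
ifconfig bridge1 addm epair742a
ifconfig epair742a up
ifconfig epair742b vnet vpro0742
jexec vpro0742 ifconfig epair742b inet6 fd31:4159:96::742/64 up
---------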
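As for offloading on the external side, the igb interfaces are configured along these lines in the host's rc.conf (flag list illustrative, from memory):

---------
ifconfig_igb0="up -rxcsum -txcsum -tso -lro"
ifconfig_igb1="up -rxcsum -txcsum -tso -lro"
---------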
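And the quick test mentioned above was nothing more than replacing our custom ruleset with the stock "open" one in each of the two jails' /etc/rc.conf:

---------
firewall_enable="YES"
firewall_type="open"
---------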
Thanks,
Patrick

--
punkt.de GmbH
Patrick M. Hausen
.infrastructure

Sophienstr. 187
76185 Karlsruhe

Tel. +49 721 9109500
https://infrastructure.punkt.de
info@punkt.de

AG Mannheim 108285
Geschäftsführer: Daniel Lienert, Fabian Stein