openvpn and system overhead

Jim Thompson jim at netgate.com
Wed Apr 17 18:09:55 UTC 2019



> On Apr 17, 2019, at 10:54 AM, Wojciech Puchar <wojtek at puchar.net> wrote:
> 
> 
> 
> On Wed, 17 Apr 2019, Miroslav Lachman wrote:
> 
>> Wojciech Puchar wrote on 2019/04/17 17:08:
>>> i'm running openvpn server on Xeon E5 2620 server.
>>> when receiving 100Mbit/s traffic over VPN it uses 20% of single core.
>>> At least 75% of it is system time.
>>> Seems like 500Mbit/s is a max for a single openvpn process.
>>> can anything be done about that to improve performance?
>> 
>> You can play with ciphers, AES-NI etc.
>> https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux
>> 
>> Miroslav Lachman
>> 
>> 
> again. it's system time mostly not user time.

Yup.  I’ve looked at this a bunch over the years for pfSense.

The tun/tap device can be viewed as a simple point-to-point IP or Ethernet device which, instead of receiving packets from a physical medium, receives them from a user space program, and instead of sending packets out over a physical medium, hands them to that user space program.
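
On FreeBSD that user space program just holds a file descriptor on the device node; each read() returns one packet the kernel queued on the interface, and each write() injects one.  A minimal sketch (the device name is only an example, error handling trimmed):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    char pkt[2048];
    ssize_t n;
    int fd = open("/dev/tun0", O_RDWR);    /* example device node */

    if (fd < 0) {
        perror("open /dev/tun0");
        return 1;
    }

    /* Each read() returns one IP packet the kernel routed to tun0;
     * a write() would inject a packet as if it had arrived on tun0. */
    while ((n = read(fd, pkt, sizeof(pkt))) > 0)
        printf("got a %zd-byte packet from the kernel\n", n);

    close(fd);
    return 0;
}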

Let's say you configure an IP address on tap0.  Then, whenever the kernel sends an IP packet to tap0, it is handed to the application that has the device open (OpenVPN, for example).
OpenVPN encrypts, authenticates, and optionally compresses the packet, encapsulates it, and sends it to the other side over TCP or (preferably) UDP.

The application on the other side receives the packet, decompresses and decrypts the data, and writes the resulting packet to its own tun/tap device; the kernel on that side then handles the packet as if it had arrived on a real physical device.
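
Stripped of the crypto details, the per-packet path through the OpenVPN process looks roughly like this (a sketch, not OpenVPN's actual code; the encrypt/decrypt helpers are placeholders):

#include <sys/socket.h>
#include <unistd.h>

/* Placeholder transforms; in OpenVPN this is the data-channel crypto. */
ssize_t encrypt_and_encapsulate(const unsigned char *in, size_t inlen,
    unsigned char *out);
ssize_t decrypt_and_decapsulate(const unsigned char *in, size_t inlen,
    unsigned char *out);

void
forward_one_packet(int tun_fd, int udp_fd, const struct sockaddr *peer,
    socklen_t peerlen)
{
    unsigned char plain[2048], wire[2200];
    ssize_t n, m;

    /* outbound: kernel -> tun -> user space -> UDP socket -> kernel */
    n = read(tun_fd, plain, sizeof(plain));     /* copy + kernel/user transition */
    if (n > 0 && (m = encrypt_and_encapsulate(plain, (size_t)n, wire)) > 0)
        sendto(udp_fd, wire, (size_t)m, 0, peer, peerlen);  /* copy + transition */

    /* inbound: kernel -> UDP socket -> user space -> tun -> kernel */
    m = recvfrom(udp_fd, wire, sizeof(wire), 0, NULL, NULL);
    if (m > 0 && (n = decrypt_and_decapsulate(wire, (size_t)m, plain)) > 0)
        write(tun_fd, plain, (size_t)n);        /* hand it back to the IP stack */
}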

Each time you copy data between user space and the kernel, you also incur a context switch, with all the associated overhead.

Using a tun/tap device incurs additional copies and context switches in each direction: one program generates the data (say, ‘ping’ or ‘ssh’), and another program encrypts and encapsulates the packet before it leaves the machine.  The process is roughly the same on the other side.  So you get twice the copies, and twice the number of context switches.  Making things worse, the “IP stack” inside OpenVPN is single-threaded and processes one packet at a time, so all of these overheads are paid on every packet rather than being amortized across several packets.

Net-net, OpenVPN won’t come anywhere close to 1 Mpps.  There is a decent write-up of recent benchmarking in a master’s thesis that compares IPsec, OpenVPN and WireGuard on Linux:

https://www.net.in.tum.de/fileadmin/bibtex/publications/theses/2018-pudelko-vpn-performance.pdf

Section 5.5 if you want to skip to the substance.  Basically, with *no* encryption overhead at all, OpenVPN still has a static overhead of around 8500 cycles/packet on the setup they used (a Xeon E5-2620 v4), which seems quite similar to yours.  Given all this, they show that OpenVPN enters an overload condition at around 120 Kpps.
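
For scale: the E5-2620 v4 has a 2.1 GHz base clock, so 8500 cycles/packet works out to roughly 2.1e9 / 8500 ≈ 250 Kpps per core before any crypto is done.  Add the cipher work plus the copy and context-switch overhead described above, and an overload point around 120 Kpps is about what you’d expect.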

There is some hope if you really have to have a lower-overhead OpenVPN.  An OpenVPN session has two channels multiplexed on the same connection: a control channel and a data channel.  The control channel and its associated configuration code in OpenVPN is … complex.  It has close to 10 trillion configuration options, and any re-write of this code would be a huge, huge undertaking.  Nearly unthinkable, really.  The data channel, otoh, is relatively straightforward, especially if you don’t need all the crypto options provided and instead limit yourself to, say, AES-GCM or another AEAD transform (e.g. ChaCha20 / Poly1305).  (Here, if your CPU has AES-NI or similar (e.g. ARMv8 has AES acceleration instructions), AES-GCM will always be faster.)
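
If you do standardize on an AEAD, the per-packet crypto itself is a short call sequence.  A minimal sketch of sealing one packet with AES-256-GCM via OpenSSL’s EVP interface (not OpenVPN’s code; IV/nonce management, AAD and error reporting are simplified):

#include <openssl/evp.h>

/* Seal one packet; out must have room for ptlen + 16 bytes.
 * Returns ciphertext length (16-byte tag appended), or -1 on error. */
int
seal_packet(const unsigned char key[32], const unsigned char iv[12],
    const unsigned char *pt, int ptlen, unsigned char *out)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len, outlen = -1;

    if (ctx == NULL)
        return -1;
    if (EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv) == 1 &&
        EVP_EncryptUpdate(ctx, out, &len, pt, ptlen) == 1) {
        outlen = len;
        if (EVP_EncryptFinal_ex(ctx, out + outlen, &len) == 1 &&
            EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16,
            out + outlen + len) == 1)
            outlen += len + 16;     /* ciphertext + GCM tag */
        else
            outlen = -1;
    }
    EVP_CIPHER_CTX_free(ctx);
    return outlen;
}

With AES-NI available, OpenSSL selects the accelerated AES-GCM implementation at runtime, which is why AES-GCM wins on such CPUs.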

But, if you’re willing to limit yourself to one or a few transforms, in theory it’s possible to make a specialized tun/tap device that keeps the data channel in-kernel: encryption/decryption and encapsulation/decapsulation of data packets happen in the kernel, while control packets are passed up and down to/from the associated user space process.
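
To make the split concrete, the interface between the user-space control channel and such an in-kernel data channel could be as small as an ioctl that hands the negotiated keys and peer address to the tun driver.  The sketch below is purely hypothetical; the struct, the ioctl and its request code do not exist in FreeBSD or OpenVPN:

#include <sys/ioctl.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical interface a kernel-side data channel might expose. */
struct tun_vpn_key {
    uint32_t            peer_id;          /* which OpenVPN peer these keys belong to */
    uint8_t             cipher;           /* e.g. AES-256-GCM or ChaCha20-Poly1305 */
    uint8_t             encrypt_key[32];
    uint8_t             decrypt_key[32];
    struct sockaddr_in  peer_addr;        /* where encapsulated packets get sent */
};

#define TUNSVPNKEY  _IOW('t', 200, struct tun_vpn_key)   /* made-up request code */

/* The user-space control channel would then do something like
 *
 *     ioctl(tun_fd, TUNSVPNKEY, &key);
 *
 * after each key negotiation, while data packets are encrypted,
 * encapsulated and sent without ever leaving the kernel. */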

A partial attempt at this idea (for Linux) can be found at https://github.com/marywangran/OpenVPN-Linux-kernel.  It looks abandoned, so maybe it didn’t pan out, or maybe the work just got asymptotic.

There is a bunch of work involved in getting this right (keeping the OpenVPN user space process happy, keeping counters up to date, etc.), but, at the end of the day, it’s all software.  Netflix got enough of OpenSSL’s AES-GCM implementation into the kernel to run the transmit side of TLS.  They didn’t care about the receive side, and just let nginx deal with the relatively light rx flows in their deployment, but it does show that this sort of thing is possible with enough work.

Even with all that work, it will probably never be as fast as a decent IPsec implementation.

Jim



