MPD LAC Scaling

Sami Halabi sodynet1 at gmail.com
Mon Nov 14 17:16:59 UTC 2011


Hi,
I wonder why you don't put all your suggestions into MPD itself instead of
making hacks; as I see it, that is necessary for performance.
After all, MPD is aimed at FreeBSD only, so it's natural to use ke

Sami

2011/11/12 Alexander Motin <mav at freebsd.org>

> Hi.
>
> > I'm currently evaluating MPD as a potential LAC solution for a
> > project I'm working on.  I'm looking to try and handle at least 4Gbit
> > and 20,000 sessions worth of PPPoE -> L2TP LAC traffic per server.
> > The reading I've done from the archives so far seems to indicate that
> > this has not yet been done.
>
> I haven't heard of cases that big either, but I also can't say it is
> theoretically impossible after some tuning and development work. At this
> point I have neither a production/test environment nor much time to
> work on it actively, but I want to share some experience and ideas in
> case somebody wants to take that on.
>
> First, as Julian said, it does not necessarily have to be one server
> handling all the load. A cluster of smaller machines is preferable from
> many points of view. PPPoE allows you to have several servers and
> load-balance across them. At the moment MPD can't balance load
> dynamically, but you can do it manually by limiting the number of
> sessions per server.
>
> As a hardware reference point from personal experience: three years ago
> mpd5 on 1U servers with a single Core2 Duo CPU, 1GB of RAM and two 1Gb
> NICs (less than $1K at the time) handled in production about 2000 PPPoE
> sessions and 600Mbps of traffic per server, including Netflow
> generation, per-customer typed traffic shaping and accounting. Modern,
> more powerful hardware is able to do more.
>
> Getting higher numbers mostly splits into two questions, getting more
> traffic and getting more sessions, as the limitations are different.
>  - Getting more traffic mostly means scaling the kernel Netgraph and
> networking code to more CPU cores. Since Netgraph uses direct function
> calls where possible, this depends on the number of network interrupt
> threads in the system. Three years ago there was only one net SWI
> thread, and setting net.isr.direct=1 while having several NICs in the
> system allowed the load to be distributed between CPUs. Modern
> high-end NICs with several MSI-X interrupts should give the same
> effect. It is now also possible to have several net SWI threads, but I
> haven't tested that.
>  - Getting more sessions also means tuning and optimizing the
> user-level mpd daemon. Three years ago, on a Pentium 4-class test
> machine, I reached about 5K PPPoE sessions with RADIUS auth/acct. The
> main limiting factor was the performance of the user-level daemon. The
> more sessions were connected, the more overhead the daemon had to cope
> with: LCP echo requests and event timeouts to handle, netgraph kernel
> sockets to listen on, and so on. At some point the daemon is simply
> unable to handle all new incoming events in time, and clients resending
> their requests cause a cumulative effect. So the main limiting factor
> is not just the number of users, but also the number of events. If
> users connect one by one, the number of sessions can be quite high; but
> if, due to some accident, all users are dropped and reconnect at once,
> overload may come sooner. In that case even the LCP echo timeout set on
> the server and clients, or how much logging is enabled, matters. My
> best tuning result at the time, on a Pentium 4-class machine, was about
> 100 connections per second, which allowed 5000 simultaneous sessions to
> be set up within 50 seconds. Higher numbers were problematic. At the
> moment MPD's user-level main state machine is single-threaded, except
> for authorization and accounting (such as RADIUS), which run in
> separate threads but require synchronized completion to return their
> data. Splitting the main FSM across several threads is difficult: it
> would require somehow grouping links and bundles into different threads
> with different locks, which is hard because of multilink support and
> because, until a user is authorized, it is impossible to say which
> bundle it should join. If there is a need to handle several PPPoE
> services with different names, or several LAN segments, it may
> theoretically be effective to run several MPD daemon instances, one per
> service/segment. Generally I've spent less time profiling and
> optimizing the MPD daemon itself than the kernel code, so there should
> still be a lot of room for improvement. Some possible optimization
> points I still remember are:
>  - rework the pevent() engine used by the MPD state machine to use
> kqueue() instead of poll() to reduce per-event overhead (a rough sketch
> of such a loop follows below this list);
>  - optimize the locking of the paction() functions used for thread
> creation and completion for the MPD-specific case; the idea was that,
> at the cost of some generality, they could be simplified to reduce the
> number of context switches;
>  - rewrite the RADIUS auth/acct support to run within the main mpd
> thread or in a fixed number of external threads; since the existing
> threaded approach was implemented, libradius has gained support for
> asynchronous operation, which should reduce the overhead of thread
> creation/destruction (see the libradius sketch below this list);
>  - optimize the ng_ksocket node for working with a large number of
> hooks, using some faster search, and/or make MPD create an additional
> socket for every N links to balance kernel and user-level search
> overheads; initially MPD created a separate set of sockets for every
> link, but that turned out to be too expensive for the user-level FSM
> and was rewritten into the present state, with an almost minimal number
> of sockets and most of the multiplexing done in the kernel (a netgraph
> receive sketch follows below this list).
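>
> To make the kqueue() point above concrete, here is a minimal sketch of
> a kqueue()-based read loop (this is not MPD's pevent() code; the
> watched descriptor and the handle_event() callback are just
> placeholders for illustration):
>
>   #include <sys/types.h>
>   #include <sys/event.h>
>   #include <sys/time.h>
>   #include <err.h>
>   #include <stdio.h>
>   #include <unistd.h>
>
>   static void
>   handle_event(int fd)
>   {
>           /* Placeholder: read from fd and feed the state machine. */
>           printf("activity on descriptor %d\n", fd);
>   }
>
>   int
>   main(void)
>   {
>           struct kevent change, events[64];
>           int kq, i, n;
>
>           if ((kq = kqueue()) == -1)
>                   err(1, "kqueue");
>
>           /* Register interest once; unlike poll(), the descriptor set
>            * is not re-passed to the kernel on every iteration. */
>           EV_SET(&change, STDIN_FILENO, EVFILT_READ, EV_ADD, 0, 0, NULL);
>           if (kevent(kq, &change, 1, NULL, 0, NULL) == -1)
>                   err(1, "kevent register");
>
>           for (;;) {
>                   /* Block until a registered descriptor is ready. */
>                   n = kevent(kq, NULL, 0, events, 64, NULL);
>                   if (n == -1)
>                           err(1, "kevent wait");
>                   for (i = 0; i < n; i++)
>                           handle_event((int)events[i].ident);
>           }
>           /* NOTREACHED */
>   }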
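>
> For the libradius point, the sketch below shows roughly how an
> Access-Request might be driven from a single event loop using the
> asynchronous rad_init_send_request()/rad_continue_send_request() calls,
> as I understand the libradius(3) interface; the user name, password and
> the plain select() loop are placeholders, and error handling is minimal
> (link with -lradius):
>
>   #include <sys/select.h>
>   #include <sys/time.h>
>   #include <err.h>
>   #include <stdio.h>
>   #include <radlib.h>
>
>   int
>   main(void)
>   {
>           struct rad_handle *h;
>           struct timeval tv;
>           fd_set rfds;
>           int fd, n, r;
>
>           if ((h = rad_auth_open()) == NULL)
>                   errx(1, "rad_auth_open failed");
>           if (rad_config(h, NULL) == -1)  /* uses /etc/radius.conf */
>                   errx(1, "rad_config: %s", rad_strerror(h));
>           if (rad_create_request(h, RAD_ACCESS_REQUEST) == -1 ||
>               rad_put_string(h, RAD_USER_NAME, "testuser") == -1 ||
>               rad_put_string(h, RAD_USER_PASSWORD, "testpass") == -1)
>                   errx(1, "building request: %s", rad_strerror(h));
>
>           /* Start the exchange; get the socket and first timeout. */
>           r = rad_init_send_request(h, &fd, &tv);
>           while (r == 0) {
>                   FD_ZERO(&rfds);
>                   FD_SET(fd, &rfds);
>                   /* A real daemon would add fd to its main event loop
>                    * instead of blocking in select() here. */
>                   n = select(fd + 1, &rfds, NULL, NULL, &tv);
>                   if (n == -1)
>                           err(1, "select");
>                   r = rad_continue_send_request(h, n, &fd, &tv);
>           }
>           if (r == -1)
>                   errx(1, "radius: %s", rad_strerror(h));
>           printf("reply code %d%s\n", r,
>               r == RAD_ACCESS_ACCEPT ? " (Access-Accept)" : "");
>           rad_close(h);
>           return (0);
>   }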
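>
> For the ng_ksocket/multiplexing point, a rough sketch of the "few
> sockets, kernel does the demultiplexing" idea using libnetgraph (link
> with -lnetgraph); the node name is made up and the hook setup via the
> control socket is omitted:
>
>   #include <sys/types.h>
>   #include <err.h>
>   #include <stdio.h>
>   #include <netgraph.h>
>   #include <netgraph/ng_message.h>
>
>   int
>   main(void)
>   {
>           char hook[NG_HOOKSIZ];
>           u_char pkt[2048];
>           int cs, ds, len;
>
>           /* One ng_socket node: one control and one data descriptor
>            * for an arbitrary number of hooks. */
>           if (NgMkSockNode("lac_example", &cs, &ds) == -1)
>                   err(1, "NgMkSockNode");
>
>           /* A real daemon would now use NgSendMsg() on cs to connect
>            * its PPP/PPPoE hooks to this node; only the receive side
>            * is shown here. */
>           for (;;) {
>                   len = NgRecvData(ds, pkt, sizeof(pkt), hook);
>                   if (len == -1)
>                           err(1, "NgRecvData");
>                   /* The kernel reports which hook (link) it came from. */
>                   printf("%d bytes on hook \"%s\"\n", len, hook);
>           }
>   }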
>
> I have no personal production experience with the PPPoE-to-L2TP LAC
> case. It is used much less often; I have had only a few reports from
> people actively using it, and not many numbers. I think the LAC case
> should have lower overhead and CPU load, and therefore better
> scalability, than usual traffic termination: there is no IPCP layer in
> PPP to negotiate, no interfaces to create and configure, no Netflow, no
> shapers, no periodic accounting, etc. If you don't need to authenticate
> users but only to forward connections, so that the server doesn't need
> to handle the LCP protocol, the task becomes simpler still.
>
> If you can set up a test environment to stress-test the LAC side, it
> would be interesting to see the numbers. In my test lab I used several
> machines, each with mpd configured for thousands of PPPoE client
> sessions, to generate simultaneous connections. For testing a LAC you
> also need a fast enough L2TP terminator. If you have no such hardware
> for testing, you may try using several systems running mpd L2TP
> servers, spreading the load between them in some way to avoid a
> bottleneck there, though the system load in that case may differ
> slightly.
>
> --
> Alexander Motin



-- 
Sami Halabi
Information Systems Engineer
NMS Projects Expert

