Netgraph performance question

Julian Elischer julian at elischer.org
Fri Feb 4 16:42:46 PST 2005



Guy Helmer wrote:

>  A while back, Maxim Konovalov made a commit to usr.sbin/ngctl/main.c
>  to increase its socket receive buffer size to help 'ngctl list' deal
>  with a big number of nodes, and Ruslan Ermilov responded that setting
>  sysctls net.graph.recvspace=200000 and net.graph.maxdgram=200000 was
>  a good idea on a system with a large number of nodes.
>
>  I'm getting what I consider to be sub-par performance under FreeBSD
>  5.3 from a userland program using ngsockets connected into ng_tee to
>  play with packets that are traversing a ng_bridge, and I finally have
>  an opportunity to look into this. I say "sub-par" because when we've
>  tested this configuration using three 2.8GHz Xeon machines with
>  Gigabit Ethernet interfaces at 1000Mbps full-duplex, we obtained peak
>  performance of a single TCP stream of about 12MB/sec through the
>  bridging machine as measured by NetPIPE and netperf.


That's not bad if you are pushing everything through userland.
That's quite expensive, and the scheduling overheads need to be taken
into account too.

>
>
>  I'm wondering if bumping the recvspace should help, if changing the
>  ngsocket hook to queue incoming data should help, if it would be best
>  to replace ngsocket with a memory-mapped interface, or if anyone has
>  any other ideas that would help performance.
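
(For what it's worth, the tuning mentioned earlier is just
"sysctl net.graph.recvspace=200000" and "sysctl net.graph.maxdgram=200000",
or the equivalent lines in /etc/sysctl.conf.  Bigger socket buffers are
cheap to try, but they don't remove any copies or context switches.)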

Netgraph was designed to be a "lego for link layer stuff", where link
layer stuff was considered to be WAN protocols etc.

In particular the userland interface was written with an eye to
prototyping and debugging, and doesn't take any special care to be fast
(though I don't know how you could be much faster going to userland).

Since then people have broadened its use considerably, and questions
about its performance have become quite regular.

It wasn't designed to be super fast, though it is not bad considering
what it does. There is, however, a push to look at performance, so it
would be interesting to see in more detail what you are doing.
In particular, what are you doing in userland?
Might it make sense to write your own custom netgraph node that does
exactly what you want in the kernel?
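
The skeleton of a node type is not much code. Something along these lines
(untested, cribbed from memory of ng_sample.c; the "filter" name and the
send-it-back-out-the-same-hook behaviour are just placeholders for whatever
you actually need):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <netgraph/ng_message.h>
#include <netgraph/netgraph.h>

static ng_constructor_t	ng_filter_constructor;
static ng_rcvdata_t	ng_filter_rcvdata;
static ng_shutdown_t	ng_filter_shutdown;

static struct ng_type ng_filter_typestruct = {
	.version =	NG_ABI_VERSION,
	.name =		"filter",	/* placeholder type name */
	.constructor =	ng_filter_constructor,
	.rcvdata =	ng_filter_rcvdata,
	.shutdown =	ng_filter_shutdown,
};
NETGRAPH_INIT(filter, &ng_filter_typestruct);

static int
ng_filter_constructor(node_p node)
{
	return (0);	/* no per-node state in this sketch */
}

static int
ng_filter_rcvdata(hook_p hook, item_p item)
{
	struct mbuf *m;
	int error = 0;

	NGI_GET_M(item, m);
	/* ...inspect or rewrite the packet here, in the kernel... */
	/* A real node would pick the right outgoing hook; this just
	 * bounces it back where it came from. */
	NG_FWD_NEW_DATA(error, item, hook, m);
	return (error);
}

static int
ng_filter_shutdown(node_p node)
{
	NG_NODE_UNREF(node);
	return (0);
}

Spliced in where the tee/socket pair sits now, the packets never have to
cross into userland at all.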

>
>  Thanks in advance for any advice, Guy Helmer
>


I have considered a memory-mapped interface that would bolt onto ng_dev.

I have done an almost identical interface once before (1986-1992).

There would have to be several commands supported.

define bufferspace size (ioctl/message)
mmap buffer space (mmap)
allocate bufferspace to user (size) (returns buffer ID)
free bufferspace (ID)
getoffset (ID) (returns offset in bufferspace)
writebuffer (ID, hook, maxmbufsize): pick up the buffer, put it into
  mbufs (maybe as external pointers) and send it out the hook in question.
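
Written down as a (purely hypothetical) netgraph header, that command set
might look something like the following; all names and numbers are made up,
and the mmap itself would presumably happen on the device node rather than
via a message:

#include <sys/types.h>

#define NGM_MMAP_COOKIE		20050204	/* made-up type cookie */

enum {
	NGM_MMAP_SETBUFSPACE = 1,	/* define bufferspace size */
	NGM_MMAP_ALLOC,			/* allocate (size) -> returns buffer ID */
	NGM_MMAP_FREE,			/* free bufferspace (ID) */
	NGM_MMAP_GETOFFSET,		/* (ID) -> offset in bufferspace */
	NGM_MMAP_WRITEBUF		/* chop buffer into mbufs, send out hook */
};

/* argument structure for NGM_MMAP_WRITEBUF */
struct ngm_mmap_writebuf {
	u_int32_t	id;		/* buffer ID to transmit */
	u_int32_t	maxmbufsize;	/* max data per (external) mbuf */
	char		hook[32];	/* hook to send it out of */
};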

Incoming data would be written into buffers (a CPU copy would be
needed) and the ID added to a list of arrived IDs.
In addition you need a way to notify a listening thread/process of
arrived IDs.

In my original system the listening process had a socket open with a
particular protocol family and waited for N bytes. When the data
arrived, the socket returned the buffer ID, followed by N-sizeof(ID)
bytes from the header of the packet, so that the app could check a
header and see if it was interested.

In later versions it used a recvmsg() call, and the metadata was in the
form of a protocol-specific structure received in parallel to the
actual data copied.
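
In today's terms the receive side might look roughly like this from the
application (just a guess at the shape of it; the metadata structure and
all the field names are invented):

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <stdint.h>

struct pkt_meta {			/* hypothetical per-packet metadata */
	uint32_t	buf_id;		/* ID of the shared buffer */
	uint32_t	len;		/* bytes of packet data in it */
};

static ssize_t
recv_one(int s, struct pkt_meta *meta, void *hdr, size_t hdrlen)
{
	struct iovec iov[2] = {
		{ .iov_base = meta, .iov_len = sizeof(*meta) },
		{ .iov_base = hdr,  .iov_len = hdrlen },
	};
	struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };

	/*
	 * One recvmsg() returns the metadata plus the start of the packet
	 * header; the bulk of the data stays in the mmap'ed buffer space.
	 */
	return (recvmsg(s, &msg, 0));
}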

Arrived IDs/buffers were 'owned' by N owners, where N was the number of
open listener sockets. Each listener had to respond to the message by
'freeing' the ID if it wasn't interested. Closing the socket freed all
IDs still owned by it; closing the file did the same.

I forget some of the details.

I guess in this version, instead of sockets, we could use hooks on the
mmap node, and we could use ng sockets to connect to them.

The external data 'free' method in the mbuf could decrement the ID's
reference count and actually free it if it reached 0 (when all parts
had been transmitted?). The userland process would free it immediately
after doing the 'send this' command; the reference counts owned by the
mbufs would stop it from being freed until the packets were sent.
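
As a rough sketch (hypothetical types and helpers, locking omitted, and
assuming the 5.x-style external-storage free routine that takes the buffer
and an opaque argument):

#include <sys/param.h>
#include <sys/mbuf.h>

struct mmap_buf {
	u_int32_t	id;		/* buffer ID handed to userland */
	int		refs;		/* userland owner + in-flight mbuf chunks */
};

static void	mmap_buf_release(struct mmap_buf *);	/* back to the free list */

/*
 * Free routine attached to the external storage with MEXTADD(9).  It is
 * called when the mbuf(s) referencing that chunk of the buffer go away;
 * the last reference to drop returns the whole buffer ID to the pool.
 */
static void
ng_mmap_ext_free(void *ext_buf, void *ext_args)
{
	struct mmap_buf *b = ext_args;

	if (--b->refs == 0)
		mmap_buf_release(b);
}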

In our previous version we had a disk/vfs interface too, and there was
a "write this to file descriptor N" and a "write this to raw disk X at
offset Y" command as well; the disk would own a reference until the
data was written, of course. There was also a "read from raw disk X at
offset Y into buffer ID" command; you had to own the buffer already for
it to work.

In 1987 we were saturating several ethernets off disk with this, at 5%
CPU load :-)

disk->[dma]->mem->[dma]->ethernet

Since machines are now hundreds of times faster (a 30MHz 68010 with a
32-bit memory bus vs. a 3GHz machine with a 64-bit bus), some of this
doesn't make sense any more, but it was an achievement at the time.

Just an idea.




