Implementation of Sampling for BPF

Mon Jan 7 08:31:27 PST 2008

Good Afternoon,

> It's the question of doing things correctly(tm) so they are appropriate 
> for inclusion into the main src tree of the FreeBSD Project - this must 
> be universal enough to meet other people needs and to be supported. You 
> of course are free to do any patches at your locals site for your 
> individual needs - many people do that customization on their own.

Indeed, and the latter part of your statement is my primary goal. However, I'm 
unfamiliar with this part of the kernel and could do with a few pointers on 
what the correct way would be from a programming point of view.

> So what if a malicious packet will be skipped due sampling, packet which 
> is by other means undistinguishable from others before detailed analysis?

If that happens it is unfortunate and the packet slips through the net. However, 
the malicious traffic I look for more often takes the form of flows rather than 
individual packets. We drop most protocols at the border that would give us an 
issue with one packet. With a flow, there is a much greater chance of sampling 
at least one of its packets.
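
To put a rough number on that chance, here is a minimal sketch in Python. It 
assumes each packet is sampled independently with probability 1/N, which is an 
assumption made for illustration only; a strict every-Nth-packet sampler 
behaves differently. The function name and parameters are hypothetical.

```python
def p_flow_seen(flow_packets: int, sample_rate: int) -> float:
    """Probability that at least one packet of a flow is sampled.

    Assumes independent random sampling at 1-in-sample_rate
    (an illustrative assumption, not a measured property).
    """
    if flow_packets < 0 or sample_rate < 1:
        raise ValueError("flow_packets must be >= 0 and sample_rate >= 1")
    # Probability that a single packet is NOT sampled.
    p_miss_one = 1.0 - 1.0 / sample_rate
    # The flow is missed only if every one of its packets is missed.
    return 1.0 - p_miss_one ** flow_packets


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, p_flow_seen(n, 100))
```

Under that assumption, a single packet at 1-in-100 is seen about 1% of the 
time, while a 1000-packet flow is missed only if all 1000 packets are missed.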

> Low in chain instead of high, you mean? That's of course no point to 
> sort out things in userland, but that's properties of given BPF program 
> to filter - how much the userland program wants to receive before 
> detailed analysis.

Please forgive my use of low and high; it seems to depend on which end of the 
stack you're looking from :). I meant as close as possible to the point where 
the packet comes into the kernel, yes.

> Putting as many servers as needed does scale well if you need only 
> sampled data - just put an appropriate sampler/load balancer before 
> them. And using FreeBSD on that servers will be cheaper than commercial 
> hardware solution, too.

Again, I have no ability to buy a sampler/load balancer, nor any space/heat/power to 
run one in. My available equipment consists of two core networking devices, some 
fibre, two Intel gig optical cards and one powerful(ish) Dell server currently 
running FreeBSD 6.X, which needs bumping to 7.0 when it's released. The kit at 
the other end of these optical links is either busy or incapable of sampling.

> Why sample is enough to you? What exactly do you need? May be you'd 
> rather write some simpler expressions for in-kernel filtering instead of 
> heavy-weighted Snort?

I'm afraid I will not discuss our exact requirements in an open forum; that 
seems unwise from a security point of view.

I would be happy to implement this as a BPF filter, but I'm unaware of how to 
sample in the filter language and count with variables, rather than just look 
at fields in a packet.
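
For illustration, the kind of counting meant here can be sketched in userland 
Python. This is not BPF filter code and not a kernel patch; the class and 
method names are hypothetical. The point is only that 1-in-N sampling needs a 
counter that persists from one packet to the next.

```python
class CountingSampler:
    """Keep one packet in every `rate` packets seen (deterministic 1-in-N)."""

    def __init__(self, rate: int) -> None:
        if rate < 1:
            raise ValueError("rate must be >= 1")
        self.rate = rate
        self.seen = 0  # state that must survive between packets

    def keep(self) -> bool:
        """Return True for every rate-th packet, False otherwise."""
        self.seen += 1
        if self.seen >= self.rate:
            self.seen = 0  # start counting towards the next kept packet
            return True
        return False


if __name__ == "__main__":
    sampler = CountingSampler(4)
    print([sampler.keep() for _ in range(8)])
```

A random variant would replace the counter with a per-packet random draw, at 
the cost of needing a random source wherever the decision is made.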

Additional uses I can foresee:
* NetFlow Generation - For which sampling is perfectly acceptable, although we 
currently do this in hardware.

* Statistics Generation - What our users are using our network for, etc. Of 
course a lot of this data can be obtained from NetFlow (as we do at present), 
but some aspects can't, such as average packet sizes per protocol.

* Research - Researchers regularly ask me for sampled data from our network 
(requests I currently turn down), so I'm assuming that they think sampled data 
is quite suitable.
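
As a rough sketch of the statistics case above, average packet size per 
protocol can be computed from whatever packets the sampler lets through. The 
input format here, (protocol, length) pairs, is a hypothetical stand-in for 
real capture records.

```python
from collections import defaultdict


def average_sizes(packets):
    """Return {protocol: mean packet length} for (protocol, length) pairs."""
    # Per protocol: [packet count, total bytes].
    totals = defaultdict(lambda: [0, 0])
    for proto, length in packets:
        totals[proto][0] += 1
        totals[proto][1] += length
    return {proto: total / count for proto, (count, total) in totals.items()}


if __name__ == "__main__":
    sample = [("tcp", 1500), ("tcp", 500), ("udp", 100)]
    print(average_sizes(sample))
```

With sampled input the result is an estimate of the true average, not an exact 
figure.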

I can understand your hesitation about including something like this in the 
project as a whole, but as I've said this is primarily for our purposes.

If others would find it useful that's great and I'll maintain a patch on a 
webserver; if the project as a whole would find it useful that's great too.

It would be nice, at least from an academic point of view, for FreeBSD to 
support other research too, for example the work being done to separate the 
congestion control to permit easier testing of different methods.

P.
-- 
Peter Wood <peter at alastria.net>