bpf/pcap are weird

Wed Nov 5 23:58:19 PST 2003

> Okay, this is goofy stuff and breaks a lot of code that otherwise makes 
> certain assumptions about pcap/bpf that don't work on FreeBSD.  Our
> bpf(4) doesn't actually care about the non-blocking fd flag, and our pcap(3) 
> doesn't care at all about BIOCIMMEDIATE.

This is a libpcap deficiency that I will probably fix at some point, as

	1) some libpcap applications might want that mode

and

	2) the way you get that mode differs on different platforms
	   (some platforms always implement it, e.g. Linux; other
	   platforms have different ways of requesting it).

It's in my queue along with a number of other libpcap deficiencies.

> Why do we have BIOCIMMEDIATE? 
> It seems like it's what SHOULD be implemented with the non-blocking I/O
> flag

No.  BIOCIMMEDIATE and non-blocking mode are different.

BIOCIMMEDIATE mode means "make incoming packets readable immediately;
don't buffer them up until either the store buffer is full or the
timeout expires".  This is for use in, for example, applications that
use BPF to implement network protocols and want to be able to respond
immediately to incoming packets.  Packet capture applications (tcpdump,
Ethereal, etc.), by contrast, don't necessarily need to show or save
incoming packets immediately, and might want to get as many packets as
possible per read on the BPF device.
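A rough way to see why capture applications prefer the default buffering is to count read() calls. The function below is a hypothetical back-of-the-envelope toy, not BPF code: it ignores the per-packet BPF header, assumes a reader that is always waiting in read() and drains each wakeup before the next packet arrives, and assumes a final partial buffer is flushed by the read timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical estimate of how many read() calls deliver `npkts`
 * packets of `pkt_len` bytes through a BPF-like device whose
 * buffers hold `buf_size` bytes each.
 */
static size_t reads_needed(size_t npkts, size_t pkt_len,
                           size_t buf_size, bool immediate)
{
    assert(pkt_len > 0 && pkt_len <= buf_size);
    if (npkts == 0)
        return 0;
    if (immediate)
        return npkts;  /* each arrival wakes the reader and is returned */
    /* Buffered: one read per full store buffer, plus one for a
       partial buffer flushed by the timeout (ceiling division). */
    size_t per_buf = buf_size / pkt_len;
    return (npkts + per_buf - 1) / per_buf;
}
```

For 1000 packets of 100 bytes and 4096-byte buffers, that is 1000 reads in immediate mode versus 25 in buffered mode (40 packets per buffer), at the cost of each packet waiting until its buffer fills.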

It does *NOT* mean "an attempt to read on this device won't block even
if *no* packets are available", nor should it - applications running in
BIOCIMMEDIATE mode would probably still want to block, rather than spin,
if no packets are available.

Non-blocking mode should mean "an attempt to read on this device won't
block, even if there are no packets remaining", so it's not identical to
BIOCIMMEDIATE mode.
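Put as a decision table, the two settings answer different questions: BIOCIMMEDIATE decides whether packets already in the store buffer may be returned before the buffer fills or the timeout expires, and non-blocking mode decides what happens when nothing can be returned. The enum and functions below are a hypothetical toy model, not FreeBSD code, and cover only the case where the hold buffer is empty and the read timeout has not fired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of a read on a BPF-like device. */
enum read_outcome {
    RETURN_DATA,  /* copy packets out and return */
    WOULD_BLOCK,  /* fail at once with EWOULDBLOCK */
    SLEEP         /* wait for packets or the read timeout */
};

/* `pending` counts packets sitting in the store buffer. */
static enum read_outcome classify_read(bool immediate, bool nonblocking,
                                       size_t pending)
{
    /* BIOCIMMEDIATE answers: may a partial buffer be returned now? */
    if (immediate && pending > 0)
        return RETURN_DATA;
    /* Non-blocking answers: if not, fail or wait? */
    return nonblocking ? WOULD_BLOCK : SLEEP;
}

/* Neither flag implies the other. */
static void check_read_outcomes(void)
{
    assert(classify_read(true,  false, 0) == SLEEP);  /* immediate still blocks */
    assert(classify_read(true,  true,  0) == WOULD_BLOCK);
    assert(classify_read(true,  false, 1) == RETURN_DATA);
    assert(classify_read(false, true,  1) == WOULD_BLOCK);
    assert(classify_read(false, false, 1) == SLEEP);
}
```

Note the non-immediate, non-blocking case with a packet pending: under this model it fails with EWOULDBLOCK even though a packet is waiting.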

When non-blocking mode is used in conjunction with a properly working
"select()" or "poll()" - i.e., one that starts the timeout timer when
the "select()" or "poll()" is done, so that the "select()" or "poll()"
will wake up if the store buffer fills *OR* the timeout expires - it
does need to be the case that, if the "select()" or "poll()" says a
read on the BPF device will succeed, the read will, in fact, succeed.
This could be implemented by having reads in non-blocking mode always
do a buffer rotation if there are packets in the store buffer but not
the hold buffer, just as is done in BIOCIMMEDIATE mode.

For the timeout case, that's currently done in "bpfread()" - note the
"|| timed_out" in the "if" inside the "while (d->bd_hbuf == 0)" loop.
That appears to have been introduced in 4.5, in revision 1.59.2.8,
which was an MFC of revision 1.86:

	Make bpf's read timeout feature work more correctly with
	select/poll, and therefore with pthreads.  I doubt there is any way
	to make this 100% semantically identical to the way it behaves in
	unthreaded programs with blocking reads, but the solution here
	should do the right thing for all reasonable usage patterns.

	The basic idea is to schedule a callout for the read timeout when a
	select/poll is done.  When the callout fires, it ends the select if
	it is still in progress, or marks the state as "timed out" if the
	select has already ended for some other reason.  Additional logic in
	bpfread then does the right thing in the case where the timeout has
	fired.

	Note, I co-opted the bd_state member of the bpf_d structure.  It has
	been present in the structure since the initial import of 4.4-lite,
	but as far as I can tell it has never been used.

	PR:             kern/22063 and bin/31649
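The mechanism in that commit message can be sketched as a small tick-based simulation. Everything here (struct bpf_sim, sim_poll(), sim_tick(), sim_read()) is hypothetical and heavily simplified; it only mirrors the idea that a select/poll that finds nothing arms a timer, the timer marks the descriptor as timed out, and the next read then rotates a partially filled store buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulation, not kernel code; time is in ticks. */
struct bpf_sim {
    size_t store_len;   /* bytes in the store buffer */
    size_t hold_len;    /* bytes in the hold buffer */
    bool immediate;     /* models BIOCIMMEDIATE */
    int rtout;          /* read timeout in ticks (> 0) */
    int deadline;       /* callout deadline; -1 = not armed */
    bool timed_out;     /* set when the callout fires */
};

/* Would a read return data right now? */
static bool sim_readable(const struct bpf_sim *d)
{
    return d->hold_len > 0 ||
           (d->store_len > 0 && (d->immediate || d->timed_out));
}

/* select()/poll(): if nothing is readable, arm the timeout callout. */
static bool sim_poll(struct bpf_sim *d, int now)
{
    if (sim_readable(d))
        return true;
    if (d->deadline < 0)
        d->deadline = now + d->rtout;
    return false;
}

/* Clock tick: the callout fires once its deadline has passed.
   (The kernel would also end any select() still in progress.) */
static void sim_tick(struct bpf_sim *d, int now)
{
    if (d->deadline >= 0 && now >= d->deadline) {
        d->deadline = -1;
        d->timed_out = true;
    }
}

/* Non-blocking read: bytes returned, or -1 for EWOULDBLOCK. */
static long sim_read(struct bpf_sim *d)
{
    bool ready = sim_readable(d);
    bool timed_out = d->timed_out;
    d->timed_out = false;               /* consume the timeout state */

    if (d->hold_len == 0 && d->store_len > 0 &&
        (d->immediate || timed_out)) {  /* the "|| timed_out" test */
        d->hold_len = d->store_len;     /* rotate store -> hold */
        d->store_len = 0;
    }
    /* Contract: a read succeeds exactly when poll would say so. */
    assert(ready == (d->hold_len > 0));
    if (d->hold_len == 0)
        return -1;
    long n = (long)d->hold_len;
    d->hold_len = 0;
    return n;
}
```

With one partial buffer pending, sim_poll() reports nothing and arms the timer; once sim_tick() reaches the deadline, sim_poll() reports readable and sim_read() returns the buffered bytes.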

PR 22063 is "bpf when used with the select system call with timeout
doesn't forward packets on timeout":

	When bpf is accessed via libpcap with the select system call
	with a timeout set if a less than full buffer of packets
	received on the interface (and passed to bpf.c) they will never
	be returned to libpcap even on a timeout.  OpenBSD has a partial
	fix for this (it gets the first packet of 9 up and leaves the
	other 8) which I have corrected, reported to OpenBSD and ported
	to FreeBSD.

	As a side note one of the OpenBSD people is working on a better
	bpf implementation and would be interested in help by someone
	knowledgable in the FreeBSD VM system to assist porting his code
	when finished to FreeBSD.

(I think the "better bpf implementation" might be Michael Stolarchuk's
memory-mapped BPF, but I don't know whether it ever saw the light of
day.)

PR 31649 is "libpcap doesn't work with -pthread"; the problem is that
the userland pthreads library requires "select()"/"poll()" and
non-blocking reads to work on any descriptor on which a read can block
for a long time - and that wasn't the case for BPF devices.
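From the application side, that requirement looks roughly like the following POSIX sketch: the descriptor is put into non-blocking mode with fcntl(), poll() is used for the long wait, and the read that follows is expected not to fail with EWOULDBLOCK. A pipe stands in for the BPF device so the code can run anywhere, and set_nonblocking() and wait_and_read() are hypothetical helper names; a real BPF descriptor only honors this contract if the kernel behaves as described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Wait up to timeout_ms for fd to become readable, then read once.
 * Returns the byte count, 0 on timeout (or end of file), -1 on error.
 * The assert states the contract a pthreads-style library relies on:
 * once poll() reports the fd readable, a non-blocking read must not
 * fail with EAGAIN/EWOULDBLOCK.
 */
static ssize_t wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;                 /* 0: timed out; -1: poll() failed */
    ssize_t got = read(fd, buf, len);
    assert(!(got == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)));
    return got;
}
```

On an empty pipe, wait_and_read() returns 0 once the timeout passes, and a plain read() on the drained, non-blocking descriptor fails with EAGAIN rather than hanging the thread.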

The question, then, is whether non-blocking reads that are *not* used
with "select()" or "poll()" should return whatever packets are there,
even if the timer hasn't expired.  One could argue that they should, in
which case the "if" in question should also check for
"ioflag & IO_NDELAY".  I don't know whether that would cause problems
for any applications, though.
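The suggested change is small enough to write down as two predicates. These are hypothetical sketches of the rotation test, not the actual "bpfread()" source; the parameter "nonblocking" stands in for "ioflag & IO_NDELAY". The asserts record what can be said with confidence about its scope: it only affects non-blocking reads, and only by rotating in cases where the current test would not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The rotation test as described: immediate mode or a fired timeout. */
static bool rotate_current(size_t store_len, bool immediate, bool timed_out)
{
    return store_len > 0 && (immediate || timed_out);
}

/* The same test with the suggested "ioflag & IO_NDELAY" check added. */
static bool rotate_proposed(size_t store_len, bool immediate,
                            bool timed_out, bool nonblocking)
{
    bool r = store_len > 0 && (immediate || timed_out || nonblocking);
    bool cur = rotate_current(store_len, immediate, timed_out);
    assert(!cur || r);              /* never rotates less than before */
    assert(nonblocking || r == cur); /* blocking reads are unchanged */
    return r;
}
```

Under this sketch, only a non-blocking read that finds a partial store buffer changes behavior: it gets the packets instead of EWOULDBLOCK. Whether any application depends on the old answer is exactly the open question.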