svn commit: r303733 - head/contrib/libpcap

Fri Aug 5 04:27:35 UTC 2016

On Thu, 4 Aug 2016, Jung-uk Kim wrote:

> On 08/03/16 11:40 PM, Bruce Evans wrote:
>> On Wed, 3 Aug 2016, Jung-uk Kim wrote:
>>
>>> Log:
>>>  Support nanosecond time stamps for pcap_dispatch(3) and pcap_loop(3).
>>>
>>> Modified:
>>>  head/contrib/libpcap/pcap-bpf.c
>>>
>>> Modified: head/contrib/libpcap/pcap-bpf.c
>>> ==============================================================================
>>>
>>> --- head/contrib/libpcap/pcap-bpf.c    Wed Aug  3 19:23:22 2016
>>> (r303732)
>>> +++ head/contrib/libpcap/pcap-bpf.c    Wed Aug  3 20:08:39 2016
>>> (r303733)
>>> @@ -1008,7 +1028,25 @@ pcap_read_bpf(pcap_t *p, int cnt, pcap_h
>>>         if (pb->filtering_in_kernel ||
>>>             bpf_filter(p->fcode.bf_insns, datap, bhp->bh_datalen,
>>> caplen)) {
>>>             struct pcap_pkthdr pkthdr;
>>> +#ifdef BIOCSTSTAMP
>>> +            struct bintime bt;
>>> +
>>> +            bt.sec = bhp->bh_tstamp.bt_sec;
>>> +            bt.frac = bhp->bh_tstamp.bt_frac;
>>
>> The names are very confusing since bt_sec and bt_frac are only misnamed as
>> sec and frac in struct bintime.
>
> bt_* means BPF timestamp, not bintime.

Yes, it has even larger naming bugs than I noticed before.  It should
have been a union or something.

> ...
>> Old code is even more confusing, and at least partly wrong.
> ...
>> X sys/net/bpf.c:    struct timeval32 bh_tstamp;    /* time stamp */
>>
>> Banal comment.  The complexities are from what sort of timestamp this is.
>> It is obviously a timestamp.
>>
>> This bh_tstamp is in struct bpf_hdr32 for the !BURN_BRIDGES case.  There
>> is also struct timeval bh_timestamp in struct bpf_hdr.  This header is
>> bogusly marked Obsolete.
>
> It is superceded by struct bpf_xhdr although we kept it for backward
> compatibility.  New applications should avoid it.
>
>> X sys/net/bpf.c:                hdr32_old.bh_tstamp.tv_usec = ts.bt_frac;
>>
>> This is in the !BURN_BRIDGES && COMPAT_FREEBSD32 case.  Since struct
>> timeval32
>> always has a 32-bit tv_usec, this assignment discards the most significant
>> bits in bt_frac but keeps the noise.
>>
>> X sys/net/bpf.c:            hdr_old.bh_tstamp.tv_usec = ts.bt_frac;
>>
>> This is in the !BURN_BRIDGES && !COMPAT_FREEBSD32 case.  Since tv_sec in a
>> normal timetamp is bogusly long, this accidentally preserves all of the
>> bits
>> in bt_frac on 64-bit arches.  On 32-bit arches, it loses the signal as for
>> the COMPAT_FREEBSD32 case.
>
> Note struct bpf_ts is not struct bintime.

I missed that it is only used for the old standard timeval format here.

My main complaints about bintimes in bpf are actually:
- their existence
- broken conversion between monotonic times and real times.

There are not just bintimes, but 12 new time formats/resolutions/types/
qualities from supporting the closure of 3 formats/resolutions, 1 boolean
MONOTONIC flags and 1 boolean FAST flag.  One of these was the old
standard timeval (!MONOTONIC, !FAST) format.  Before this commit, none
of the other 11 was referenced anywhere in /usr/src except in the
implementation (net/bpf.[ch]) and the man page (bpf.4).  This commit
adds support for bintimes (!MONOTONIC, !FAST) in libpcap.  libpcap now
references a whole 1 of the BPF_T_ symbols (BPF_T_BINTIME) to implement
this.  The list of these symbols is quite readable in bpf.h, but the
paragraph in the man page is almost unreadable since horrible names
like BPF_T_MICROTIME_MONOTONIC_FAST are too hard to format -- they
tend to come out as 1 or 2 words per line.

Here is the broken conversion:

X static void
X bpf_bintime2ts(struct bintime *bt, struct bpf_ts *ts, int tstype)

It starts with style bugs similar to other bintime code.

X {
X 	struct bintime bt2, boottimebin;
X 	struct timeval tsm;
X 	struct timespec tsn;
X 
X 	if ((tstype & BPF_T_MONOTONIC) == 0) {
X 		bt2 = *bt;
X 		getboottimebin(&boottimebin);
X 		bintime_add(&bt2, &boottimebin);
X 		bt = &bt2;
X 	}

bpf now uses only monotonic bintimes as its primary format, and calls
this function to convert.  This function does the conversions with
less efficiency and larger races than timecounter code.  This addition
of boottimebin is still fundamentally broken.  Non-atomic accesses to
boottimebin were only fixed recently.  The above still has a race between
reading the monotonic bintime add converting it to a real time.  This
race is now fixed in timecounter code.  Races for leap seconds adjustments
are also fixed there but not here.  There are still large races for
setting boottimebin.

X 	switch (BPF_T_FORMAT(tstype)) {
X 	case BPF_T_MICROTIME:
X 		bintime2timeval(bt, &tsm);
X 		ts->bt_sec = tsm.tv_sec;
X 		ts->bt_frac = tsm.tv_usec;
X 		break;
X 	case BPF_T_NANOTIME:
X 		bintime2timespec(bt, &tsn);
X 		ts->bt_sec = tsn.tv_sec;
X 		ts->bt_frac = tsn.tv_nsec;
X 		break;
X 	case BPF_T_BINTIME:
X 		ts->bt_sec = bt->sec;
X 		ts->bt_frac = bt->frac;
X 		break;
X 	}
X }

The others are just less efficient than direct timecounter conversion.

The conversions are even impossible to do correctly in userland.  It
would be reasonable for userland to take only monotonic timestamps
and convert them to real times much later.  But the only way to do
this is to record the value of boottimebin at every reading of the
monotonic clock (without the races in the above), so it can be added
later.  It is just as non-easy to record the monotonic and real times
taken atomically on every reading.

I checked most uses of [get]boottimebin() and didn't find any correct
ones.  All uses are subject to races, and at best the addition breaks
your monotonic time to give a real time perfectly broken for POSIX
leap seconds, and if you want the latter why did you read the monotonic
time in the first place?  As in bpf, timecounter code would have given
a more perfectly broken real time if used directly.

Bruce