svn commit: r227778 - head/sys/net

Mon Nov 21 10:44:03 UTC 2011

On 11/21/11 20:52, Julien Ridoux wrote:
>
> On 21/11/2011, at 7:28 PM, Lawrence Stewart wrote:
>
>> On 11/21/11 18:28, Lawrence Stewart wrote:
>>> On 11/21/11 17:18, Julien Ridoux wrote:
>>>>
>>>> On 21/11/2011, at 4:39 PM, Lawrence Stewart wrote:
>>>>
>>>>> On 11/21/11 16:12, Ben Kaduk wrote:
>> [snip]
>>>>>> Is it really necessary to make the ABI dependent on a
>>>>>> kernel configuration option? This causes all sorts of
>>>>>> headaches if loadable modules ever want to use that ABI,
>>>>>> something that we just ran into with vm_page_t and friends
>>>>>> and had a long thread on -current about.
>>>>>
>>>>> Fair question. Julien, if pcap and other consumers will
>>>>> happily ignore the new ffcount_stamp member in the bpf
>>>>> header, is there any reason to conditionally add the
>>>>> ffcounter into the header struct?
>>>>
>>>> It is a valid question indeed. The feedback I have received so
>>>> far was to not have the feed-forward clock support be a default
>>>> kernel configuration option. What follows is based on this
>>>> assumption.
>>>>
>>>> The commit (r227747) introduces sysctl that are conditioned by
>>>> the same "FFCLOCK" kernel configuration option. If a loadable
>>>> module tests for the presence of this sysctl, it will know if
>>>> the ffcount_stamp member is available. Is it too much of a
>>>> hack?
>>>>
>>>> Alternatively, if the ffcounter is added to the bpf header
>>>> unconditionally, the ffcount_stamp member can be set to 0.
>>>> Loadable modules will then see a consistent ABI but will
>>>> retrieve a meaningless value.
>>>>
>>>> I am not sure which option makes more sense, any preference?
>>>
>>> If I understand the issues correctly, I think the appropriate
>>> path forward is to remove the conditional change to the bpf
>>> header and have ffcount_stamp become a permanent member of the
>>> struct. We'll just leave the member uninitialised in the !FFCLOCK
>>> case. This change will make the patch un-MFCable, but I think
>>> that's ok.
>>
>> I think I might be changing my mind about the benefits of having
>> both the timestamp and the ffcounter value in the header. Adding an
>> extra 8 bytes to the bpf header size does give us the ability to do
>> some neat things in a dual system clock world, but on further
>> reflection these things may not be worth the size increase cost for
>> the average FreeBSD user.
>>
>> On further reflection, I suspect the number of people likely to be
>> interested in having the flexibility of retaining both the
>> timestamp and ffcounter value would be less than those who don't
>> care. We could therefore expect that anyone wanting the flexibility
>> should patch their kernel and rebuild kernel/world in order to
>> handle the ABI breakage and put things in sync.
>>
>> Given that ffcounter is currently a 64bit value, we could add a
>> CTASSERT() safety belt somewhere to check that sizeof(ffcounter) ==
>> sizeof(struct timeval) and then retask the net.bpf.ffclock_tstamp
>> sysctl (or perhaps better rename it to something like
>> "net.bpf.timestamp_src") and have it take options like "fb_time",
>> "ff_time" or "ff_counter" for feedback clock based timestamp,
>> feed-forward based timestamp, or feed-forward counter based
>> timestamp.
>>
>> We would then have the bpf code shoehorn either a regular timeval
>> timestamp or the ffcounter value into bh_tstamp header field
>> depending on whether "fb_time"/"ff_time" or "ff_counter" is
>> selected. Anything consuming the BPF header would then need to know
>> how to interpret the bh_tstamp field. Perhaps we could add a
>> general purpose flags field to the BPF header and define a flag
>> which indicates if the bh_tstamp field is a timestamp or counter
>> value.
>>
>> Thoughts?
>
> In a typical scenario with a feed-forward clock, the synchronisation
> daemon needs the ffcounter timestamps. Tcpdump (or any libpcap based
> processed) would expects a timetamp in seconds formatted as a
> timeval.
>
> If the ffcounter value is stored as an extra value in the bpf header,
> both processes are happy.

Sure. I guess my thinking was that the minute someone throws the switch, 
they would need to know what they're getting themselves into. We 
consistently expect that from our users so no reason not to here ;)

> If the current timestamp field is used, then the system wide sysctl
> will not suffice. This will require to modify/add bpf ioctl to be
> able to configure each open bpf independently. Also, selecting a

I like the sound of this - I think it would be the way to go for 
userspace processes. You'd still need some mechanism for in-kernel 
consumers to know how to interpret the timestamp field. I just noticed 
that there is a comment about implicit padding at the end of struct 
bpf_hdr, so adding a flags field to fill that padding would be possible 
without changing the ABI - I think we could safely shoehorn a uint16_t 
flags field in there?

> timestamp in seconds would prevent packet traces from being replayed
> and have their timestamps corrected in post-processing. However, I
> agree that this feature and the capacity to compare 2 different
> synchronisation algorithms running in parallel is attractive to a
> small proportion of FreeBSD users.

Perhaps we could introduce a new option e.g. "BPF_FFCOUNTER" which is 
independent of FFCLOCK. Then, if BPF_FFCOUNTER is specified in the 
kernel conf, the ffcount_stamp member gets conditionally embedded into 
the BPF header to enable these more advanced use cases. Yes it would 
change the ABI and yes it couldn't be relied upon by modules compiled 
against a kernel without BPF_FFCOUNTER, but I'm not seeing a problem 
with that - we would only expect the option to be used in more 
specialised circumstances and we would be providing a standard means to 
get ffcounter values in the timestamp field instead of a regular 
timestamp if an ioctl is set on the bpf device.

In this revised world order, the average user would run with FFCLOCK but 
without BPF_FFCOUNTER, and could use the new BPF ioctl to select 
feedback clock timestamp, feed-forward clock timestamp or feed-forward 
counter timestamp which would determine how the bpf header's bh_tstamp 
field is to be interpreted.

In the less common case where someone wants to have both the timestamp 
and ffcounter value in BPF headers, they stick BPF_FFCOUNTER in the 
kernel config, recompile world/kernel+modules and off they go.

It sounds like a reasonable middle ground to me, avoiding the 
8byte-per-packet overhead in the default case, but allowing advanced use 
cases with a single kernel option and recompilation of world/kernel+modules.

Cheers,
Lawrence