System clock APIs and feed-forward clock integration with BPF

Sun Dec 18 14:42:09 UTC 2011

> On 12/15/11 01:30, Julien Ridoux wrote:
>>
>> On 04/12/2011, at 11:31 PM, Lawrence Stewart wrote:
>>
>>> On 12/03/11 12:02, Jung-uk Kim wrote:
>>>> On Friday 02 December 2011 07:27 pm, Jung-uk Kim wrote:
>>>>> On Thursday 01 December 2011 11:43 pm, Lawrence Stewart wrote:
>>>>>> On 12/02/11 03:43, Jung-uk Kim wrote:
>>>>>>> On Thursday 01 December 2011 10:11 am, Lawrence Stewart wrote:
>>>>>>>> On 11/30/11 05:09, Jung-uk Kim wrote:
>>>>>>>>> On Tuesday 29 November 2011 11:13 am, Lawrence Stewart wrote:
>>> [snip]
>>>>>>>>>> Here's the first of the patches:
>>>>>>>>>>
>>>>>>>>>> http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf
>>>>>>>>>> _i nt act abi_10.x.r228130.patch
>>>>>>>>>
>>>>>>>>> I only glanced at it but it looks very close to what I wanted
>>>>>>>>> to suggest.
>>>>>>>>
>>>>>>>> Final candidate patch is at:
>>>>>>>>
>>>>>>>> http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf_i
>>>>>>>> nt act abi_10.x.r228180.patch
>>>>>>>>
>>>>>>>> Assuming it passes the "make tinderbox" build I'm currently
>>>>>>>> running and no further input is received from interested
>>>>>>>> parties, I plan to commit it in ~10 hours.
>>>>>>>>
>>>>>>>> Changes since the r228130 patch I sent previously:
>>>>>>>>
>>>>>>>> - The new flags in bpf.h are added unconditionally so that
>>>>>>>> they can always be referenced at compile time and a decision
>>>>>>>> made at runtime as to whether a flag will be set or not. Using
>>>>>>>> one of the new flags when the kernel doesn't have FFCLOCK
>>>>>>>> compiled in results in the flag being ignored. An app should
>>>>>>>> check for the existence of the "ffclock" kernel feature or the
>>>>>>>> "kern.sysclock" sysctl tree before attempting to use the
>>>>>>>> flags.
>>>>>>>>
>>>>>>>> - This patch will hopefully be MFCed at some point, so I added
>>>>>>>> a CTASSERT to bpf.c to ensure that the ABI of structs
>>>>>>>> bpf_hdr32, bpf_hdr and bpf_xhdr remains intact when FFCLOCK is
>>>>>>>> enabled and the union of a ffcounter and struct
>>>>>>>> timeval32/timeval/bpf_ts is switched in.
>>>>>>>>
>>>>>>>> - bpf_gettime() more comprehensively covers all the possible
>>>>>>>> cases of flag combination and does sensible things for each
>>>>>>>> case (even though some cases are rather silly).
>>>>>>>>
>>>>>>>> - The snippet of code at the beginning of catchpacket() that
>>>>>>>> was manipulating the struct bintime derived from bpf_gettime()
>>>>>>>> was gross and has been removed in favour of selecting the
>>>>>>>> right {get}bin{up}time() function call in bpf_gettime().
>>>>>>>
>>>>>>> I did that to reduce branching. Since you are introducing more
>>>>>>> branches, it warrants a function pointer now.
>>>>>
>>>>> There's another reason, BTW. If mbufs are tagged with timestamps
>>>>> (where only monotonic timestamps are allowed), they must be
>>>>> converted within the bpf.c. I forgot all the details. :-(
>>>
>>> We should document this knowledge in some code comments.
>>>
>>>>>> I see, thanks for the explanation. Could you elaborate a bit more
>>>>>> about how you would implement the function pointer idea? I'm also
>>>>>> curious in the !FFCLOCK case just how much overhead having the
>>>>>> 2-layer nested if/else adds. I'm not an very optimisation savvy
>>>>>> person, but I'm wondering if it's actually worth micro-optimising
>>>>>> this code.
>>>>>>
>>>>>> My initial thoughts about your function pointer idea lead to
>>>>>> adding a function pointer in the bpf_d and setting it to the
>>>>>> appropriate function to get the timestamp from at bpf_d creation
>>>>>> or ioctl time. Whilst I like this idea, I can't see how it would
>>>>>> work given that the various functions involved in time/ffcounter
>>>>>> stamp generation all have different signatures.
>>>>>>
>>>>>> We could have multiple variants of bpf_gettime() which each call
>>>>>> the appropriate underlying functions to generate the appropriate
>>>>>> stamp. Would add quite a lot of code but would reduce the
>>>>>> overhead of calling bpf_gettime() to an indirect function call +
>>>>>> the underlying stamp generation function call. This also solves
>>>>>> the problem of multiple function signatures.
>>>>>
>>>>> Please see my patch:
>>>>>
>>>>> http://people.freebsd.org/~jkim/bpf_ffclock.diff
>>>>
>>>> I booted up the kernel and found it just crashes because of stupid
>>>> typos. Now a new patch is uploaded in place. Sorry.
>>>>
>>>> Anyway, I see no regression nor ABI breakage. :-)
>>>
>>> struct bpf_d being part of the ABI was the main thing I was concerned
>>> about, so given that it isn't I like your approach a lot.
>>>
>>> As noted by Julien, this approach does introduce problems with
>>> respect to the follow up patch that adds a permanent bh_ffcounter
>>> member to the bpf header. I thought the fromclock() API would
>>> sufficiently meet the needs of consumers like BPF, but if we were to
>>> proceed with something like Jung-uk's proposed patch I don't think
>>> that's true anymore.
>>>
>>> We decided to bite the bullet and devise an API that is more compact
>>> and can return all appropriate time information from all underlying
>>> sysclocks in an efficient manner - something like a more generic
>>> version of ffclock_abstime(). Julien and I spent quite some time
>>> today nutting out details, and Julien has done a proof of concept
>>> patch against my proposed BPF patch which looks good as a starting
>>> point for discussion:
>>>
>>> http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/r228254/sysclock_snap.patch
>>>
>>>
>>> It needs a bit more work and should be split into two patches (one
>>> introducing the API and the other being the BPF changes).
>>>
>>> With something like this though, the BPF patch becomes radically
>>> simplified in the FFCLOCK case. We don't even need a function pointer
>>> cached in the bpf_d anymore, but could cache a struct sysclock_snap
>>> there instead if we really wanted to minimise overhead in bpf_gettime().
>>>
>>> We'll have a go at refining the patch tomorrow hopefully, but wanted
>>> to put this out there for early consideration.
>>
>>
>> Hi Jung-uk (and all),
>>
>> It took us a bit of time but we have a new patch. There are a lot of
>> changes in the patch, and I will start by summarising our motivation
>> for the changes. Hopefully I won't forget anything, but if I leave
>> something unclear, don't hesitate to ask Lawrence or me.
>>
>> Starting with the easy bits, we drop the union in the BPF header and
>> followed something similar to what Ben and you proposed.
>>
>> Next, the core motivation was to ensure that the timestamping function
>> be called once only per packet captured, while keeping the ability to
>> read both clocks and return the time from one of the two to the user.
>> This forced us to redesign the timestamping function and one thing
>> leading to another a fair bit of the BPF code.
>>
>> There 3 orthogonal parameters to the timestamping function in the BPF
>> context:
>> - which clock is used to return the time
>> - what is the format of the value returned
>> - and a performance issue (nothing to do with timestamping or
>> timekeeping): reading the hardware counter or not
>>
>> We handle the first one with a new sysclock_getsnapshot() which saves
>> both clock parameters when timestamping and potentially reads the
>> hardware counter.
>> The second one has been covered in previous discussion with the
>> addition of BPF_T_FFCOUNTER.
>>
>> The performance issue created problems. In the current code, the order
>> with which the BPF devices are open on one interface impacts the
>> timestamping function. Let's consider two BPF devices respectively
>> open with BPF_TSTAMP_NORMAL and BPF_TSTAMP_FAST quality. If NORMAL is
>> open after FAST, it is at the head of the list, the hardware counter
>> is read, and FAST benefits from higher quality timestamps. If FAST is
>> open after NORMAL, the timestamping function is called once and return
>> last kernel tick time, then it is called a second time for NORMAL,
>> this time accessing the hardware counter.
>>
>> None of these cases are satisfactory. This lead to inconsistent
>> behaviour based on the order the BPF are open. An application can
>> impact the settings of other BPF devices and other applications /
>> kernel consumers. The timestamping suffers from higher latency and
>> worse, higher variability of the latency, which is bad.
>>
>> We then took a different approach, which is inspired by your work on
>> BPF. We believe the FAST/NORMAL settings should have 2 level of
>> granularity: system wide and per interface. The system wide is the
>> default behaviour (for example all BPF devices are NORMAL) and the
>> interface one super-seeds the system one. This is implemented by the
>> use of new sysctl living in the net.bpf tree.
>>
>> This enables many scenarios such as:
>> - system default is FAST to improve performance because the system
>> uses a slow timecounter (eg HPET or ACPI).
>> - the system has 3 interfaces. Interface 2 is put in NORMAL mode (need
>> for accuracy but not much traffic), and the last can do hardware
>> timestamping and is put in BPF_TSTAMP_EXTERNAL mode.
>>
>> This then naturally led to store the performance setting
>> (none/fast/normal/external) for each interface instead of each BPF
>> descriptor.
>>
>> Each BPF descriptor still maintains the other 2 settings: the choice
>> of clock and the format of the timestamp independently from each other.
>>
>> We believe this new patch keeps the spirit of the changes you proposed
>> and improves it while accomodating the changes required by the
>> feed-forward clock and future evolutions with things like IEEE 1588
>> compatible NICs.
>>
>> We plan to push this patch as soon as possible. We would really
>> appreciate if you could comment on this long email and pach very soon.
>>
>> The patch can be found here:
>> http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/r228429-v4/ffclock_bpf_intactabi.patch

Candidate commit patches can be found at:

http://people.freebsd.org/~lstewart/patches/misc/sysclockgetsnap_10.x.r228435.patch

http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf_intactabi_10.x.r228435.patch

Relative to Julien's patch, I've split his patch into two, updated bpf.4 
documentation, added a net.bpf.tscfg.default sysctl to set the default 
time stamp for all BPF attached interfaces, and added some cleanup and 
bug fixes.

If I don't hear any objections or receive any requests to hold off 
committing to allow more time for review, I plan to commit these patches 
in the next few days.

Cheers,
Lawrence