svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys

Wed Jan 21 00:50:00 UTC 2015

On 20 January 2015 at 14:30, Hans Petter Selasky <hps at selasky.org> wrote:

> Backing out my callout API patch means we will for sure re-introduce an
> unknown callout spinlock hang, as noted to me by several people. What do you
> think about that?
>
> Maybe "Jason Wolfe" CC'ed can add to 10-stable w/o my patches:
>
> int
> callout_reset_sbt_on(struct callout *c, sbintime_t sbt, sbintime_t precision,
>     void (*ftn)(void *), void *arg, int cpu, int flags)
> {
>         sbintime_t to_sbt, pr;
>         struct callout_cpu *cc;
>         int cancelled, direct;
>
> +       cpu = timeout_cpu;   /* XXX test code XXX */
>
>         cancelled = 0;
>
> And see if he observes a callout spinlock hang or not on his test setup. The
> patch above should force all callouts to the same thread basically. Then we
> could maybe see if single threading the callouts has anything to do with
> solving the spinlock hang.
>
> The "rewritten" callout API still has all the features and capabilities the
> old one had, when used as described in "man 9 callout".
>
> At the present moment I'm not technically convinced a backout is correct.
>
> Gleb: I think we would see far better results with high speed internet links
> using TCP if we could extend the LRO (large receive offload) code to
> accumulate more than 64KBytes worth of data per call to the TCP stack
> instead of complaining about some callouts ending up on the same thread!
> Actually I have a patch for that.

You should totally try, say, 100,000 active TCP connections on a box.
See what happens to swi0 (clock).

TL;DR - the lock contention sucks and it eats up a chunk of a core.

That's why I'd like to see both the callout stuff in its
slightly-better-defined-and-sane state from you /and/ make it so TCP
can use it.

I'll have to double-check to see if the RSS stuff is all lined up
correctly so we can use it when we create the callouts (well, at inpcb
creation time, right), rather than when we first schedule them. Then
we can experiment with having the initial CPU be specified at callout
create time rather than expecting to be able to move it when we first
schedule it.
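
To make the create-time idea concrete, here is a rough userland sketch, not
kernel code. Everything in it is hypothetical (toy_conn, toy_bucket_to_cpu,
TOY_NBUCKETS, and the bucket-modulo-ncpus mapping); it only assumes that the
flow hash is already known when the connection state is created, so the timer
CPU can be fixed then and never needs to move:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NBUCKETS 8	/* hypothetical bucket count; must be a power of two */

/* Toy stand-in for per-connection state; not the real struct inpcb. */
struct toy_conn {
	uint32_t flow_hash;	/* hash of the connection tuple */
	int timer_cpu;		/* CPU chosen for this connection's timers */
};

/* Map a flow hash to a bucket, then a bucket to a CPU (assumed mapping). */
static int
toy_bucket_to_cpu(uint32_t flow_hash, int ncpus)
{
	uint32_t bucket;

	assert(ncpus > 0);
	bucket = flow_hash & (TOY_NBUCKETS - 1);
	return ((int)(bucket % (uint32_t)ncpus));
}

/* Pick the timer CPU once, at creation, instead of at first schedule. */
static void
toy_conn_create(struct toy_conn *tc, uint32_t flow_hash, int ncpus)
{
	tc->flow_hash = flow_hash;
	tc->timer_cpu = toy_bucket_to_cpu(flow_hash, ncpus);
}
```

With this shape, every later schedule call would just pass tc->timer_cpu, so
the timers land on the same CPU that handles the flow's packets.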

Or, hm, maybe have it so we don't have a CPU chosen until the first
time we schedule the timeout, and if it hasn't been scheduled before,
allow the CPU to be set? Because at that point we aren't migrating it
off of timeout_cpu - it's never been added to it in the first place.
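
A minimal userland sketch of that lazy-binding idea, under stated assumptions:
toy_callout, CPU_UNBOUND, and toy_callout_schedule_on are made-up names, not
the callout(9) API, and the real code would also have to deal with locking and
with callouts that are currently running. The only point is that the first
schedule binds the CPU for free, and only later changes count as a migration:

```c
#include <assert.h>
#include <stdbool.h>

#define CPU_UNBOUND (-1)	/* hypothetical sentinel: no CPU chosen yet */

/* Toy stand-in for struct callout; not the kernel structure. */
struct toy_callout {
	int cpu;	/* CPU that owns this callout, or CPU_UNBOUND */
	bool pending;	/* true once it has been queued */
};

static void
toy_callout_init(struct toy_callout *c)
{
	c->cpu = CPU_UNBOUND;
	c->pending = false;
}

/*
 * Schedule on requested_cpu. Returns true only if an already-bound
 * callout had to change CPUs, i.e. a real migration took place.
 */
static bool
toy_callout_schedule_on(struct toy_callout *c, int requested_cpu)
{
	bool migrated;

	assert(requested_cpu >= 0);
	if (c->cpu == CPU_UNBOUND)
		migrated = false;	/* first schedule: nothing to move */
	else
		migrated = (c->cpu != requested_cpu);
	c->cpu = requested_cpu;
	c->pending = true;
	return (migrated);
}
```

In this model a freshly initialised callout scheduled on CPU 3 reports no
migration, while rescheduling it on CPU 1 afterwards does.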

-a