svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys

Tue Jan 20 23:39:51 UTC 2015

Sean,

Was this really "Reviewed by: sbruno@" or just "Tested by: sbruno@"?  I 
was very happy to see so many reviewers on the original commit but you 
seem to be the only one still left on the list.

Regards,
Navdeep

On 01/20/15 15:35, Sean Bruno wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 01/20/15 14:30, Hans Petter Selasky wrote:
>> On 01/20/15 22:11, Gleb Smirnoff wrote:
>>> On Tue, Jan 20, 2015 at 09:51:26AM +0200, Konstantin Belousov
>>> wrote: K> > Like stated in the manual page,
>>> callout_reset_curcpu/on() does not work K> > with MPSAFE callouts
>>> any more! K> I.e. you 'fixed' some undeterminate bugs in callout
>>> migration by not K> doing migration at all anymore. K> K> > K> >
>>> You need to use callout_init_{mtx,rm,rw} and remove the custom
>>> locking K> > inside the callback in the TCP stack to get it
>>> working like before! K> K> No, you need to do this, if you think
>>> that whole callout KPI must be K> rototiled.  It is up to the
>>> person who modifies the KPI, to ensure that K> existing code is
>>> not broken. K> K> As I understand, currently we are back to the
>>> one-cpu callouts. K> Do other people consider this situation
>>> acceptable ?
>>>
>>> I think this isn't acceptable. The commit to a complex subsystem
>>> lacked a review from persons involved in the system before. The
>>> commit to subsystem broke consumers of the subsystem and this was
>>> even done not accidentially, but due to Hans not caring about
>>> it.
>>>
>>> As for me this is enough to request a backout, and let the
>>> change back in only after proper review.
>>>
>>
>> Hi Gleb,
>>
>> Backing out my callout API patch means we will for sure
>> re-introduce an unknown callout spinlock hang, as noted to me by
>> several people. What do you think about that? dram Maybe "Jason
>> Wolfe" CC'ed can add to 10-stable w/o my patches:
>>
>
> Jason picked up this patch for work and it resolved our instability
> issues that had remained unsolved for quite some time as reported to
> freebsd-net:
>
> https://lists.freebsd.org/pipermail/freebsd-net/2015-January/040895.html
>
> This had gone undiagnosed for some time (even with the gracious help
> of jhb in offline emails, thanks btw!).
>
> There's some diagnostics in that email thread that may be of value to
> you folks for determination of the validity of changing the callout
> API or at least understanding why we were involved in diagnostics.
>
> While I'd sure love to tune performance, the fact that our machines
> were basically going out to lunch without these changes, probably
> means that others were seeing it and didn't know what else to do.  As
> much as I enjoy a good "break out the pitch forks and torches" email
> thread, this increased stability for us and is allowing us to upgrade
> from freebsd8 to freebsd10.  Bear this in mind when you throw your
> voice in favor of reverting.
>
>> int callout_reset_sbt_on(struct callout *c, sbintime_t sbt,
>> sbintime_t precision, void (*ftn)(void *), void *arg, int cpu, int
>> flags) { sbintime_t to_sbt, pr; struct callout_cpu *cc; int
>> cancelled, direct;
>>
>> +       cpu = timeout_cpu;   /* XXX test code XXX */
>>
>> cancelled = 0;
>>
>
> Jason or I would have to run this in production, which would be
> problematic I fear.  We never had a deterministic test case that would
> exhibit the reported failure.  We merely "tested in production" and
> saw that panics ceased.  We didn't note a dropoff in our traffic
> either, perhaps we are not as efficient as others in this corner case,
> but we were consistently seeing the spinlock hangs after a day or so
> of traffic.
>
>> And see if he observes a callout spinlock hang or not on his test
>> setup. The patch above should force all callouts to the same thread
>> basically. Then we could maybe see if single threading the callouts
>> has anything to do with solving the spinlock hang.
>>
>> The "rewritten" callout API still has all the features and
>> capabilities the old one had, when used as described in "man 9
>> callout".
>>
>> At the present moment I'm not technically convinced a backout is
>> correct.
>
> Neither am I, to be honest.  Just based on *results*.
>
>>
>> Gleb: I think we would see far better results with high speed
>> internet links using TCP if we could extend the LRO (large receive
>> offload) code to accumulate more than 64KBytes worth of data per
>> call to the TCP stack instead of complaining about some callouts
>> ending up on the same thread! Actually I have a patch for that.
>>
>> --HPS
>>
>>
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2
>
> iQF8BAEBCgBmBQJUvuYrXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
> ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRCQUFENDYzMkU3MTIxREU4RDIwOTk3REQx
> MjAxRUZDQTFFNzI3RTY0AAoJEBIB78oecn5kJTMIAMfh6ghV/AwQauY+a44q1hjJ
> WC7E3u69FK0opgSYg71kk6HckbyB+sTWND6HdXnpyrcMLXUt74zlB8c48wbUUV5+
> EwKNYzGNNnDNhoc0WtPMect8e9Y1kBRvSGfHBdVrqATXfPOyZEa+i4lQAXpiFKIt
> nndqVrAH7bJM6143YDpnIg7vaR+8IQnC2ztSP4ogJzh03DZ7zVsg4BsoCPg50eVZ
> kr46cXcE+SP/TsQBsVNVwRJD5NFie6QJdLoTnwkd0XfQGJMIWivNgUcE4tIxqPIf
> nGQ0xMJCotpNLuPtzYzCIurSaDHdHmL6bjkhjTtBmWsMNfdGH8TKoih79GDxkTg=
> =Y3rd
> -----END PGP SIGNATURE-----
>