svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys

Thu Jan 22 00:02:23 UTC 2015

> On Jan 21, 2015, at 4:38 PM, K. Macy <kmacy at freebsd.org> wrote:
> 
>> They originally found that things were spinning for way too long.
>> 
>> Hans found something similar and determined/concluded that the
>> migration code in callouts was racy-by-design and dramatically
>> simplified it and also put very hard constraints on what is a valid
>> situation to support migrating from one callwheel to another. Now we
>> have fallout which we can either address or back out until the callout
>> stuff is again reviewed/fixed.
>> 
>> I don't think it's as alchemic as is being promoted.
> 
> 
> Let's not get drawn into semantic debates. To me alchemy implies
> voodoo debugging, whereas this is, in part, disabling functionality
> that exposes races - which is more like disabling preemption or fine
> grained locking.
> 
> 
> Normally I would advocate backing this out on the basis of precedence.
> That is to say, back it out so that developers will get it right the
> first time. However, it appears that hps@ and some others don't
> understand what was done wrong. So I'm going to state some basic
> premises and then the implied basic guidelines required of a commit
> that were not met. If these guidelines are violated by a change in the
> future I will agitate for a rapid back out.
> 
> Premises:
> A) A performance regression is a bug every bit as much as a locking
> race. Stability OR performance (where we are talking about _current_
> performance) is a false dichotomy.
>   - This means that a change that fixes a bug by introducing a
> substantial performance regression does not constitute a fix.
> B) Existing behavior and performance characteristics should not be
> adversely changed unless there is widespread consensus.
>  - Sometimes it is necessary to "break" a KPI but that should not be
> discovered by svn or scanning back through the mailing lists.
> 
> Guidelines:
> A) Performance should not measurably regress.
> B) If a KPI changes behavior e.g. callout_reset_on being crippled, all
> clients need to be updated so that they maintain their current
> behavior by the committer changing the KPI. The client code
> maintainers aren't responsible for reacting to these types of changes.
> That is OK for marginal architectures like sun4v or pc98.
> C) If there is a non-zero possibility that these changes break the
> client code, committers who have touched that code in the last few
> years should be notified.
> 
> 
> 
> HPS: Your change failed to meet these guidelines. Some of us are upset
> because these guidelines are fairly fundamental for the on-going
> viability of FreeBSD. Due to linguistic / time zone / cultural
> differences these expectations have not been adequately communicated
> to you. You are not in the USB sandbox where others need for your
> support outweighs the inconvenience of random breakage.
> 
> It sounds like you are making progress towards updating the concerns
> that have been voiced. If kib's observations are in fact comprehensive
> then adding a callout_init_cpu function and updating all clients so
> that their callouts continue to be scheduled on a CPU other than the
> BSP will suffice and we can all move on.

Is there some reason that we can’t back things out, break things down into
smaller pieces and have everything pass through phabric with a wide
ranging review? Given the fundamental nature of these changes, they
really need better review and doing it after the fact seems to be to be
too risky. I’m not debating that this “fixes” some issues, but given the
performance regression, it sure seems like we may need a different
solution to be implemented and hashing that out in a branch might be
the best approach.

Warner