svn commit: r244112 - head/sys/kern

Alfred Perlstein bright at mu.org
Sat Dec 15 12:35:27 UTC 2012


On 12/14/12 10:04 PM, Bruce Evans wrote:
> On Fri, 14 Dec 2012, Alfred Perlstein wrote:
>
>> On 12/14/12 4:12 PM, Robert Watson wrote:
>>> On Fri, 14 Dec 2012, John Baldwin wrote:
>>>
>>>> On Thursday, December 13, 2012 4:02:15 am Gleb Smirnoff wrote:
>>>>> On Wed, Dec 12, 2012 at 04:53:48PM -0800, Alfred Perlstein wrote:
>>>>> A> The problem again is that not all the KASSERTs are inviolable, if
>>>>> A> you want to do a project to split them, then please do, it would
>>>>> A> really be helpful, as for now, they are a mish-mash of
>>>>> A> death/warnings and there are at least three vendors who approve of
>>>>> A> this as well as 3 long-term committers that approved my change
>>>>> A> (not including Adrian).
>>>>>
>>>>> Can you show examples of not inviolable KASSERTs?
>>>>
>>>> There are none.  They are all assertions for a reason. However, in my
>
> Not even one whose existence is a bug? :-)

This is my point exactly, and exactly what I've been experiencing.

People keep beating this drum "all invariants/panics are there for a 
reason", no, some happen to be bugs, and when I'm shipping code to a 
customer, I may need to skip one of these buggy assertions.

Some people have responded to this by saying "alfred show me the buggy 
assertions", to which my response is twofold: first, you're calling me 
a liar, which I do not appreciate; second, prove to me that all of the 
1000s of asserts have zero bugs and are not overly zealous about 
calling panic.
>>>> experience at several large consumers of FreeBSD, no one wants to 
>>>> run with INVARIANTS in production.  Not because we don't want 
>>>> panics (believe me, Yahoo! gets plenty of panics even with 
>>>> INVARIANTS disabled), but because the performance overhead, and 
>>>> redefining INVARIANTS into printf doesn't address that at all.
>
> I mostly don't use INVARIANTS, but once used Debugger() a lot. Now I
> only want to debug my own code and sometimes turn on INVARIANTS to try
> to do this, but it usually exposes more bugs not in my own code so I
> turn it off again :-).

*yup*
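The behaviours being debated above can be sketched in user space. This is not the kernel's actual macro definition; the mode names and the warn-and-count variant are illustrative only, showing panic, warn-and-continue, and compiled-out assertions:

```c
/*
 * User-space sketch (assumed names, not FreeBSD's real macros) of three
 * assertion behaviours: abort on failure, warn and continue, or compile
 * the check out entirely so it costs nothing.
 */
#include <stdio.h>
#include <stdlib.h>

#define KA_PANIC 0
#define KA_WARN  1
#define KA_OFF   2

#ifndef KA_MODE
#define KA_MODE KA_WARN              /* hypothetical "soft assertion" default */
#endif

static int ka_warnings;              /* how many assertions have tripped */

#if KA_MODE == KA_PANIC
#define KASSERT(exp, msg) do {                                  \
        if (!(exp)) {                                           \
                fprintf(stderr, "panic: %s\n", (msg));          \
                abort();                                        \
        }                                                       \
} while (0)
#elif KA_MODE == KA_WARN
#define KASSERT(exp, msg) do {                                  \
        if (!(exp)) {                                           \
                fprintf(stderr, "warning: %s\n", (msg));        \
                ka_warnings++;                                  \
        }                                                       \
} while (0)
#else
#define KASSERT(exp, msg) do { } while (0)   /* no check, no overhead */
#endif

/* Trip one assertion and report how many warnings have fired so far. */
int
kassert_demo(void)
{
        int refcount = 0;

        KASSERT(refcount > 0, "refcount must be positive");
        return (ka_warnings);
}
```

In the warn mode the failed check prints a message but execution continues past it, which is the "redefine into printf" behaviour John describes; note it does nothing for the performance cost of evaluating the checks themselves.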

>
>>> In the past, FYI, the two major INVARIANTS hits were un-inlining of 
>>> locking, and UMA using additional global locking to perform 
>>> heavier-weight sanity checking.  For me, at least, the former wasn't 
>>> such a problem, but the latter was extremely measurable.  I have 
>>> occasionally wondered if we should have another option name for 
>>> heavier-duty sanity checking, as is the case with WITNESS, 
>>> SOCKBUF_DEBUG, etc, so that INVARIANTS overhead can be made 
>>> tolerable for more real-world installations.  As you observe, our 
>>> invariants tests are generally things where you catch critical 
>>> problems in much more debuggable states, rather than sources of 
>>> additional fragility, and it would be great if we could leave more 
>>> in so that bug reports were able to contain more information.
>
> I think that KASSERT() is used too much:
> - the heavyweight uses that you mention
> - many lightweight unconditional panics in old code were turned into
>   conditional panics using KASSERT().  So production code that turns off
>   INVARIANTS no longer gets the benefit of the sanity checking for these.
> - newer code tends to use KASSERT() even for things that could reasonably
>   be lightweight unconditional panics.  So production code that turns off
>   INVARIANTS never got the benefit of the sanity checking for these.
> - unconditional panics (with non-lightweight checking for them) can't
>   be left in production code, so there were a limited number of them,
>   with a few pushed under DIAGNOSTIC.  In particular, ones for debugging
>   didn't grow unboundedly.  Now, KASSERT()s grow unboundedly and the
>   cost of enabling INVARIANTS grows unboundedly.
> - the source code becomes unreadable, with half of it tending to consist
>   of misformatted KASSERT()s SHOUTING IN UPPER CASE.  The latter bug is
>   missing in assert(3).  assert(3) has a better-designed API in other
>   respects too.  For example, it supplies the bloat for __func__, __FILE__
>   and __LINE__ whether you want this or not.  This bloat was intentionally
>   left out for KASSERT().  But many uses of KASSERT() supply this bloat
>   in the caller, where it also bloats the source code.

BUT I LIKE TO PANIC IN UPPER CASE BRUCE.
>
>> The KTR system offers an interesting reference for a model that 
>> allows us to make a compile-time decision about which areas to trace.
>>
>> There used to be a DIAGNOSTIC option for the heavier checks; it has 
>> either been removed or is just not used these days.  Again, using a 
>> mask like KTR's might be a win if we bring back the equivalent of a 
>> heavyweight DIAGNOSTIC option.
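The mask idea, by analogy with KTR's KTR_COMPILE, might look like the sketch below. None of these names (KASSERT_COMPILE, KASSERT_CLASS, the KAS_* classes) exist in FreeBSD; they only illustrate the proposal. With a constant mask the compiler folds away the masked-out checks:

```c
/*
 * Hypothetical KTR-style compile-time mask for assertions: only classes
 * selected in KASSERT_COMPILE are checked at all.  Illustrative names,
 * not an existing FreeBSD option.
 */
#include <stdio.h>

#define KAS_VFS 0x1
#define KAS_NET 0x2
#define KAS_VM  0x4

#ifndef KASSERT_COMPILE
#define KASSERT_COMPILE (KAS_VFS | KAS_NET)     /* VM checks masked out */
#endif

#define KASSERT_CLASS(cls, exp, msg) do {               \
        if (((KASSERT_COMPILE) & (cls)) && !(exp))      \
                fprintf(stderr, "panic: %s\n", (msg));  \
} while (0)

/* Report whether a class's assertions are active in this build. */
int
kassert_class_active(int cls)
{
        return ((KASSERT_COMPILE & cls) != 0);
}
```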
>
> DIAGNOSTIC still exists.  It is rarely used, and tends to be only used
> for heavyweight checks that are too buggy or too heavyweight or just
> too old to put in KASSERT().

Yes.
>
>> Often I've been guilty of putting KASSERT(ptr != NULL) checks into 
>> the code too.  Those are really just to make it less painful when 
>> hitting that bug,
>
> You mean to make it _more_ painful when hitting that bug.

ha. :)
>
>> so instead of having to do a lot of homework to figure out the fault 
>> address and line number, I can just get a pretty printed message 
>> under INVARIANTS.  Maybe those "pretty null checks" need to go under 
>> another option, perhaps DIAGNOSTIC, or another .. ??PRETTY_PANIC.
>
> Null pointer panics give restartable page faults, with the fault address
> printed by trap_fatal() and the page fault often easy to restart using a
> debugger.  trap_fatal() calls the debugger, so the page fault isn't
> completely transparent.  This corresponds to gdb in userland stopping
> on the signal for the page fault.  You might have to use "handle SIGFOO
> Stop/Print/Pass" to get back to the faulting instruction.  In the kernel,
> continuing from trap_fatal() gives this behaviour.  The page fault will
> repeat endlessly unless you fix it up.  You can try various fixes until
> you find one that doesn't fault, or let it fault a few times to look
> at the context in different ways.
>
> If panic() is used for null pointers, then you get an unrestartable
> panic(), with the relevant context several frames back where it is
> hard to see and much more difficult to restart from (practically
> impossible now that panic() stops CPUs, etc., though it used to be
> possible to unwind the frame in some cases).
>
> The null pointer check doesn't even detect the common case where the
> pointer is &p->bar where p is null.  Now the bad pointer is a small
> offset from a null pointer (normally at a small address).  It would
> take an earlier check of p to detect this.
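Bruce's last point can be demonstrated concretely. Taking `&p->bar` computes an offset without dereferencing (formally undefined behaviour in C when `p` is null, but in practice this is exactly why the faulting address ends up a small offset from zero), so a null check on the derived pointer passes:

```c
/*
 * A null check on &p->bar is not a null check on p: the derived pointer
 * is a small nonzero offset even when p itself is NULL, so the check
 * passes and the eventual dereference still faults near address zero.
 */
#include <stddef.h>

struct foo {
        int     refs;
        int     bar;
};

/* The insufficient check: non-null does not mean dereferenceable. */
int
null_check_passes(struct foo *p)
{
        int *q = &p->bar;       /* UB for p == NULL, shown for illustration */

        return (q != NULL);
}

/* The offset at which the fault would actually land. */
size_t
fault_offset(void)
{
        return (offsetof(struct foo, bar));
}
```

Catching this case requires checking `p` itself before the member address is formed, which is Bruce's "earlier check of p".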
>
>> Anyhow, I've always been a huge fan of FreeBSD due to the additional 
>> debugging checks it offered.  I was the one that suggested 
>> splassert() back in the day when we would continually find issues 
>> with spl calls.  Additionally I found
>
> SPLASSERT() was almost never used.  The only uses that I can find of it
> are all in netipsec.
>
>> doing filesystem work a ton easier with DEBUG_VFS_LOCKS under FreeBSD 
>> than under Darwin which at the time did not have such a feature.
>
> This was more useful, I think :-).  The main reason that SPLASSERT()
> was useless was that spl was a stable API that went away at about
> the same time that SPLASSERT() was added (SPLASSERT() was implemented
> about 6 months before SMPng killed spl).

Yes, splassert was a case of too little, too late.

-Alfred


More information about the svn-src-all mailing list