Using Instruction Pointer address in debug interfaces [Was: Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]]

Oliver Pinter oliver.pntr at
Wed Nov 9 14:14:05 UTC 2011

On 11/9/11, Arnaud Lacombe <lacombar at> wrote:
> Hi,
> On Tue, Nov 8, 2011 at 9:35 PM, Julian Elischer <julian at> wrote:
>> On 11/8/11 5:52 PM, Arnaud Lacombe wrote:
>>> Hi,
>>> On Tue, Nov 8, 2011 at 7:08 PM, Julian Elischer<julian at>
>>>  wrote:
>>>> On 11/8/11 10:49 AM, Arnaud Lacombe wrote:
>>>>> Hi,
>>>>> To avoid future complaints about the fact that I would be only "talk"
>>>>> without "action", I did implement what I suggested above. As it is
>>>>> quite a large patch-set, I will not post it directly here, however, it
>>>>> is available on github:
>>>>> It convert a bunch of debug interface to use the caller instruction
>>>>> pointer, as well as a proof-of-concept teaching printf(9) to convert
>>>>> IP to symbol_name+offset.
>>>>> It translates in a direct saving of about +250kB on i386's GENERIC,
>>>>> just in kernel text size. Even the worst case, ie LOCK_DEBUG == 0,
>>>>> translates to a save of +80kB.
>>>>> Please note that this is still WIP code.
>>>> A couple of comments.
>>>> Firstly, the idea of a printf method to print the IP as symbol+offset is
>>>> an
>>>> interesting idea
>>>> that should be followed up in its own right.
>>> FWIW, I have no credit in this idea. It has been in Linux for ages and
>>> ages.
>> yeah as I said  at work I use linux and BSD...
>> the linux stuff that just prints out IP really annoys me.
>> the list stuff and netgraph debug (which should be off in any production
>> system)
> this is, I guess, where we do not agree. You find it acceptable to run
> totally different code in production and during debug. I do not. This
> is completely insane, even more nowadays where heavy parallelism
> increases the likelihood of races, and subtle change in the code, even
> optimization, can cause total behavioral change (ie. Heisenbug).
> For the record, we have been tracking for more than 2 months (first
> occurrences happened a year ago) an mbuf corruption in the network
> stack, present in all released code since at least FreeBSD 7[0]. Each
> time we think it is fixed, we are proven wrong by customers within a
> few days when the system crashes again. Even the last attempt which
> was believed to be bullet-proof failed and crashes daily.
> All that to say that production code should embed enough facilities to
> allow the system to be fully debugged with a runtime cost as low as
> possible, and a code-size cost as low as possible[1]. I should be able
> to connect on a production machine, turn a knob, an see what is going
> wrong, without the customer noticing.
> In the worst case, when you have to enable debug-only code, it must
> not be done by making the non-debug case more expensive, but wrap
> around. The whole original point of the patches was that LOCK_FILE and
> LOCK_LINE are a bad answer to a wrong problem.
> `__FILE__, __LINE__' and the bloat introduced is not the problem,
> `const char *file, int line' in way too much prototypes is.
> Now, you make me realize that `const char *file, int line' should just
> be removed, not replaced by `unsigned long' or anything else. It's
> likely to be done in another iteration.
>> just require you to be able to see the console. and have sources nearby.
>> if you need the IP use gdb.
> "console debugging" is yet another abomination which should be hunted
> down. Just try to do any useful work at high-pps on a serial
> console...
>> it's just what you are used to. You are obviously from the dark side
>> ^H^H^H^H^H^H linux.
> My obedience is totally irrelevant to the problem.
> However, if you want to know, my heart tends to be with BSDs.
> Unfortunately, it's a sad love-story where your Beloved keeps
> deceiving you day after day. You want to change small bits at a time,
> make several iteration of progress to make things brighter, but your
> Beloved refuses any change because of too much inertia. Sad.
>> so you are used to doing it that way.. but don't expect us to change just
>> because that's what Linux does.
> again, mentioning Linux is totally irrelevant. Use of Instruction
> Pointer are implementation details for a not so intrusive solution to
> the problem I pointed out, and which you are totally missing.
> Now, please answer this: do you find any of the bloat to the non-debug
> case (ie. passing a NULL pointer and a 0 integer, when `LOCK_DEBUG ==
> 0') worth the extra debugability comfort to be acceptable ?
> If you do, then your focus is on making things comfortable for
> developers, at the expense 100's of users, rather than making things
> comfortable for 100's of users, at the expense of developers.
>> When we have a problem at work on teh Linux driver, my first step is
>> always
>> to try duplicate it on FreeBSD because:
> well, you're lucky FreeBSD supports your device! Lately, we got lately
> a shiny multi-queue network cards with bypass mechanism... that is not
> supported in FreeBSD. So currently, we got an expensive paper-weight.
>> 1/ half the time freebsd will just immediatly assert on something and
>> present you with the bug.. done.
> well, certainly not from a release build; assertion are disabled.
>> 2/ I can run gdb through firewire on it on ANY standard unmodified kernel
>> and find it, where on Linux I need to
>> get a whole universe of stupid patches all aligned and MAYBE I might be
>> able
>> to see what is going on.
>> if it's on redhat I need to do this, on ubuntu that, on suse something
>> else
>> ,and on different revisions
>> of the kernel it all changes anyhow..
> machine (even x86-64 machines) I run FreeBSD on have no firewire,
> neither do my desktops, this limit the usability of the feature.
> Moreover, I do not use mass-distros either, except for desktop[3], but
> small embedded firmware (ala. openwrt), even on "middle-end" system,
> and stick to vanilla kernel (when not playing with -next).
>>> That said, IP address are barely used in FreeBSD, there is no legacy.
>>> As such, the API should not use `unsigned long' but `void *'[0]; this
>>> is the natural type returned by `__builtin_return_address()' and the
>>> `&&' operator. This would allow to introduce a modifier to `%p' to do
>>> the translation.
>> possibly intptr_t is what should be used. but I'd expect Bruce to drop in
>> here and let us us know.
> whatever will suit you :) I started by using `uintptr_t', but they
> cannot be printed in a portable manner and the printf(9) stuff was not
> ready yet.
>  - Arnaud
> [0]: I am able to crash any kernel between 7-STABLE to 9.STABLE within
> minutes, with the right pattern and (mainstream and well supported)
> hardware.

Maybe this bug related to sk(4) driver?

> [1]: for example, when I read in `sys/kern/kern_fail.c':
>  * Failpoints allow for injecting fake errors into running code on the fly,
>  * without modifying code or recompiling with flags.  Failpoints are always
>  * present, and are very efficient when disabled.
> and see that "very efficient when disabled" translates into a
> __predict_false() conditional, well, sorry, but I am very dubious.
> Entering the branch _is_ still a cost. A most efficient way of doing
> this would be using gcc's `asm goto' statement, in which case the hot
> path only ever see a `nop', eventually patched at runtime.
> [2]: soekris-like boards do not qualify for "embedded" :)
> [3]: FreeBSD (8-STABLE) is way to limited and un-integrated to be
> anywhere but useful, not to speak about kernel bug which leave the
> system so fracked up that you have no other choice but to hard-reboot.
> _______________________________________________
> freebsd-current at mailing list
> To unsubscribe, send any mail to "freebsd-current-unsubscribe at"

More information about the freebsd-current mailing list