Using Instruction Pointer address in debug interfaces [Was: Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]]

Arnaud Lacombe lacombar at
Wed Nov 9 05:29:54 UTC 2011


On Tue, Nov 8, 2011 at 9:35 PM, Julian Elischer <julian at> wrote:
> On 11/8/11 5:52 PM, Arnaud Lacombe wrote:
>> Hi,
>> On Tue, Nov 8, 2011 at 7:08 PM, Julian Elischer<julian at>
>>  wrote:
>>> On 11/8/11 10:49 AM, Arnaud Lacombe wrote:
>>>> Hi,
>>>> To avoid future complaints about the fact that I would be only "talk"
>>>> without "action", I did implement what I suggested above. As it is
>>>> quite a large patch-set, I will not post it directly here, however, it
>>>> is available on github:
>>>> It convert a bunch of debug interface to use the caller instruction
>>>> pointer, as well as a proof-of-concept teaching printf(9) to convert
>>>> IP to symbol_name+offset.
>>>> It translates in a direct saving of about +250kB on i386's GENERIC,
>>>> just in kernel text size. Even the worst case, ie LOCK_DEBUG == 0,
>>>> translates to a save of +80kB.
>>>> Please note that this is still WIP code.
>>> A couple of comments.
>>> Firstly, the idea of a printf method to print the IP as symbol+offset is
>>> an
>>> interesting idea
>>> that should be followed up in its own right.
>> FWIW, I have no credit in this idea. It has been in Linux for ages and
>> ages.
> yeah as I said  at work I use linux and BSD...
> the linux stuff that just prints out IP really annoys me.
> the list stuff and netgraph debug (which should be off in any production
> system)
this is, I guess, where we do not agree. You find it acceptable to run
totally different code in production and during debug. I do not. This
is completely insane, even more nowadays where heavy parallelism
increases the likelihood of races, and subtle change in the code, even
optimization, can cause total behavioral change (ie. Heisenbug).

For the record, we have been tracking for more than 2 months (first
occurrences happened a year ago) an mbuf corruption in the network
stack, present in all released code since at least FreeBSD 7[0]. Each
time we think it is fixed, we are proven wrong by customers within a
few days when the system crashes again. Even the last attempt which
was believed to be bullet-proof failed and crashes daily.

All that to say that production code should embed enough facilities to
allow the system to be fully debugged with a runtime cost as low as
possible, and a code-size cost as low as possible[1]. I should be able
to connect on a production machine, turn a knob, an see what is going
wrong, without the customer noticing.

In the worst case, when you have to enable debug-only code, it must
not be done by making the non-debug case more expensive, but wrap
around. The whole original point of the patches was that LOCK_FILE and
LOCK_LINE are a bad answer to a wrong problem.

`__FILE__, __LINE__' and the bloat introduced is not the problem,
`const char *file, int line' in way too much prototypes is.

Now, you make me realize that `const char *file, int line' should just
be removed, not replaced by `unsigned long' or anything else. It's
likely to be done in another iteration.

> just require you to be able to see the console. and have sources nearby.
> if you need the IP use gdb.
"console debugging" is yet another abomination which should be hunted
down. Just try to do any useful work at high-pps on a serial

> it's just what you are used to. You are obviously from the dark side
> ^H^H^H^H^H^H linux.
My obedience is totally irrelevant to the problem.

However, if you want to know, my heart tends to be with BSDs.
Unfortunately, it's a sad love-story where your Beloved keeps
deceiving you day after day. You want to change small bits at a time,
make several iteration of progress to make things brighter, but your
Beloved refuses any change because of too much inertia. Sad.

> so you are used to doing it that way.. but don't expect us to change just
> because that's what Linux does.
again, mentioning Linux is totally irrelevant. Use of Instruction
Pointer are implementation details for a not so intrusive solution to
the problem I pointed out, and which you are totally missing.

Now, please answer this: do you find any of the bloat to the non-debug
case (ie. passing a NULL pointer and a 0 integer, when `LOCK_DEBUG ==
0') worth the extra debugability comfort to be acceptable ?

If you do, then your focus is on making things comfortable for
developers, at the expense 100's of users, rather than making things
comfortable for 100's of users, at the expense of developers.

> When we have a problem at work on teh Linux driver, my first step is always
> to try duplicate it on FreeBSD because:
well, you're lucky FreeBSD supports your device! Lately, we got lately
a shiny multi-queue network cards with bypass mechanism... that is not
supported in FreeBSD. So currently, we got an expensive paper-weight.

> 1/ half the time freebsd will just immediatly assert on something and
> present you with the bug.. done.
well, certainly not from a release build; assertion are disabled.

> 2/ I can run gdb through firewire on it on ANY standard unmodified kernel
> and find it, where on Linux I need to
> get a whole universe of stupid patches all aligned and MAYBE I might be able
> to see what is going on.
> if it's on redhat I need to do this, on ubuntu that, on suse something else
> ,and on different revisions
> of the kernel it all changes anyhow..
machine (even x86-64 machines) I run FreeBSD on have no firewire,
neither do my desktops, this limit the usability of the feature.
Moreover, I do not use mass-distros either, except for desktop[3], but
small embedded firmware (ala. openwrt), even on "middle-end" system,
and stick to vanilla kernel (when not playing with -next).

>> That said, IP address are barely used in FreeBSD, there is no legacy.
>> As such, the API should not use `unsigned long' but `void *'[0]; this
>> is the natural type returned by `__builtin_return_address()' and the
>> `&&' operator. This would allow to introduce a modifier to `%p' to do
>> the translation.
> possibly intptr_t is what should be used. but I'd expect Bruce to drop in
> here and let us us know.
whatever will suit you :) I started by using `uintptr_t', but they
cannot be printed in a portable manner and the printf(9) stuff was not
ready yet.

 - Arnaud

[0]: I am able to crash any kernel between 7-STABLE to 9.STABLE within
minutes, with the right pattern and (mainstream and well supported)

[1]: for example, when I read in `sys/kern/kern_fail.c':

 * Failpoints allow for injecting fake errors into running code on the fly,
 * without modifying code or recompiling with flags.  Failpoints are always
 * present, and are very efficient when disabled.

and see that "very efficient when disabled" translates into a
__predict_false() conditional, well, sorry, but I am very dubious.
Entering the branch _is_ still a cost. A most efficient way of doing
this would be using gcc's `asm goto' statement, in which case the hot
path only ever see a `nop', eventually patched at runtime.

[2]: soekris-like boards do not qualify for "embedded" :)

[3]: FreeBSD (8-STABLE) is way to limited and un-integrated to be
anywhere but useful, not to speak about kernel bug which leave the
system so fracked up that you have no other choice but to hard-reboot.

More information about the freebsd-current mailing list