svn commit: r310045 - head/sys/ddb

Bruce Evans brde at optusnet.com.au
Wed Dec 14 08:24:56 UTC 2016


On Tue, 13 Dec 2016, John Baldwin wrote:

> On Wednesday, December 14, 2016 12:18:12 AM John Baldwin wrote:
>>
>> Log:
>>   Use casts to force an unsigned comparison in db_search_symbol().
>>
>>   On all of our platforms, db_expr_t is a signed integer while
>>   db_addr_t is an unsigned integer value.  db_search_symbol used variables
>>   of type db_expr_t to hold the current offset of the requested address from
>>   the "best" symbol found so far.  This value was initialized to '~0'.
>>   When a new symbol is found from a symbol table, the associated diff for the
>>   new symbol is compared against the existing value as 'if (newdiff < diff)'
>>   to determine if the new symbol had a smaller diff and was thus a closer
>>   match.
>>
>>   On 64-bit MIPS, the '~0' was treated as a negative value (-1).  A lookup
>>   that found a perfect match of an address against a symbol returned a diff
>>   of 0.  However, in signed comparisons, 0 is not less than -1.  As a result,
>>   DDB on 64-bit MIPS never resolved any addresses to symbols.  Workaround
>>   this by using casts to force an unsigned comparison.
>
> I am somewhat unsure of why this worked on other architectures.  amd64
> treated ~0 as 0xffffffff which when assigned to a 64-bit register was

It is treated the same on all non-broken arches, and I doubt that MIPS64
is broken here.

~0 is -1 on all arches (since no supported arches are 1's complement),
but it was assigned to 'unsigned diff' which gives 0xffffffff on all
supported arches.  Then diff is assigned to 'size_t newdiff', giving
no change in type or value on 32-bit arches, and zero-extension to the
same value but larger type on 64-bit arches.

Note that the type of diff is very broken.  It is unsigned.  This corrupts
not only ~0, but also the final difference returned in *offp.  The return
value also gets corrupted from 32 bits unsigned to 32 bits signed on
32-bit arches, so offsets >= 2**31 are fragile on all arches.  I think
most cases work because the offset is 0.

I don't see how initializing to 'val' can help.  It still gets corrupted
to unsigned, and I would expect it to make the problem much worse because
the truncation gives an even smaller value.  E.g., if val is any multiple
of 2**32, truncating it gives 0.  Without the truncation, 'val' is a good
upper bound for the offset.

Perhaps the problem is more with the initialization of newdiff.  It is
initialized to the corrupted value in diff.  I don't know the details of
the API, but if X_db_search_symbol() uses this as an input parameter to
limit the search, it might be happier with the smaller corrupted value
than the larger one.  X_db_search_symbol() seems to have working types
and values, though still too much hard-coding of types.

Your casts to uintmax_t have no effect.  The types are already unsigned,
and the corrupted values cannot be fixed by later casts.

Other type errors:
- newdiff has type size_t.  This bug is mostly cosmetic, but the type
   doesn't even match the X_db_search_symbol() API where newdiff is
   initialized.
- db_expr_t hasn't caught up with C99 yet.  It should have type intmax_t.
   Even cpp uses intmax_t for integer expressions.  C99 broke longs, so
   simply changing db_expr_t to intmax_t would break all the printf's that
   convert db_expr_t to long or u_long for printing.  There are also hard-
   coded assumptions that db_expr_t == intptr_t.

> zero-extended.  i386 also used 0xffffffff, but it used an unsigned comparison
> (jae instead of jge).
> 
> The kernel linker API returns an unsigned long for
> the diff, so I do think using db_addr_t for this type is probably the right
> solution in the long term.
> 
>>   Probably the diff returned from db_search_symbol() and X_db_search_symbol()
>>   should be changed to a db_addr_t instead of a db_expr_t as it is an
>>   unsigned value (and is an offset of an address, so should fit in the same
>>   size as an address).

I prefer signed types, and ddb already does this like I prefer.
Sometimes db_search_symbol() returns negative values.  This is most
needed for absolute symbols.  See db_printsym(), where there are many
buggy attempts to determine if the a negative symbol offset is actually
negative or just a representation of a huge unsigned value (usually a
kernel address), for the purpose of printing things like -8(%rbp)
instead of the current spam 0xfffffffffffffff8(%rbp).  Apparently,
offsets from registers are represented using special symbols, and for
the frame and stack pointers these offsets are usually negative, but
the extra bit needed for encoding this is not available/implemented
so it is difficult for db_printsym() to determine the actual offset.

I fixed printing of negative offsets from the frame pointer on i386,
but this is currently broken on i386 and never worked on other arches.
DB_LARGE_VALUE_MIN has the old i386 value on all arches, but this is
now wrong for i386 and never worked on other arches.  This is related
to the current problem: I think offsets like -8 are looked up as
symbols, and found to be at the largest kernel symbol plus some offset.
DB_LARGE_VALUE_MIN is supposed to give essentially the address of this
symbol.  This address was very high on i386 (given by static allocation
of some page table).  Now the highest symbol is not far above KERNBASE,
so is unusable for this purpose.  To fix this, all arches should have
a dummy symbol near the top of the address space.  It must be found
when an offset like -8 is looked up, and should be just a few hundred
K below the top since negative offsets larger than a few hundred K
are likely to be garbage.

> Also, in case it wasn't clear, this fixes resolution of addresses to names
> in MIPS64 stack traces in DDB as well as when using 'x', etc.

Bruce


More information about the svn-src-head mailing list