Re: dtrace bitfields failure (was: 12.3-RC1 fails ...)
- In reply to: Mark Johnston : "Re: dtrace bitfields failure (was: 12.3-RC1 fails ...)"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Wed, 05 Jan 2022 20:00:51 UTC
On Tue, Jan 04, 2022 at 05:58:13PM -0500, Mark Johnston wrote:
! On Tue, Jan 04, 2022 at 10:58:13PM +0100, Peter wrote:
! > On Tue, Jan 04, 2022 at 01:01:55PM -0500, Mark Johnston wrote:
! > ! On Tue, Jan 04, 2022 at 04:05:53PM +0100, Peter wrote:
! > ! >
! > ! > Hija,
! > ! >
! > ! > sadly, I was too early in agreeing that the two patches
! > ! > 22082f15f9
! > ! > 68396709e7
! > ! > together do solve the issue. They only do on a certain assumption,
! > ! > which does not hold true in all cases.
! > ! >
! > ! >
! > ! > Let's look at https://reviews.freebsd.org/D27213
! > ! >
! > ! > This is the code in question that will trigger the action:
! > ! >
! > ! > if (dst_type == CTF_ERR && name[0] != '\0' &&
! > ! > (hep = ctf_hash_lookup(&src_fp->ctf_names, src_fp, name,
! > ! > strlen(name))) != NULL &&
! > ! > src_type != (ctf_id_t)hep->h_type) {
! > ! >
! > ! > What happens here: in the case of a bitfield type we need to also
! > ! > copy the corresponding intrinsic type. This condition here checks for
! > ! > the case and also should deliver that respective intrinsic type
! > ! > into the "hep" variable.
! > ! >
! > ! > But this depends on the assumption that the intrinsic type appears
! > ! > first in the "src_fp" container, so that the hash will point to it.
! > ! > And that is not necessarily true; it depends on what options you have
! > ! > in your kernel config.
! > ! >
! > ! >
! > ! > For instance, with my custom kernel, things look like this:
! > ! >
! > ! > $ ctfdump -t kernel.full
! > ! >
! > ! > - Types ----------------------------------------------------------------------
! > ! >
! > ! > [1] STRUCT (anon) (8 bytes)
! > ! > sle_next type=262 off=0
! > ! >
! > ! > [2] STRUCT (anon) (8 bytes)
! > ! > stqe_next type=262 off=0
! > ! >
! > ! > [3] UNION (anon) (8 bytes)
! > ! > m_next type=262 off=0
! > ! > m_slist type=1 off=0
! > ! > m_stailq type=2 off=0
! > ! >
! > ! > [4] UNION (anon) (8 bytes)
! > ! > m_nextpkt type=262 off=0
! > ! > m_slistpkt type=1 off=0
! > ! > m_stailqpkt type=2 off=0
! > ! >
! > ! > <5> INTEGER char encoding=SIGNED CHAR offset=0 bits=8
! > ! > <6> POINTER (anon) refers to 5
! > ! > <7> TYPEDEF caddr_t refers to 6
! > ! > <8> INTEGER int encoding=SIGNED offset=0 bits=32
! > ! > <9> TYPEDEF __int32_t refers to 8
! > ! > <10> TYPEDEF int32_t refers to 9
! > ! > [11] INTEGER unsigned int encoding=0x0 offset=0 bits=8
! > ! > [12] INTEGER unsigned int encoding=0x0 offset=0 bits=24
! > ! > [13] STRUCT (anon) (8 bytes)
! > ! > cstqe_next type=229 off=0
! > ! >
! > ! > <14> POINTER (anon) refers to 229
! > ! > [15] STRUCT (anon) (16 bytes)
! > ! > le_next type=229 off=0
! > ! > le_prev type=14 off=64
! > ! >
! > ! > <16> INTEGER long encoding=SIGNED offset=0 bits=64
! > ! > <17> ARRAY (anon) content: 5 index: 16 nelems: 16
! > ! >
! > ! > <18> INTEGER unsigned int encoding=0x0 offset=0 bits=32
! > ! > <19> TYPEDEF u_int refers to 18
! > ! > [etc.etc.]
! > ! >
! > ! >
! > ! > As we can see, this one has the bitfield types as #11 and #12, and
! > ! > the intrinsic type as #18. And consequentially things do fail.
! > ! >
! > ! >
! > ! > I currently do not know what is the culprit. Has the linking stage of
! > ! > the kernel a flaw? Or is the patch D27213 based on a wrong assumption?
! > ! >
! > ! > I hope You guys can answer that. For now I changed the patch D27213
! > ! > to cover the case, so that things do work.
! > ! > Further details on request.
! > !
! > ! I'm not immediately sure where the problem is. Could you please post
! > ! the kernel configuration and src revision that you're using, so that I
! > ! can try and reproduce this?
! >
! > Oh, I feared that would come...
! > Src revision is easy now: release/12.3.0 (70cb68e7a00)
! >
! > Kernel config is difficult. I have compiled into the kernel
! > * ipfw (obviousely)
! > * dtraceall
! > * drm2 & friends (that needs objects to be added to conf/files)
! > * khelp/h_ertt/etc. (that needs the files and fixing the SI_SUB
! > sequence to make it boot)
! > So the kernel config itself doesn't help to reproduce.
!
! Can you show output of "ctfdump -S /path/to/your/kernel"? Though you're
! on a fairly old revision, with this set of extra modules linked into the
! kernel you might be overflowing CTFv2's limit of 2^15 distinct type
! definitions. That is not the proximate cause of the problem, which as
! you identified is that a bitfield type is appearing before the
! corresponding intrinsic, but it might be the root cause if a type ID
! overflow is causing ctfmerge to emit types in the wrong order.
# ctfdump -S kernel.full
- CTF Statistics -------------------------------------------------------------
total number of data objects = 27754
total number of functions = 27789
total number of function arguments = 66565
maximum argument list length = 22
average argument list length = 2.40
total number of types = 23838
total number of integers = 60
total number of floats = 1
total number of pointers = 6618
total number of arrays = 2079
total number of func types = 1910
total number of structs = 7474
total number of unions = 362
total number of enums = 622
total number of forward tags = 38
total number of typedefs = 3929
total number of volatile types = 44
total number of const types = 551
total number of restrict types = 0
total number of unknowns (holes) = 150
total number of struct members = 50431
maximum number of struct members = 248
total size of all structs = 6834343
maximum size of a struct = 1593440
average number of struct members = 6.75
average size of a struct = 914.42
total number of union members = 1234
maximum number of union members = 36
total size of all unions = 39472
maximum size of a union = 8208
average number of union members = 3.41
average size of a union = 109.04
total number of enum members = 6380
maximum number of enum members = 1023
average number of enum members = 10.26
total number of unique strings = 47418
bytes of string data = 630726
maximum string length = 69
average string length = 13.30
! > What I am currently looking for is only an educated statement, about
! > if that types sequence (as quoted above) can possibly happen, or, should
! > never happen at all.
! > If it should not happen, then it's my fault and I might go and look why
! > it happens.
! >
! > ! How exactly does the bug manifest?
! >
! > Exactly as is to be expected, with either of these two errors
! > (depending on the native order of files in /usr/lib/dtrace);
! >
! > [1] dtrace: failed to establish error handler: "/usr/lib/dtrace/ipfw.d",
! > line 107: failed to copy type of 'inp': Conflicting type is already
! > defined
! > [2] dtrace: failed to establish error handler:
! > "/usr/lib/dtrace/psinfo.d", line 41: failed to copy type of
! > 'pr_gid': Conflicting type is already defined
! >
! >
! > Then I single-stepped the libctf and it clearly showed the mismatch
! > between type #11 and type #18 (and the patch 68396709e7 one time doing
! > things where it shouldn't and the other time not doing things where
! > it should).
! >
! > So I am probably on track with understanding what happens, nevertheless
! > I would greatly appreciate some input from You how it *is supposed to*
! > work.
!
! Reading libctf's init_types(), it seems pretty clear that the comment
! added in D27213 is true: the second pass over the type graph inserts
! definitions into various hash tables, and for integer types only the
! first instance of a type with a particular name is inserted. In your
! case this means that a lookup by name of "unsigned int" will return a
! bitfield type.
That is my understanding so far, as well. So then, when I had figured
that the hash lookup delivers the wrong type, I just removed the hash
lookup, and instead search through the types sequentially. That seems
to work. [3], see below
! I can't immediately see how exactly ctfconvert/ctfmerge ensure that
! bitfields are ordered after intrinsic types, if they really do at all.
! It would be interesting to try running ctfconvert on each object file
! for your kernel to see if the ordering you showed above exists in a
! specific object file. If not, then I think we should check for a type
! ID overflow, then look more closely at ctfmerge.
Lets do it, that will be fun. Because, I tried already by removing
files from the kernel config. The funny thing is, if I remove any
one of (dtraceall, drm2, khelp/h_errt), the problem goes away. Also, if
I remove any one of the h_ertt files (cc_vegas, cc_cdg), the problem
goes away. And, if I *add* things to the kernel config, it also goes
away.
Conclusion: it seems impossible to pinpoint a single file as the cause.
At that point I gave up and started to singlestep the libctf instead.
So, if You're curious and want to figure it out, here is a git-am patch,
that should apply right on top of release/12.3.0 (70cb68e7a00).
It should create a sys/amd64/conf/D6R12V1 kernelconfig and all my
sourcetree modifications, and should cleanly build the respective
kernel. [3] is included, so the flaw is already covered.
http://oper.dinoex.de/.well-known/acme-challenge/patch-for-mark-johnston.patch
(That one should be accessible)
concerning [3]: this is just an utterly non-optimized proof-of-concept
patch.
cheerio,
PMc