Re: dtrace bitfields failure (was: 12.3-RC1 fails ...)

From: Peter <pmc_at_citylink.dinoex.sub.org>
Date: Tue, 04 Jan 2022 21:58:13 UTC

On Tue, Jan 04, 2022 at 01:01:55PM -0500, Mark Johnston wrote:
! On Tue, Jan 04, 2022 at 04:05:53PM +0100, Peter wrote:
! > 
! > Hija,
! > 
! >   sadly, I was too early in agreeing that the two patches
! >      22082f15f9
! >      68396709e7
! > together solve the issue. They only do so under a certain assumption,
! > which does not hold true in all cases.
! > 
! > 
! > Let's look at https://reviews.freebsd.org/D27213
! > 
! > This is the code in question that will trigger the action:
! > 
! >      if (dst_type == CTF_ERR && name[0] != '\0' &&
! >              (hep = ctf_hash_lookup(&src_fp->ctf_names, src_fp, name,
! >              strlen(name))) != NULL &&
! >              src_type != (ctf_id_t)hep->h_type) {
! > 
! > What happens here: for a bitfield type we also need to copy the
! > corresponding intrinsic type. This condition checks for that case
! > and is also supposed to deliver the respective intrinsic type in
! > the "hep" variable.
! > 
! > But this depends on the assumption that the intrinsic type appears
! > first in the "src_fp" container, so that the hash will point to it.
! > And that is not necessarily true; it depends on what options you have
! > in your kernel config.
! > 
! > 
! > For instance, with my custom kernel, things look like this:
! > 
! > $ ctfdump -t kernel.full
! > 
! > - Types ----------------------------------------------------------------------
! > 
! >   [1] STRUCT (anon) (8 bytes)
! >         sle_next type=262 off=0
! > 
! >   [2] STRUCT (anon) (8 bytes)
! >         stqe_next type=262 off=0
! > 
! >   [3] UNION (anon) (8 bytes)
! >         m_next type=262 off=0
! >         m_slist type=1 off=0
! >         m_stailq type=2 off=0
! > 
! >   [4] UNION (anon) (8 bytes)
! >         m_nextpkt type=262 off=0
! >         m_slistpkt type=1 off=0
! >         m_stailqpkt type=2 off=0
! > 
! >   <5> INTEGER char encoding=SIGNED CHAR offset=0 bits=8
! >   <6> POINTER (anon) refers to 5
! >   <7> TYPEDEF caddr_t refers to 6
! >   <8> INTEGER int encoding=SIGNED offset=0 bits=32
! >   <9> TYPEDEF __int32_t refers to 8
! >   <10> TYPEDEF int32_t refers to 9
! >   [11] INTEGER unsigned int encoding=0x0 offset=0 bits=8
! >   [12] INTEGER unsigned int encoding=0x0 offset=0 bits=24
! >   [13] STRUCT (anon) (8 bytes)
! >         cstqe_next type=229 off=0
! > 
! >   <14> POINTER (anon) refers to 229
! >   [15] STRUCT (anon) (16 bytes)
! >         le_next type=229 off=0
! >         le_prev type=14 off=64
! > 
! >   <16> INTEGER long encoding=SIGNED offset=0 bits=64
! >   <17> ARRAY (anon) content: 5 index: 16 nelems: 16
! > 
! >   <18> INTEGER unsigned int encoding=0x0 offset=0 bits=32
! >   <19> TYPEDEF u_int refers to 18
! > [etc.etc.]
! > 
! > 
! > As we can see, this one has the bitfield types as #11 and #12, and
! > the intrinsic type as #18. And consequently things fail.
! > 
! > 
! > I currently do not know what the culprit is. Does the linking stage
! > of the kernel have a flaw? Or is the patch D27213 based on a wrong
! > assumption?
! > 
! > I hope You guys can answer that. For now I changed the patch D27213
! > to cover the case, so that things do work.
! > Further details on request.
! 
! I'm not immediately sure where the problem is.  Could you please post
! the kernel configuration and src revision that you're using, so that I
! can try and reproduce this?

Oh, I feared that would come...
Src revision is easy now: release/12.3.0 (70cb68e7a00)

Kernel config is difficult. I have compiled into the kernel
 * ipfw (obviously)
 * dtraceall
 * drm2 & friends (this needs objects to be added to conf/files)
 * khelp/h_ertt/etc. (this needs the files, plus a fix to the SI_SUB
   sequence to make it boot)
So the kernel config by itself doesn't help to reproduce this.


What I am currently looking for is only an educated statement on
whether that sequence of types (as quoted above) can legitimately
happen, or should never happen at all.
If it should not happen, then it's my fault and I can go and look at
why it happens.

! How exactly does the bug manifest?

Exactly as is to be expected, with either of these two errors
(depending on the native order of files in /usr/lib/dtrace):

[1]  dtrace: failed to establish error handler: "/usr/lib/dtrace/ipfw.d",
  line 107: failed to copy type of 'inp': Conflicting type is already
  defined
[2]  dtrace: failed to establish error handler:
  "/usr/lib/dtrace/psinfo.d", line 41: failed to copy type of
  'pr_gid': Conflicting type is already defined 


Then I single-stepped through libctf, and it clearly showed the
mismatch between type #11 and type #18 (with the patch 68396709e7 one
time doing things where it shouldn't, and the other time not doing
things where it should).
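
To make the ordering dependence concrete, here is a minimal standalone
sketch in C. It is not the libctf code: all toy_* names are made up for
illustration, and it assumes that the name hash keeps the first type
inserted under a given name, which is the assumption described in the
quoted mail above. The type ids are the ones from the ctfdump output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one CTF type record; the field names are hypothetical. */
struct toy_type {
	int         id;    /* type index as printed by ctfdump */
	const char *name;  /* type name, shared by bitfield variants */
	unsigned    bits;  /* width from the integer encoding */
};

#define TOY_MAX 16

/* Toy name table: the first insertion for a given name wins (assumption). */
struct toy_names {
	const char *name[TOY_MAX];
	int         id[TOY_MAX];
	size_t      n;
};

/* Return the id stored for "name", or -1 if the name is unknown. */
int
toy_lookup(const struct toy_names *t, const char *name)
{
	for (size_t i = 0; i < t->n; i++)
		if (strcmp(t->name[i], name) == 0)
			return (t->id[i]);
	return (-1);
}

/* Insert a type unless its name is already present. */
void
toy_insert(struct toy_names *t, const struct toy_type *ty)
{
	if (toy_lookup(t, ty->name) != -1)
		return;		/* keep the earlier entry */
	assert(t->n < TOY_MAX);
	t->name[t->n] = ty->name;
	t->id[t->n] = ty->id;
	t->n++;
}

/* Fill the table in container order, then resolve "name" through it. */
int
toy_resolve(const struct toy_type *types, size_t ntypes, const char *name)
{
	struct toy_names t = { .n = 0 };

	for (size_t i = 0; i < ntypes; i++)
		toy_insert(&t, &types[i]);
	return (toy_lookup(&t, name));
}
```

Fed with the three "unsigned int" entries in the order of my kernel
(#11 with 8 bits, #12 with 24 bits, #18 with 32 bits), toy_resolve()
hands back the bitfield type #11. With #18 placed first it hands back
the intrinsic type, which is what the condition in D27213 appears to
rely on.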

So I am probably on the right track in understanding what happens;
nevertheless I would greatly appreciate some input from you on how it
*is supposed to* work.


cheerio,
PMc