crash of 32-bit powerpc -r347549 kernel built via system-clang-8 (crash is while trying to mount the root file system) [debug kernel case: code generation error] [I was wrong]

Mark Millard marklmi at yahoo.com
Thu Jun 6 06:30:00 UTC 2019


[I misanalysed the code. Sorry for the noise.]

On 2019-Jun-5, at 14:17, Mark Millard <marklmi at yahoo.com> wrote:

> [This is from my experiments with more modern toolchains than
> normally/offocially used, specifically for 32-bit powerpc this
> time.]
> 
> On 2019-Jun-5, at 01:35, Mark Millard <marklmi at yahoo.com> wrote:
> 
>> On 2019-Jun-3, at 19:40, Mark Millard <marklmi at yahoo.com> wrote:
>> 
>>> On 2019-Jun-3, at 17:24, Mark Millard <marklmi at yahoo.com> wrote:
>>> 
>>>> I tried (cross) building a 32-bit powerpc kernel and world (non-debug) 
>>>> with system-clang (on amd64) and use of devel/powerpc64-binutils . The
>>>> installed kernel panics trying to mount the root file system.
>>>> 
>>>> FYI: Typed from picture of screen . . .
>>>> 
>>>> Trying to mount root from ufs:/dev/ufs/FBSDG4Srootfs [rw,noatime]...
>>>> panic: getnewbuf_empty: Locked buf 0xd2800000 on free queue.
>>>> . . .
>>>> 0xd6919080: at kdb_backtrace+0x64
>>>> 0xd69190e0: at vpanic+0x200
>>>> 0xd6919150: at panic+0x50
>>>> 0xd6919190: at getnewbuf+0x594
>>>> 0xd69191f0: at getblkx+0x540
>>>> 0xd69192a0: at breadn_flags+0x90
>>>> 0xd69192f0: at ffs_use_bread+0x9c
>>>> 0xd6919330: at readsuper+0x68
>>>> 0xd6919370: at ffs_sbget+0xcc
>>>> 0xd69193c0: at ffs_mount+0x18b8
>>>> 0xd69194f0: at vfs_domount+0xa74
>>>> 0xd69196a0: at vfs_donmount+0x944
>>>> 0xd6919700: at kernel_mount+0x64
>>>> 0xd6919740: at parse_mount+0x52c
>>>> 0xd6919840: at vfs_mountroot+0x71c
>>>> 0xd69199b0: at start_init+0x44
>>>> 0xd6919a10: at fork_exit_0xcc
>>>> 0xd6919a40: at fork_trampoline+0xc
>>>> KDB: enter panic
>>>> [ thread pid 1 tid 100002 ]
>>>> Stopped at kdb_enter+0x74: addi r3,r0,0x0
>>>> 
>>>> This reproduces with each boot attempt.
>>>> 
>>>> Replacing the kernel with one built via gcc 4.2.1 and booting
>>>> the result does not panic.
>>>> 
>>>> 
>>>> FYI for the context of the panic call:
>>>> 
>>>> /usr/src/sys/kern/vfs_bio.c :
>>>> 
>>>> static struct buf *
>>>> buf_alloc(struct bufdomain *bd)
>>>> {
>>>>     struct buf *bp;
>>>>     int freebufs;
>>>> 
>>>>     /*
>>>>      * We can only run out of bufs in the buf zone if the average buf
>>>>      * is less than BKVASIZE.  In this case the actual wait/block will
>>>>      * come from buf_reycle() failing to flush one of these small bufs.
>>>>      */
>>>>     bp = NULL;
>>>>     freebufs = atomic_fetchadd_int(&bd->bd_freebuffers, -1);
>>>>     if (freebufs > 0)
>>>>             bp = uma_zalloc(buf_zone, M_NOWAIT);
>>>>     if (bp == NULL) {
>>>>             atomic_add_int(&bd->bd_freebuffers, 1);
>>>>             bufspace_daemon_wakeup(bd);
>>>>             counter_u64_add(numbufallocfails, 1);
>>>>             return (NULL);
>>>>     }
>>>>     /*
>>>>      * Wake-up the bufspace daemon on transition below threshold.
>>>>      */
>>>>     if (freebufs == bd->bd_lofreebuffers)
>>>>             bufspace_daemon_wakeup(bd);
>>>> 
>>>>     if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0)
>>>>             panic("getnewbuf_empty: Locked buf %p on free queue.", bp);
>>> 
>>> 
>>> I tried making a debug kernel build via system-clang-8. It
>>> reports differently but still during getnewbuf being active
>>> on the stack (again typed from a picture):
>>> 
>>> Trying to mount root from ufs:/dev/ufs/FBSDG4Srootfs [rw,noatime]...
>>> . . . (ignore witness/diagnostic warnings) . . .
>>> panic: bq_remove: Locked buf 0xd2a00000 not on a queue.
>>> . . .
>>> 0xd6b7bfd0: at kdb_backtrace+0x64
>>> 0xd6b7c030: at vpanic+0x200
>>> 0xd6b7c0a0: at panic+0x50
>>> 0xd6b7c0e0: at bq_remove+01e0
>>> 0xd6b7c100: at buf_import+0x8c
>>> 0xd6b7c130: at uma_zalloc_arg+0x544
>>> 0xd6b7c190: at getnewbuf+0x380
>>> 0xd6b7c1f0: at getblkx+0x620
>>> 0xd6b7c290: at breadn_flags+0x90
>>> 0xd6b7c2e0: at ffs_use_bread+0xa8
>>> 0xd6b7c320: at readsuper+0x68
>>> 0xd6b7c360: at ffs_sbget+0xcc
>>> 0xd6b7c3b0: at ffs_mount+0xefc
>>> 0xd6b7c4e0: at vfs_domount+0xa754
>>> 0xd6b7c690: at vfs_donmount+0x78c
>>> 0xd6b7c6f0: at kernel_mount+0x7c
>>> 0xd6b7c730: at parse_mount+0x52c
>>> 0xd6b7c830: at vfs_mountroot+0x660
>>> 0xd6b7c9a0: at start_init+0x4c
>>> 0xd6b7ca10: at fork_exit_0xb0
>>> 0xd6b7ca40: at fork_trampoline+0xc
>>> 
>>> /usr/src/sys/kern/vfs_bio.c :
>>> 
>>> static void
>>> bq_remove(struct bufqueue *bq, struct buf *bp)
>>> {
>>> 
>>>      CTR3(KTR_BUF, "bq_remove(%p) vp %p flags %X",
>>>          bp, bp->b_vp, bp->b_flags);
>>>      KASSERT(bp->b_qindex != QUEUE_NONE,
>>>          ("bq_remove: buffer %p not on a queue.", bp));
>>> . . .
>>> 
>>> For reference:
>>> 
>>> static int
>>> buf_import(void *arg, void **store, int cnt, int domain, int flags)
>>> {
>>>      struct buf *bp;
>>>      int i;
>>> 
>>>      BQ_LOCK(&bqempty);
>>>      for (i = 0; i < cnt; i++) {
>>>              bp = TAILQ_FIRST(&bqempty.bq_queue);
>>>              if (bp == NULL)
>>>                      break;
>>>              bq_remove(&bqempty, bp);
>>>              store[i] = bp;
>>>      }
>>>      BQ_UNLOCK(&bqempty);
>>> 
>>>      return (i);
>>> }
>>> 
>>> 
>> 
>> I tried building the debug kernel with KTR for KTR_BUF.
>> Installing and booting the result did not panic. Manually
>> forcing getting to ddb> soon enough and doing "show ktr"
>> did show a bq_remove for 0xd2a00000 (and later activity).
>> 
>> From the looks of the KTR_BUF CTRn's, this suggests to me
>> that the access to bp->qindex in bq_remove is racy in
>> some way vs. updates to the value.
> 
> The code produced by clang for the debug kernel, KTR
> off in this case, for:
> 
>      KASSERT(bp->b_qindex != QUEUE_NONE,
>          ("bq_remove: buffer %p not on a queue.", bp));
> 
> is wrong [the 84(r29) accesses bp->b_qindex]:
> 
> . . .
> 00618aa8 <bq_remove+0x34> lbz     r5,84(r29)
> 00618aac <bq_remove+0x38> cmplwi  r5,4
> 00618ab0 <bq_remove+0x3c> bgt-    00618c10 <bq_remove+0x19c>
> . . .
> 00618c10 <bq_remove+0x19c> lwz     r3,-32364(r30)
> 00618c14 <bq_remove+0x1a0> crclr   4*cr1+eq
> 00618c18 <bq_remove+0x1a4> mr      r4,r29
> 00618c1c <bq_remove+0x1a8> bl      00541ca0 <panic>
> . . .
> 
> Comparing against 4 does not match any part of
> bq_remove. Comparison via gt would make sense for:

Wrong.

The 4 and gt use comes from inlining bufqueue(bp)
in the following KASSERT. For reference (from the .i):

bufqueue(struct buf *bp)
{

 switch (bp->b_qindex) {
 case 0:

 case 4:
  return (((void *)0));
 case 1:
  return (&bqempty);
 case 2:
  return (&bufdomain(bp)->bd_dirtyq);
 case 3:
  return (&bufdomain(bp)->bd_subq[bp->b_subqueue]);
 default:
  break;
 }
 panic("bufqueue(%p): Unhandled type %d\n", bp, bp->b_qindex);
}

The code generation put the first KASSERT's related panic in
to case 0 above when bufqueue was inlined.

> /usr/src/sys/sys/buf.h: uint8_t         b_qindex;       /* (Q) buffer queue index */)
> 
> if the comparison was against zero. It should
> have been:
> 
> /usr/src/sys/kern/vfs_bio.c:#define QUEUE_NONE  0       /* on no queue */
> 
> 
> This is for a head -r347549 32-bit powerpc FreeBSD context,
> built with system clang (an amd6->powerpc cross build using
> devel/powerpc64-binutils ).



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



More information about the freebsd-toolchain mailing list