crash of 32-bit powerpc -r347549 kernel built via system-clang-8 (crash is while trying to mount the root file system) [debug kernel case: code generation error] [I was wrong]
Mark Millard
marklmi at yahoo.com
Thu Jun 6 06:30:00 UTC 2019
[I misanalysed the code. Sorry for the noise.]
On 2019-Jun-5, at 14:17, Mark Millard <marklmi at yahoo.com> wrote:
> [This is from my experiments with more modern toolchains than
> normally/offocially used, specifically for 32-bit powerpc this
> time.]
>
> On 2019-Jun-5, at 01:35, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2019-Jun-3, at 19:40, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2019-Jun-3, at 17:24, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>>> I tried (cross) building a 32-bit powerpc kernel and world (non-debug)
>>>> with system-clang (on amd64) and use of devel/powerpc64-binutils . The
>>>> installed kernel panics trying to mount the root file system.
>>>>
>>>> FYI: Typed from picture of screen . . .
>>>>
>>>> Trying to mount root from ufs:/dev/ufs/FBSDG4Srootfs [rw,noatime]...
>>>> panic: getnewbuf_empty: Locked buf 0xd2800000 on free queue.
>>>> . . .
>>>> 0xd6919080: at kdb_backtrace+0x64
>>>> 0xd69190e0: at vpanic+0x200
>>>> 0xd6919150: at panic+0x50
>>>> 0xd6919190: at getnewbuf+0x594
>>>> 0xd69191f0: at getblkx+0x540
>>>> 0xd69192a0: at breadn_flags+0x90
>>>> 0xd69192f0: at ffs_use_bread+0x9c
>>>> 0xd6919330: at readsuper+0x68
>>>> 0xd6919370: at ffs_sbget+0xcc
>>>> 0xd69193c0: at ffs_mount+0x18b8
>>>> 0xd69194f0: at vfs_domount+0xa74
>>>> 0xd69196a0: at vfs_donmount+0x944
>>>> 0xd6919700: at kernel_mount+0x64
>>>> 0xd6919740: at parse_mount+0x52c
>>>> 0xd6919840: at vfs_mountroot+0x71c
>>>> 0xd69199b0: at start_init+0x44
>>>> 0xd6919a10: at fork_exit_0xcc
>>>> 0xd6919a40: at fork_trampoline+0xc
>>>> KDB: enter panic
>>>> [ thread pid 1 tid 100002 ]
>>>> Stopped at kdb_enter+0x74: addi r3,r0,0x0
>>>>
>>>> This reproduces with each boot attempt.
>>>>
>>>> Replacing the kernel with one built via gcc 4.2.1 and booting
>>>> the result does not panic.
>>>>
>>>>
>>>> FYI for the context of the panic call:
>>>>
>>>> /usr/src/sys/kern/vfs_bio.c :
>>>>
>>>> static struct buf *
>>>> buf_alloc(struct bufdomain *bd)
>>>> {
>>>> struct buf *bp;
>>>> int freebufs;
>>>>
>>>> /*
>>>> * We can only run out of bufs in the buf zone if the average buf
>>>> * is less than BKVASIZE. In this case the actual wait/block will
>>>> * come from buf_reycle() failing to flush one of these small bufs.
>>>> */
>>>> bp = NULL;
>>>> freebufs = atomic_fetchadd_int(&bd->bd_freebuffers, -1);
>>>> if (freebufs > 0)
>>>> bp = uma_zalloc(buf_zone, M_NOWAIT);
>>>> if (bp == NULL) {
>>>> atomic_add_int(&bd->bd_freebuffers, 1);
>>>> bufspace_daemon_wakeup(bd);
>>>> counter_u64_add(numbufallocfails, 1);
>>>> return (NULL);
>>>> }
>>>> /*
>>>> * Wake-up the bufspace daemon on transition below threshold.
>>>> */
>>>> if (freebufs == bd->bd_lofreebuffers)
>>>> bufspace_daemon_wakeup(bd);
>>>>
>>>> if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0)
>>>> panic("getnewbuf_empty: Locked buf %p on free queue.", bp);
>>>
>>>
>>> I tried making a debug kernel build via system-clang-8. It
>>> reports differently but still during getnewbuf being active
>>> on the stack (again typed from a picture):
>>>
>>> Trying to mount root from ufs:/dev/ufs/FBSDG4Srootfs [rw,noatime]...
>>> . . . (ignore witness/diagnostic warnings) . . .
>>> panic: bq_remove: Locked buf 0xd2a00000 not on a queue.
>>> . . .
>>> 0xd6b7bfd0: at kdb_backtrace+0x64
>>> 0xd6b7c030: at vpanic+0x200
>>> 0xd6b7c0a0: at panic+0x50
>>> 0xd6b7c0e0: at bq_remove+01e0
>>> 0xd6b7c100: at buf_import+0x8c
>>> 0xd6b7c130: at uma_zalloc_arg+0x544
>>> 0xd6b7c190: at getnewbuf+0x380
>>> 0xd6b7c1f0: at getblkx+0x620
>>> 0xd6b7c290: at breadn_flags+0x90
>>> 0xd6b7c2e0: at ffs_use_bread+0xa8
>>> 0xd6b7c320: at readsuper+0x68
>>> 0xd6b7c360: at ffs_sbget+0xcc
>>> 0xd6b7c3b0: at ffs_mount+0xefc
>>> 0xd6b7c4e0: at vfs_domount+0xa754
>>> 0xd6b7c690: at vfs_donmount+0x78c
>>> 0xd6b7c6f0: at kernel_mount+0x7c
>>> 0xd6b7c730: at parse_mount+0x52c
>>> 0xd6b7c830: at vfs_mountroot+0x660
>>> 0xd6b7c9a0: at start_init+0x4c
>>> 0xd6b7ca10: at fork_exit_0xb0
>>> 0xd6b7ca40: at fork_trampoline+0xc
>>>
>>> /usr/src/sys/kern/vfs_bio.c :
>>>
>>> static void
>>> bq_remove(struct bufqueue *bq, struct buf *bp)
>>> {
>>>
>>> CTR3(KTR_BUF, "bq_remove(%p) vp %p flags %X",
>>> bp, bp->b_vp, bp->b_flags);
>>> KASSERT(bp->b_qindex != QUEUE_NONE,
>>> ("bq_remove: buffer %p not on a queue.", bp));
>>> . . .
>>>
>>> For reference:
>>>
>>> static int
>>> buf_import(void *arg, void **store, int cnt, int domain, int flags)
>>> {
>>> struct buf *bp;
>>> int i;
>>>
>>> BQ_LOCK(&bqempty);
>>> for (i = 0; i < cnt; i++) {
>>> bp = TAILQ_FIRST(&bqempty.bq_queue);
>>> if (bp == NULL)
>>> break;
>>> bq_remove(&bqempty, bp);
>>> store[i] = bp;
>>> }
>>> BQ_UNLOCK(&bqempty);
>>>
>>> return (i);
>>> }
>>>
>>>
>>
>> I tried building the debug kernel with KTR for KTR_BUF.
>> Installing and booting the result did not panic. Manually
>> forcing getting to ddb> soon enough and doing "show ktr"
>> did show a bq_remove for 0xd2a00000 (and later activity).
>>
>> From the looks of the KTR_BUF CTRn's, this suggests to me
>> that the access to bp->qindex in bq_remove is racy in
>> some way vs. updates to the value.
>
> The code produced by clang for the debug kernel, KTR
> off in this case, for:
>
> KASSERT(bp->b_qindex != QUEUE_NONE,
> ("bq_remove: buffer %p not on a queue.", bp));
>
> is wrong [the 84(r29) accesses bp->b_qindex]:
>
> . . .
> 00618aa8 <bq_remove+0x34> lbz r5,84(r29)
> 00618aac <bq_remove+0x38> cmplwi r5,4
> 00618ab0 <bq_remove+0x3c> bgt- 00618c10 <bq_remove+0x19c>
> . . .
> 00618c10 <bq_remove+0x19c> lwz r3,-32364(r30)
> 00618c14 <bq_remove+0x1a0> crclr 4*cr1+eq
> 00618c18 <bq_remove+0x1a4> mr r4,r29
> 00618c1c <bq_remove+0x1a8> bl 00541ca0 <panic>
> . . .
>
> Comparing against 4 does not match any part of
> bq_remove. Comparison via gt would make sense for:
Wrong.
The 4 and gt use comes from inlining bufqueue(bp)
in the following KASSERT. For reference (from the .i):
bufqueue(struct buf *bp)
{
switch (bp->b_qindex) {
case 0:
case 4:
return (((void *)0));
case 1:
return (&bqempty);
case 2:
return (&bufdomain(bp)->bd_dirtyq);
case 3:
return (&bufdomain(bp)->bd_subq[bp->b_subqueue]);
default:
break;
}
panic("bufqueue(%p): Unhandled type %d\n", bp, bp->b_qindex);
}
The code generation put the first KASSERT's related panic in
to case 0 above when bufqueue was inlined.
> /usr/src/sys/sys/buf.h: uint8_t b_qindex; /* (Q) buffer queue index */)
>
> if the comparison was against zero. It should
> have been:
>
> /usr/src/sys/kern/vfs_bio.c:#define QUEUE_NONE 0 /* on no queue */
>
>
> This is for a head -r347549 32-bit powerpc FreeBSD context,
> built with system clang (an amd6->powerpc cross build using
> devel/powerpc64-binutils ).
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
More information about the freebsd-toolchain
mailing list