Corrupted bp->b_lblkno on bread() // Life-cycle of a buf obj?
Date: Wed, 18 Jun 2025 03:07:49 UTC
I'm working on porting a filesystem to FreeBSD, and am running into an
issue that I'm having difficulty debugging. Any help would be appreciated.
When calling bread() with a blkno=lblkno, by the time control reaches the
vop_strategy function, the value of lblkno has changed from 0 to a
seemingly random value.
Inspecting this in the debugger (kgdb) shows the following.
On frame 9:
#9 0xffff0000c3e72930 in hfs_strategy ()
1488 kdb_enter("lblk random", "lblk random");
(kgdb) p ap->a_bp->b_lblkno
$10 = -281474971149872
On frame 10:
#10 0xffff0000009387b0 in VOP_STRATEGY_APV () at vnode_if.c:2423
2423 rc = vop->vop_strategy(a);
(kgdb) p a->a_bp->b_lblkno
$11 = 0
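As a side note on the frame 9 value: reinterpreted as an unsigned 64-bit word, it looks less random than the decimal form suggests. The helper below is a minimal userland sketch, not kernel code, and lblkno_as_hex is a hypothetical name introduced only for this illustration. It formats the same bit pattern in hex.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a signed 64-bit value (such as the b_lblkno printed by kgdb)
 * as the hexadecimal bit pattern of the equivalent unsigned 64-bit word.
 * Returns whatever snprintf() returns, i.e. the length of the full string.
 * Hypothetical helper for illustration; not part of any kernel API.
 */
static int
lblkno_as_hex(int64_t lblkno, char *out, size_t outlen)
{
	return (snprintf(out, outlen, "0x%016" PRIx64, (uint64_t)lblkno));
}
```

Called with -281474971149872, this produces 0xffff00000054d9d0. That is in the same range as the kernel text addresses in the backtrace (for example 0xffff00000054d6f0 in bstrategy), so the field may be holding a code address rather than arbitrary garbage.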
This flow is triggered when calling bread() like so:
retval = bread(vp, blockNum, block->blockSize, NOCRED, &bp);
The stack trace is:
#9 0xffff0000c3e72930 in hfs_strategy (ap=0xffff00009bbd1058)
#10 0xffff0000009387b0 in VOP_STRATEGY_APV (
#11 0xffff00000054bbcc in VOP_STRATEGY (vp=0xffff000000a08fc5,
#12 bufstrategy (bo=<optimized out>, bp=0xffff0000404990c8)
#13 0xffff00000054d6f0 in bstrategy (bp=0xffff0000404990c8)
#14 breadn_flags
There seems to be no code run between these two frames, and the a_bp in
both frames points to the same memory address. No other fields are
modified between the two frames.
Because of this seemingly random lblkno value, VOP_BMAP is not triggered,
and the read returns arbitrary results.
This issue only occurs when I have the kernel compiled with these
additional flags (as suggested by the handbook for debugging deadlocks):
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC
Without these flags enabled, this lblkno corruption does not take place,
and bread() returns valid data. I don't see any conditional code that
these flags enable which would cause such an issue.
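One scenario that would match these symptoms is that the two frames do not agree on the layout of the buffer structure, for example if the module and the kernel were built with different options. This is an unverified hypothesis. The sketch below is userland C with hypothetical demo_* structures that are not the real FreeBSD definitions. It only shows how a member that exists under a debug option shifts the offset of every later field, so the same pointer yields a different "b_lblkno" depending on which layout the reader was compiled with.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lock as seen by code built WITHOUT the debug option. */
struct demo_lock_plain {
	uint64_t	lk_word;
};

/* The same hypothetical lock built WITH the debug option. */
struct demo_lock_debug {
	uint64_t	lk_word;
	uint64_t	lk_saved_pcs[4];	/* extra debug-only member */
};

/* Buffer header embedding the lock, in each of the two layouts. */
struct demo_buf_plain {
	struct demo_lock_plain	b_lock;
	int64_t			b_lblkno;
};

struct demo_buf_debug {
	struct demo_lock_debug	b_lock;
	int64_t			b_lblkno;
};

/*
 * Read an int64_t at a given byte offset inside a buffer header.
 * memcpy() is used so the read is well defined regardless of which
 * structure type the memory was originally written through.
 */
static int64_t
demo_read_lblkno_at(const void *bp, size_t offset)
{
	int64_t v;

	memcpy(&v, (const char *)bp + offset, sizeof(v));
	return (v);
}
```

If a demo_buf_debug is filled in with b_lblkno = 0 and a code address in lk_saved_pcs[0], reading at offsetof(struct demo_buf_debug, b_lblkno) returns 0, while reading at offsetof(struct demo_buf_plain, b_lblkno) returns the saved address. Comparing sizeof and offsetof for the real structure from inside the module and from the kernel would be one way to test whether something similar is happening here.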
Any tips on how to investigate this further would be greatly appreciated,
as would pointers to anything I am missing about the lifecycle of the
buffer object that might cause it to "reset" certain fields.
Thanks
Sanchit Sahay