Re: Corrupted bp->b_lblkno on bread() // Life-cycle of a buf obj?

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Thu, 19 Jun 2025 18:42:01 UTC
On Tue, Jun 17, 2025 at 11:07:49PM -0400, Sanchit Sahay wrote:
> I'm working on porting a filesystem to FreeBSD, and am running into an
> issue that I'm having difficulty debugging. Any help would be appreciated.
> 
> When calling bread() with a blkno=lblkno, by the time the flow of
> control reaches the vop_strategy function, the value of lblkno changes from
> 0 to a seemingly random value.
There is something strange in this sentence.  First you claim that
b_blkno == b_lblkno, then you claim that b_lblkno changes from 0 to some
random value.

So, is it 0 or b_blkno?

> 
> Having inspected this with gdb,
> 
> On frame 9:
> 
> #9  0xffff0000c3e72930 in hfs_strategy ()
> 1488            kdb_enter("lblk random", "lblk random");
> 
> (kgdb) p ap->a_bp->b_lblkno
> $10 = -281474971149872
> 
> On frame 10:
> 
> #10 0xffff0000009387b0 in VOP_STRATEGY_APV () at vnode_if.c:2423
> 2423                    rc = vop->vop_strategy(a);
> 
> (kgdb) p a->a_bp->b_lblkno
> $11 = 0
And the same pattern occurs there.

> 
> This flow is triggered when calling bread() like so:
> 
> retval = bread(vp, blockNum, block->blockSize, NOCRED, &bp);
> 
> The stack trace is:
> 
> #9  0xffff0000c3e72930 in hfs_strategy (ap=0xffff00009bbd1058)
> #10 0xffff0000009387b0 in VOP_STRATEGY_APV (
> #11 0xffff00000054bbcc in VOP_STRATEGY (vp=0xffff000000a08fc5,
> #12 bufstrategy (bo=<optimized out>, bp=0xffff0000404990c8)
> #13 0xffff00000054d6f0 in bstrategy (bp=0xffff0000404990c8)
> #14 breadn_flags
> 
> There seems to be no code run between these two stacks, the a_bp in both
> these frames points to the same memory address. No other fields are
> modified between these two frames.
> 
> Because of this seemingly random lblkno value, VOP_BMAP is not triggered,
> and the read returns arbitrary results.
> 
> This issue only occurs when I have the kernel compiled with these
> additional flags (as suggested by the handbook for debugging deadlocks):
> 
> options INVARIANTS
> options INVARIANT_SUPPORT
> options WITNESS
> options WITNESS_SKIPSPIN
> options DEBUG_LOCKS
> options DEBUG_VFS_LOCKS
> options DIAGNOSTIC
> 
> Without these flags enabled, this lblkno corruption does not take place,
> and the bread returns a valid read. I don't see any conditional code that
> these flags enable which would cause such an issue.
And this smells like a KBI (Kernel Binary Interface) issue, since DEBUG_LOCKS
changes the layout of struct lock, which is embedded in struct buf, the
structure you are having problems with.

How do you build your fs code? As a module?  If yes, you must use the same
set of opt_*.h headers as used for the kernel build.
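
To illustrate the kind of mismatch I mean, here is a minimal userland
sketch.  The demo_* structures and functions are hypothetical stand-ins,
not the real struct lock or struct buf; they only assume that the debug
option adds one member to the embedded lock.  Code compiled against the
plain layout then reads b_lblkno from the wrong offset and sees whatever
the extra member holds, while code compiled with the debug layout still
sees 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real struct lock / struct buf. */
struct demo_lock_plain {
	long lk_state;
};

struct demo_lock_debug {
	long lk_state;
	long lk_debug;		/* extra member, only in the debug layout */
};

struct demo_buf_plain {		/* layout the module was compiled against */
	struct demo_lock_plain b_lock;
	long b_lblkno;
};

struct demo_buf_debug {		/* layout the kernel actually uses */
	struct demo_lock_debug b_lock;
	long b_lblkno;
};

/* How many bytes b_lblkno moved between the two layouts. */
static long
demo_lblkno_skew(void)
{
	return ((long)offsetof(struct demo_buf_debug, b_lblkno) -
	    (long)offsetof(struct demo_buf_plain, b_lblkno));
}

/* Read b_lblkno from a debug-layout buffer using the plain-layout offset. */
static long
demo_stale_read(const struct demo_buf_debug *kbuf)
{
	long v;

	memcpy(&v, (const char *)kbuf +
	    offsetof(struct demo_buf_plain, b_lblkno), sizeof(v));
	return (v);
}

/* Build a debug-layout buffer with b_lblkno == 0, return the stale read. */
static long
demo_run(void)
{
	struct demo_buf_debug kbuf;

	memset(&kbuf, 0, sizeof(kbuf));
	kbuf.b_lock.lk_debug = -12345;	/* arbitrary marker value */
	assert(kbuf.b_lblkno == 0);	/* the real field is still 0 */
	return (demo_stale_read(&kbuf));
}
```

Calling demo_run() returns the marker stored in lk_debug rather than 0,
which is the same shape as one frame printing 0 and the next printing
garbage for the same bp.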

> 
> Any tips on how to investigate this further would be greatly appreciated,
> or if I am missing something about the lifecycle of the buffer object that
> might cause it to "reset" certain fields.
> 
> Thanks
> Sanchit Sahay