Re: Corrupted bp->b_lblkno on bread() // Life-cycle of a buf obj?
Date: Thu, 19 Jun 2025 19:20:45 UTC
On Thu, Jun 19, 2025 at 03:03:53PM -0400, Sanchit Sahay wrote:
> > There is something strange in the sentence. First you claim that
> > b_blkno == b_lblkno, then you claim that b_lblkno changes from 0 to some
> > random value.
>
> Apologies for the confusing phrasing. What I meant is that before
> VOP_STRATEGY is called, b_blkno and b_lblkno are the same (both are 0 in
> this particular case), which implies there needs to be a bmap call.
>
> > And this smells like a KBI (Kernel Binary Interface) issue, since DEBUG_LOCKS
> > changes the layout of the struct lock, which is embedded into struct buf
> > with which you have problems.
>
> > How do you build your fs code? As a module? If yes, you must use the same
> > set of opt_*.h headers as used for the kernel build.
>
> I think this might be it, I am building it as a kmod and hadn't taken the
> changed struct into account. Will try including these headers. Was starting
> to see similar behaviour creep up in a different code path as well. Thanks
> for the help!
How do you intend to include them?
The right way, if you build your module out of tree, is to do
something like the following:
make -C <module src dir> SYSDIR=<kernel sources path> KERNBUILDDIR=<config output path>
i.e. KERNBUILDDIR should point to the directory where config(8) put
the generated files, the most important of which are the opt_*.h headers.
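
To illustrate why mismatched opt_*.h headers produce exactly this kind of
symptom, here is a minimal userland sketch. Everything in it is a toy
stand-in: the toy_* structs, the 18-entry lk_stack array, and the helper
names are assumptions made up for the example, not the real FreeBSD
definitions. It only shows the general mechanism: one side compiles the
embedded lock with an extra debug-only member, the other side does not, and
a field placed after the lock is then read from the wrong offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins only; these are NOT the real FreeBSD structures. */
struct toy_lock_plain {		/* lock as compiled without the debug option */
	uintptr_t	 lk_lock;
	int		 lk_pri;
	int		 lk_timo;
};

struct toy_lock_debug {		/* same lock with a debug-only member added */
	uintptr_t	 lk_lock;
	int		 lk_pri;
	int		 lk_timo;
	void		*lk_stack[18];	/* hypothetical size */
};

/* Layout the "module" believes in (built without the option). */
struct toy_buf_module {
	int64_t			b_blkno;
	struct toy_lock_plain	b_lock;
	int64_t			b_lblkno;
};

/* Layout the "kernel" actually uses (built with the option). */
struct toy_buf_kernel {
	int64_t			b_blkno;
	struct toy_lock_debug	b_lock;
	int64_t			b_lblkno;
};

/* "Kernel" side: zero the buffer, then put non-zero debug data in the lock. */
void
kernel_init_buf(struct toy_buf_kernel *kb)
{
	assert(kb != NULL);
	memset(kb, 0, sizeof(*kb));
	memset(kb->b_lock.lk_stack, 0xA5, sizeof(kb->b_lock.lk_stack));
}

/* "Module" side: read b_lblkno using the offset from its own layout. */
int64_t
module_read_lblkno(const void *bp)
{
	int64_t v;

	memcpy(&v, (const char *)bp +
	    offsetof(struct toy_buf_module, b_lblkno), sizeof(v));
	return (v);
}
```

In this toy layout a field placed before the embedded lock keeps the same
offset on both sides, while a field placed after it does not, so the
"module" reads lock debug data where it expects b_lblkno even though the
"kernel" stored 0 there. Building the module against the kernel's own
opt_*.h headers makes both sides agree on the layout again.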
>
> On Thu, 19 Jun 2025 at 14:42, Konstantin Belousov <kostikbel@gmail.com>
> wrote:
>
> > On Tue, Jun 17, 2025 at 11:07:49PM -0400, Sanchit Sahay wrote:
> > > I'm working on porting a filesystem to FreeBSD, and am running into an
> > > issue that I'm having difficulty debugging. Any help would be appreciated.
> > >
> > > When calling bread() with a blkno=lblkno, by the time the flow of
> > > control reaches the vop_strategy function, the value of lblkno changes from
> > > 0 to a seemingly random value.
> > There is something strange in the sentence. First you claim that
> > b_blkno == b_lblkno, then you claim that b_lblkno changes from 0 to some
> > random value.
> >
> > So, is it 0 or b_blkno?
> >
> > >
> > > Having inspected this with gdb,
> > >
> > > On frame 9:
> > >
> > > #9 0xffff0000c3e72930 in hfs_strategy ()
> > > 1488 kdb_enter("lblk random", "lblk random");
> > >
> > > (kgdb) p ap->a_bp->b_lblkno
> > > $10 = -281474971149872
> > >
> > > On frame 10:
> > >
> > > #10 0xffff0000009387b0 in VOP_STRATEGY_APV () at vnode_if.c:2423
> > > 2423 rc = vop->vop_strategy(a);
> > >
> > > (kgdb) p a->a_bp->b_lblkno
> > > $11 = 0
> > And the same pattern occurs there.
> >
> > >
> > > This flow is triggered when calling bread() like so:
> > >
> > > retval = bread(vp, blockNum, block->blockSize, NOCRED, &bp);
> > >
> > > The stack trace is:
> > >
> > > #9 0xffff0000c3e72930 in hfs_strategy (ap=0xffff00009bbd1058)
> > > #10 0xffff0000009387b0 in VOP_STRATEGY_APV (
> > > #11 0xffff00000054bbcc in VOP_STRATEGY (vp=0xffff000000a08fc5,
> > > #12 bufstrategy (bo=<optimized out>, bp=0xffff0000404990c8)
> > > #13 0xffff00000054d6f0 in bstrategy (bp=0xffff0000404990c8)
> > > #14 breadn_flags
> > >
> > > There seems to be no code run between these two stacks, the a_bp in both
> > > these frames points to the same memory address. No other fields are
> > > modified between these two frames.
> > >
> > > Because of this seemingly random lblkno value, VOP_BMAP is not triggered,
> > > and the read returns arbitrary results.
> > >
> > > This issue only occurs when I have the kernel compiled with these
> > > additional flags (as suggested by the handbook for debugging deadlocks):
> > >
> > > options INVARIANTS
> > > options INVARIANT_SUPPORT
> > > options WITNESS
> > > options WITNESS_SKIPSPIN
> > > options DEBUG_LOCKS
> > > options DEBUG_VFS_LOCKS
> > > options DIAGNOSTIC
> > >
> > > Without these flags enabled, this lblkno corruption does not take place,
> > > and the bread returns a valid read. I don't see any conditional code that
> > > these flags enable which would cause such an issue.
> > And this smells like a KBI (Kernel Binary Interface) issue, since DEBUG_LOCKS
> > changes the layout of the struct lock, which is embedded into struct buf
> > with which you have problems.
> >
> > How do you build your fs code? As a module? If yes, you must use the same
> > set of opt_*.h headers as used for the kernel build.
> >
> > >
> > > Any tips on how to investigate this further would be greatly appreciated,
> > > or if I am missing something about the lifecycle of the buffer object that
> > > might cause it to "reset" certain fields.
> > >
> > > Thanks
> > > Sanchit Sahay
> >