dump trying to access incorrect block numbers?
Mark Millard
markmi at dsl-only.net
Sat Jul 8 16:45:11 UTC 2017
[I add notes about a problem that happens after the
"fsck -B". I also forgot to mention: production-style
kernel and world builds were in use. And I tried a
powerpc64 build and it works the same.]
On 2017-Jul-7, at 11:09 PM, Mark Millard <markmi at dsl-only.net> wrote:
> [This note has more information than one sent with extra text
> in the subject but with a partially different "to" list.]
>
> Peter Jeremy peter at rulingia.com wrote on
> Sat Jul 8 02:00:47 UTC 2017 :
>
>> When did you first notice this (what SVN revision)?
>> Do you know what the last good SVN revision was?
>> Is this a new or old filesystem?
>> Is the filesystem mounted/active or not when you dump it?
>> What are the relevant parameters for the filesystem on ada0s3a?
>> Are you running softupdates, journalling etc?
>> Which dump(8) phase is reporting the errors?
>> What are the exact dump and fsck commands you ran?
>
> I can add a little information with some contrast
> and only "fsck -B" in use (with an unclean file
> system from a prior crash), no dump use. Still:
> a snapshot is involved in the below.
>
> Unfortunately, two other problems with major
> consequences in my context limit the svn range
> that I can cover for this activity. The problem
> version ranges are:
>
> -r319722 through -r320651 (fixed by -r320652)
> (actually this is why I had used "boot -s"
> in what I report later: I could get to a
> shell prompt that way instead of crashing
> before any login prompt; the crashes left
> the file system in need of repair)
>
> -r320509 through -r320561 (fixed by -r320570)
>
> So I was using -r320570 to avoid one of the
> two problems.
>
>
>
> Context: 32-bit powerpc FreeBSD used on PowerMac G5
> so-called "Quad-core". (So big-endian as well.)
> Softupdates, no journalling. Long-in-use file
> system that has had lots of FreeBSD version updates
> and port rebuilds over time.
>
> The following is from now, not from the time of the
> example messages:
>
> # dumpfs / | more
> magic 19540119 (UFS2) time Fri Jul 7 22:53:34 2017
> superblock location 65536 id [ <OMITTED> ]
> ncg 158 size 25165823 blocks 24372006
> bsize 32768 shift 15 mask 0xffff8000
> fsize 4096 shift 12 mask 0xfffff000
> frag 8 shift 3 fsbtodb 3
> minfree 8% optim time symlinklen 120
> maxbsize 32768 maxbpg 4096 maxcontig 4 contigsumsize 4
> nbfree 2130375 ndir 65518 nifree 11769796 nffree 425065
> bpg 20032 fpg 160256 ipg 80128 unrefs 0
> nindir 4096 inopb 128 maxfilesize 2252349704110079
> sbsize 4096 cgsize 32768 csaddr 5048 cssize 4096
> sblkno 24 cblkno 32 iblkno 40 dblkno 5048
> cgrotor 127 fmod 0 ronly 0 clean 0
> metaspace 6408 avgfpdir 64 avgfilesize 16384
> flags soft-updates trim
> fsmnt /
> volname FBSDG4Srootfs swuid 0 providersize 25165823
> . . .
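
The geometry fields in the dumpfs output above are tied together by simple arithmetic, so a quick consistency check can be sketched as below. This is only an illustration: the 512-byte device block size, the 8-byte UFS2 block pointer and the 256-byte UFS2 on-disk inode are assumptions that do not appear in the output itself, and the variable names are made up for the example.

```python
# Values copied from the dumpfs output above.
bsize, fsize, frag = 32768, 4096, 8
fsbtodb, nindir, inopb = 3, 4096, 128
bpg, fpg, ncg, size = 20032, 160256, 158, 25165823

# Assumptions (not shown in the output): 512-byte device blocks,
# 8-byte UFS2 block pointers, 256-byte UFS2 on-disk inodes.
DEV_BSIZE, UFS2_PTR, UFS2_INODE = 512, 8, 256

checks = {
    "frag == bsize / fsize": frag == bsize // fsize,
    "2**fsbtodb == fsize / DEV_BSIZE": (1 << fsbtodb) == fsize // DEV_BSIZE,
    "nindir == bsize / pointer size": nindir == bsize // UFS2_PTR,
    "inopb == bsize / inode size": inopb == bsize // UFS2_INODE,
    "fpg == bpg * frag": fpg == bpg * frag,
    # -(-a // b) is integer ceiling division.
    "ncg == ceil(size / fpg)": ncg == -(-size // fpg),
}

for name, ok in checks.items():
    print(f"{'ok ' if ok else 'BAD'} {name}")
```

Under those assumptions every relation holds for the values shown, which suggests the superblock geometry itself is self-consistent.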
>
>
>
> What I had done that produced the messages was:
>
> <Prior failed multi-user boot from system problem
> leaves root (only) file system not marked clean
> so fsck -B will actually do something below>
>
> boot -s (so: single user mode)
> # The next 3 lines are the content of a generic, manually-run script.
> mount -u /
> mount -a -t ufs (but there is no other file system)
> swapon -a (there is a swap partition)
> #
> fsck -B
>
> That "fsck -B" caused the same kinds of lines
> reported by Michael Butler, happening as fsck
> makes a snapshot for the background processing
> to use. (I have camera pictures and could type
> in some of the lines if needed.)
>
> After those lines was text like (typed in from
> an example camera picture):
>
> ** //.snap/fsck_snapshot
> ** Last Mount on /
> ** Root file system
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> Reclaimed: 0 directories, 1 files, 22680 fragments
> 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8% fragmentation)
>
> ***** FILE SYSTEM MARKED CLEAN *****
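
In an fsck_ffs summary line the "free" figure is the loose free fragments plus the free full blocks times the fragments per block (frag 8 in the dumpfs output). The sketch below applies that relation to the line as typed; the function name is hypothetical. As typed, the figures do not add up, while a block count with its first two digits swapped would. That hints at a slip while typing from the picture, but it is only a guess and the original picture would be needed to confirm it.

```python
def implied_free(frags, blocks, frag_per_block=8):
    """Free-fragment total implied by the '(N frags, M blocks)' breakdown."""
    return blocks * frag_per_block + frags

reported_free = 19552199

# Figures exactly as typed in from the camera picture.
as_typed = implied_free(frags=443479, blocks=3288590)
print(as_typed, as_typed == reported_free)

# Hypothetical reading: first two digits of the block count swapped.
swapped = implied_free(frags=443479, blocks=2388590)
print(swapped, swapped == reported_free)
```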
[I forgot to mention that the context was a
production-style kernel and world build,
no invariants or other such.]
Since I'm running a patched -r320570 for the
issue:
-r319722 through -r320651 (fixed by -r320652)
I went back and forced a power-off without
shutdown and did the sequence:
boot -s (so: single user mode)
# The next 3 lines are the content of a generic, manually-run script.
mount -u /
mount -a -t ufs (but there is no other file system)
swapon -a (there is a swap partition)
#
fsck -B
but always waited briefly after the fsck -B finished.
As before, the following happens as it tries to trim
(typed in from a camera picture):
panic: ffs_blkfree_cq: freeing free block
cpuid = 2 (varies, of course)
time = (varies)
KDB: stack backtrace
(stack addresses can vary: just an example here)
0xd23b17e0: at kdb_backtrace+0x5c
0xd23b1850: at vpanic+0x1e8
0xd23b18c0: at panic+0x54
0xd23b1910: at ffs_blkfree_cq+0x278
0xd23b1980: at ffs_blkfree_trim_task+0x60
0xd23b19b0: at taskqueue_run_locked+0x10
0xd23b1a10: at taskqueue_thread_loop+0x174
0xd23b1a50: at fork_exit+0xf4
0xd23b1a80: at fork_trampoline+0xc
KDB: enter: panic
[ thread pid 0 tid 1000082 ]
Stopped at kdb_enter+0x70: addi r0,r0,0x0
I've tried this on a powerpc64 and it works
the same, complete with the "freeing free
block" issue.
===
Mark Millard
markmi at dsl-only.net