[Bug 218337] panic: Journal overflow with g_journal_switcher waiting on wswbuf0
bugzilla-noreply at freebsd.org
Thu Apr 6 16:33:51 UTC 2017
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218337
longwitz at incore.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |longwitz at incore.de
--- Comment #2 from longwitz at incore.de ---
Thanks for the extensive example; the information from your kernel dump
looks nearly identical to mine.

I don't think the u_int/u_long variable problem in g_journal.c is the
reason for this panic; that issue is already tracked in PR kern/198500.
The panic has to do with the extra physical buffers (pbufs) described in
the pbuf(9) man page. I see the following users of pbufs (I don't have
nfs or fuse):
nbuf = 105931
nswbuf = min(nbuf / 4, 256) = 256

User of pbufs            boottime   kerneldump
md_vnode_pbuf_freecnt          25           25
smbfs_pbuf_freecnt            129          129
ncl_pbuf_freecnt              129          129
cluster_pbuf_freecnt          128            0
vnode_pbuf_freecnt            129          129
nsw_rcount                    128          128
nsw_wcount_sync                64           64
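(The counters above are plain kernel globals, so both columns can be read
directly with kgdb, for example:)

(kgdb) p nbuf
(kgdb) p nswbuf
(kgdb) p cluster_pbuf_freecnt
(kgdb) p vnode_pbuf_freecnt
(kgdb) p nsw_rcount
(kgdb) p nsw_wcount_sync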
The g_journal_switcher (pid 7) is waiting on channel "wswbuf0" because all
the cluster pbufs are in use. Looking at the 256 swbufs, I found that 128
are free and 128 are in use by the g_journal_switcher itself. All the
in-use buffers have the same b_bufobj and the same b_iodone address,
"cluster_callback". One example:
(kgdb) p swbuf[126]
$352 = {b_bufobj = 0xfffff80b7e1acbe0, b_bcount = 131072, b_caller1 = 0x0,
  b_data = 0xfffffe0c1c2ec000 "", b_error = 0, b_iocmd = 2 '\002',
  b_ioflags = 0 '\0', b_iooffset = 472973312, b_resid = 0,
  b_iodone = 0xffffffff8073fcb0 <cluster_callback>, b_blkno = 923776,
  b_offset = 2841903104, b_bobufs = {tqe_next = 0x0, tqe_prev = 0x0},
  b_vflags = 0,
  b_freelist = {tqe_next = 0xfffffe0bafe182b8, tqe_prev = 0xffffffff80eaa460},
  b_qindex = 0, b_flags = 1677721636, b_xflags = 0 '\0',
  b_lock = {lock_object = {lo_name = 0xffffffff80a829a7 "bufwait",
      lo_flags = 108199936, lo_data = 0, lo_witness = 0x0},
    lk_lock = 18446744073709551600, lk_exslpfail = 0, lk_timo = 0,
    lk_pri = 96},
  b_bufsize = 131072, b_runningbufspace = 131072,
  b_kvabase = 0xfffffe0c1c2ec000 "", b_kvaalloc = 0x0, b_kvasize = 131072,
  b_lblkno = 86728, b_vp = 0xfffff80b7e1acb10, b_dirtyoff = 0,
  b_dirtyend = 131072, b_rcred = 0x0, b_wcred = 0x0,
  b_saveaddr = 0xfffffe0c1c2ec000, b_pager = {pg_reqpage = 0},
  b_cluster = {cluster_head = {tqh_first = 0xfffffe0bb1bec410,
      tqh_last = 0xfffffe0bb1bebe20},
    cluster_entry = {tqe_next = 0xfffffe0bb1bec410,
      tqe_prev = 0xfffffe0bb1bebe20}},
  b_pages = {0xfffff80c097f97c0, 0xfffff80c097f9828, 0xfffff80c097f9890,
    0xfffff80c097f98f8, 0xfffff80c097f9960, 0xfffff80c097f99c8,
    0xfffff80c097f9a30, 0xfffff80c097f9a98, 0xfffff80c097f9b00,
    0xfffff80c097f9b68, 0xfffff80c097f9bd0, 0xfffff80c097f9c38,
    0xfffff80c097f9ca0, 0xfffff80c097f9d08, 0xfffff80c097f9d70,
    0xfffff80c097f9dd8, 0xfffff80c097f9e40, 0xfffff80c097f9ea8,
    0xfffff80c097f9f10, 0xfffff80c097f9f78, 0xfffff80c097f9fe0,
    0xfffff80c097fa048, 0xfffff80c097fa0b0, 0xfffff80c097fa118,
    0xfffff80c11547980, 0xfffff80c115479e8, 0xfffff80c11547a50,
    0xfffff80c11547ab8, 0xfffff80c11547b20, 0xfffff80c11547b88,
    0xfffff80c11547bf0, 0xfffff80c11547c58},
  b_npages = 32, b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0,
  b_fsprivate2 = 0x0, b_fsprivate3 = 0x0, b_pin_count = 0}
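(If somebody wants to reproduce the count: a rough kgdb loop like the
following walks the swbuf[] array and counts the buffers whose b_iodone
is the cluster_callback address 0xffffffff8073fcb0 from the dump above;
the cast is only there to keep the comparison simple and may need
adjusting:)

set $i = 0
set $used = 0
while ($i < nswbuf)
  if ((unsigned long)swbuf[$i].b_iodone == 0xffffffff8073fcb0)
    set $used = $used + 1
  end
  set $i = $i + 1
end
print $used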
Therefore the g_journal_switcher has all of its cluster pbufs in use and
waits forever for another one, so the gjournal worker thread must
eventually panic with "Journal overflow".

In cluster_wbuild() I can't see a check for "cluster_pbuf_freecnt > 0"
that would avoid the hang on "wswbuf0". I wonder why this seems to be a
problem only with gjournal; other components in the kernel also use
VFS_SYNC.
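As far as I can see in vm/vm_pager.c, getpbuf() msleep()s on "wswbuf0"
while the caller's free count is zero, and trypbuf() is the non-blocking
variant that returns NULL instead. What I have in mind is roughly the
following; this is only an untested sketch with a made-up helper name,
not a patch:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <vm/vm.h>
#include <vm/vm_pager.h>

/*
 * Untested sketch: allocate a cluster pbuf without ever sleeping on
 * "wswbuf0", so a thread that already owns all cluster pbufs (like
 * the g_journal_switcher in this dump) cannot hang here.
 */
static struct buf *
cluster_getpbuf_nowait(void)
{
        struct buf *tbp;

        /*
         * trypbuf() returns NULL when cluster_pbuf_freecnt == 0,
         * where getpbuf() would msleep() on "wswbuf0" instead.
         */
        tbp = trypbuf(&cluster_pbuf_freecnt);

        /*
         * NULL means no cluster pbuf is free; the caller would then
         * have to issue the writes unclustered instead of blocking.
         */
        return (tbp);
}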
I would like to know if this problem can be fixed.
--
You are receiving this mail because:
You are the assignee for the bug.