[Bug 253992] ufsdirhash_build() sleeps with directory vnode locked due to M_WAITOK
bugzilla-noreply at freebsd.org
bugzilla-noreply at freebsd.org
Wed Mar 3 18:18:55 UTC 2021
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253992
Bug ID: 253992
Summary: ufsdirhash_build() sleeps with directory vnode locked
due to M_WAITOK
Product: Base System
Version: 12.1-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: dgmorris at earthlink.net
On a system where a timeout panic has been added for attempts to acquire a UFS
root vnode lock, such a panic was observed. The holder of the lock was
attempting to do:
ched_switch() at sched_switch+0xbcc/frame 0xfffffe885e4e5e10
mi_switch() at mi_switch+0x29b/frame 0xfffffe885e4e5e60
sleepq_wait() at sleepq_wait+0x2c/frame 0xfffffe885e4e5e90
_sleep() at _sleep+0x265/frame 0xfffffe885e4e5f10
vm_wait_doms() at vm_wait_doms+0xfd/frame 0xfffffe885e4e5f40
vm_wait_domain() at vm_wait_domain+0x50/frame 0xfffffe885e4e5f70
vm_domain_alloc_fail() at vm_domain_alloc_fail+0x91/frame 0xfffffe885e4e5fb0
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x243/frame
0xfffffe885e4e6030
uma_small_alloc() at uma_small_alloc+0x9b/frame 0xfffffe885e4e6080
keg_alloc_slab() at keg_alloc_slab+0x103/frame 0xfffffe885e4e60d0
zone_import() at zone_import+0x106/frame 0xfffffe885e4e6170
zone_alloc_item() at zone_alloc_item+0x6e/frame 0xfffffe885e4e61b0
cache_alloc_retry() at cache_alloc_retry+0x13c/frame 0xfffffe885e4e6210
uma_zalloc_arg() at uma_zalloc_arg+0x114/frame 0xfffffe885e4e6270
ufsdirhash_build() at ufsdirhash_build+0x850/frame 0xfffffe885e4e6310
ufs_lookup_ino() at ufs_lookup_ino+0x1a2/frame 0xfffffe885e4e6420
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x8a/frame 0xfffffe885e4e6450
vfs_cache_lookup() at vfs_cache_lookup+0xd6/frame 0xfffffe885e4e64b0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x8a/frame 0xfffffe885e4e64e0
lookup() at lookup+0x756/frame 0xfffffe885e4e6590
namei() at namei+0x46c/frame 0xfffffe885e4e6660
_vn_open_cred() at _vn_open_cred+0x1e1/frame 0xfffffe885e4e67f0
_vn_open() at _vn_open+0x1f/frame 0xfffffe885e4e6820
isi_kern_openat() at isi_kern_openat+0x3b4/frame 0xfffffe885e4e6a10
sys_openat() at sys_openat+0x5a/frame 0xfffffe885e4e6a70
amd64_syscall() at amd64_syscall+0x381/frame 0xfffffe885e4e6bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe885e4e6bf0
— syscall (499, FreeBSD ELF64, sys_openat), rip = 0x8005b0d3a, rsp =
0x7fffffffe368, rbp = 0x7fffffffe3e0 —
The system is ccNUMA, with 2 domains. One domain is OOM, the other is not:
db> show pageq
pq_free 1887153
dom 0 page_cnt 8102741 free 2 pq_act 57 pq_inact 1 pq_laund 0 pq_unsw 311370
dom 1 page_cnt 8133281 free 1887151 pq_act 17391 pq_inact 4868531
pq_laund 0 pq_unsw 186545
The DIRHASH uma zone is created with flags argument of 0, so acquires the
default NUMA policy (FIRSTTOUCH). The problem allocation is on line 437:
dh->dh_hash = malloc(narrays * sizeof(dh->dh_hash[0]),
M_DIRHASH, M_NOWAIT | M_ZERO);
if (dh->dh_hash == NULL)
goto fail;
dh->dh_blkfree = malloc(nblocks * sizeof(dh->dh_blkfree[0]),
M_DIRHASH, M_NOWAIT);
if (dh->dh_blkfree == NULL)
goto fail;
for (i = 0; i < narrays; i++) {
if ((dh->dh_hash[i] = DIRHASH_BLKALLOC_WAITOK()) == NULL)
goto fail;
for (j = 0; j < DH_NBLKOFF; j++)
dh->dh_hash[i][j] = DIRHASH_EMPTY;
}
Where:
#define DIRHASH_BLKALLOC_WAITOK() uma_zalloc(ufsdirhash_zone, M_WAITOK)
from line 73.
As can be seen, the code is even assuming the allocation can fail as if it were
NOWAIT, but is instead doing M_WAITOK while holding the directory vnode lock
per sys/ufs/ufs/ufs_lookup.c line 254:
#ifdef DEBUG_VFS_LOCKS
/*
* Assert that the directory vnode is locked, and locked
* exclusively for the last component lookup for modifying
* operations.
*
* The directory-modifying operations need to save
* intermediate state in the inode between namei() call and
* actual directory manipulations. See fields in the struct
* inode marked as 'used during directory lookup'. We must
* ensure that upgrade in namei() does not happen, since
* upgrade might need to unlock vdp. If quotas are enabled,
* getinoquota() also requires exclusive lock to modify inode.
*/
ASSERT_VOP_LOCKED(vdp, "ufs_lookup1");
if ((nameiop == CREATE || nameiop == DELETE || nameiop == RENAME) &&
(flags & (LOCKPARENT | ISLASTCN)) == (LOCKPARENT | ISLASTCN))
ASSERT_VOP_ELOCKED(vdp, "ufs_lookup2");
#endif
restart:
bp = NULL;
slotoffset = -1;
/*
* We now have a segment name to search for, and a directory to search.
*
* Suppress search for slots unless creating
* file and at end of pathname, in which case
* we watch for a place to put the new file in
* case it doesn't already exist.
*/
ino = 0;
i_diroff = dp->i_diroff;
slotstatus = FOUND;
slotfreespace = slotsize = slotneeded = 0;
if ((nameiop == CREATE || nameiop == RENAME) &&
(flags & ISLASTCN)) {
slotstatus = NONE;
slotneeded = DIRECTSIZ(cnp->cn_namelen);
}
#ifdef UFS_DIRHASH
/*
* Use dirhash for fast operations on large directories. The logic
* to determine whether to hash the directory is contained within
* ufsdirhash_build(); a zero return means that it decided to hash
* this directory and it successfully built up the hash table.
*/
if (ufsdirhash_build(dp) == 0) {
The NUMA system may be more likely to hit this in the case of imbalanced load
(as the system as a whole may not be OOM, just the domain), but the risk is
there on any system.
I propose the following diff as the WAITOK allocation is only used in one
place:
dmorris at wkstn-dmorris2:freebsd_ufs $ git diff -p origin/main
diff --git a/sys/ufs/ufs/ufs_dirhash.c b/sys/ufs/ufs/ufs_dirhash.c
index 13094b5a1c97..72f38be5eea4 100644
--- a/sys/ufs/ufs/ufs_dirhash.c
+++ b/sys/ufs/ufs/ufs_dirhash.c
@@ -108,7 +108,7 @@ static uma_zone_t ufsdirhash_zone;
#define DIRHASHLIST_LOCK() mtx_lock(&ufsdirhash_mtx)
#define DIRHASHLIST_UNLOCK() mtx_unlock(&ufsdirhash_mtx)
-#define DIRHASH_BLKALLOC_WAITOK() uma_zalloc(ufsdirhash_zone, M_WAITOK)
+#define DIRHASH_BLKALLOC_NOWAIT() uma_zalloc(ufsdirhash_zone, M_NOWAIT)
#define DIRHASH_BLKFREE(ptr) uma_zfree(ufsdirhash_zone, (ptr))
#define DIRHASH_ASSERT_LOCKED(dh)
\
sx_assert(&(dh)->dh_lock, SA_LOCKED)
@@ -425,7 +425,7 @@ ufsdirhash_build(struct inode *ip)
if (dh->dh_blkfree == NULL)
goto fail;
for (i = 0; i < narrays; i++) {
- if ((dh->dh_hash[i] = DIRHASH_BLKALLOC_WAITOK()) == NULL)
+ if ((dh->dh_hash[i] = DIRHASH_BLKALLOC_NOWAIT()) == NULL)
goto fail;
for (j = 0; j < DH_NBLKOFF; j++)
dh->dh_hash[i][j] = DIRHASH_EMPTY;
--
You are receiving this mail because:
You are the assignee for the bug.
More information about the freebsd-bugs
mailing list