[Bug 235419] zpool scrub progress does not change for hours, heavy disk activity still present
bugzilla-noreply at freebsd.org
Sat Feb 2 06:37:20 UTC 2019
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235419
Bug ID: 235419
Summary: zpool scrub progress does not change for hours, heavy
disk activity still present
Product: Base System
Version: 11.2-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: bobf at mrp3.com
Frequently, on one of my computers running 11-STABLE, a 'zpool scrub' will
continue for hours while progress does not increase. The scrub is still
'active' and there is a LOT of disk activity, causing stuttering of application
response as you would expect. This does not always happen, but happens more
often than not. The previous scrub completed without any such 'hangs' 2 weeks
ago, with no changes to the configuration since.
This system uses a 'zfs everywhere' configuration, i.e. all partitions are zfs.
A second computer that has UFS+J partitions for userland and kernel does not
appear to exhibit this particular problem.
uname output:
FreeBSD hack.SFT.local 11.2-STABLE FreeBSD 11.2-STABLE #1 r339273: Tue Oct 9
21:10:39 PDT 2018 root at hack.SFT.local:/usr/obj/usr/src/sys/GENERIC amd64
This system had been running for 80+ days.
At first, I discovered that the scrub had 'hung' at around 74% complete. After
pausing the scrub for a while, and also terminating firefox and thunderbird,
the scrub restarted and continued. I restarted firefox and thunderbird, and
allowed everything to continue. The scrub then 'hung' again at about 84%, and
terminating applications (including Xorg) did not seem to help.
With the scrub paused I performed a reboot, and the scrub restarted on boot
[causing the boot process to be excruciatingly slow]. I restarted most of the
applications that had been running before, while the scrub continued to run.
Now zpool status shows that the scrub has completed with no errors.
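For reference, the pause and resume steps above used the standard zpool(8)
subcommands, and a genuine stall can be distinguished from merely slow progress
by polling the percentage that `zpool status` reports. The `scrub_pct` helper
and the sample output below are illustrative only (not captured from the
affected system); a minimal sketch:

```shell
# Pause / resume as done above (zpool(8)):
#   zpool scrub -p zroot   # pause the running scrub
#   zpool scrub zroot      # resume a paused scrub

# scrub_pct is a hypothetical helper that extracts the "NN.NN% done"
# figure from `zpool status` output; polling it twice, a few minutes
# apart, shows whether the scrub is actually advancing.
scrub_pct() {
  awk '/% done/ { for (i = 1; i <= NF; i++) if ($i ~ /%$/) { sub(/%/, "", $i); print $i } }'
}

# Sample text standing in for `zpool status zroot` output:
sample='  scan: scrub in progress since Sat Feb  2 00:10:11 2019
        1.02T scanned out of 1.38T at 98.1M/s, 1h04m to go
        0 repaired, 74.02% done'

printf '%s\n' "$sample" | scrub_pct   # prints 74.02
```

On a live system, `zpool status zroot | scrub_pct` would replace the sample
pipeline; an unchanged value across polls, combined with heavy disk activity,
matches the hang described above.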
Here are some additional pieces of information that might help:
> mount
zroot/ROOT/default on / (zfs, NFS exported, local, noatime, nfsv4acls)
devfs on /dev (devfs, local, multilabel)
zroot/d-drive on /d-drive (zfs, NFS exported, local, noatime, nfsv4acls)
zroot/e-drive on /e-drive (zfs, NFS exported, local, noatime, nfsv4acls)
zroot/tmp on /tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot/usr/home on /usr/home (zfs, NFS exported, local, noatime, nfsv4acls)
zroot/usr/ports on /usr/ports (zfs, NFS exported, local, noatime, nosuid,
nfsv4acls)
zroot/usr/src on /usr/src (zfs, NFS exported, local, noatime, nfsv4acls)
zroot/var/audit on /var/audit (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/crash on /var/crash (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/log on /var/log (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/mail on /var/mail (zfs, local, nfsv4acls)
zroot/var/tmp on /var/tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot on /zroot (zfs, local, noatime, nfsv4acls)
> kldstat
Id Refs Address Size Name
1 44 0xffffffff80200000 206b5d0 kernel
2 1 0xffffffff8226d000 393200 zfs.ko
3 2 0xffffffff82601000 a380 opensolaris.ko
4 1 0xffffffff82821000 4090 cuse.ko
5 1 0xffffffff82826000 6e40 uftdi.ko
6 1 0xffffffff8282d000 3c58 ucom.ko
7 3 0xffffffff82831000 50c70 vboxdrv.ko
8 2 0xffffffff82882000 2ad0 vboxnetflt.ko
9 2 0xffffffff82885000 9a20 netgraph.ko
10 1 0xffffffff8288f000 14b8 ng_ether.ko
11 1 0xffffffff82891000 3f70 vboxnetadp.ko
12 2 0xffffffff82895000 37528 linux.ko
13 2 0xffffffff828cd000 2d28 linux_common.ko
14 1 0xffffffff828d0000 31e80 linux64.ko
15 1 0xffffffff82902000 c60 coretemp.ko
16 1 0xffffffff82903000 965128 nvidia.ko
There were no log messages regarding the zpool scrub that I could find.
Port versions for packages that install kernel modules:
nvidia-driver-340-340.106
virtualbox-ose-5.1.18
virtualbox-ose-kmod-5.1.22
linux-c7-7.3.1611_1
This problem has been happening since mid last year, around the time the
-STABLE source moved to 11.2 and I updated kernel+world on this computer. The
zpool has also been upgraded. It is worth noting that this computer ran 11.0
for a long time without incident; the problem may have been present in 11.1.
Related: there is an apparent random-crash bug in the NVidia module that I have
been trying to track down; it causes occasional page fault crashes. Sometimes I
see swap space in use when there does not seem to be any reason for it, and I
believe this NVidia bug is part of that (the crash coming from accessing freed
or random memory addresses, with swap space allocated as a consequence?).
Whether this NVidia driver bug is responsible for the zfs problem, I do not
know, but the driver is installed only on this particular computer, and only
this computer seems to exhibit the problem, so it is worth mentioning.