[Bug 224292] processes are hanging in state ufs / possible deadlock in file system
bugzilla-noreply at freebsd.org
bugzilla-noreply at freebsd.org
Tue Dec 12 19:02:31 UTC 2017
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292
Bug ID: 224292
Summary: processes are hanging in state ufs / possible deadlock
in file system
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: wosch at FreeBSD.org
A friend and I are testing perl modules on FreeBSD, on all the major versions
(12-current, 11-stable, 11-RELEASE). The tests are done in a cheap VM (1GB RAM,
40GB SSD, 1 core), apparently running on virtio/kvm.
While testing we noticed that the VM randomly crashes every 1-3 days, and we
had to run a hard fsck on the console. I first thought it is a resource
problem, but munin doesn’t show any shortage of RAM, disk or file descriptors.
We find out that the crashes occurred shortly after processes hanging in state
“ufs”, as shown in top(1) output:
$ top -b 300 | egrep ufs
73278 root 1 31 0 8364K 2340K ufs 0:02 10.35% find
34548 cpansand 1 52 10 43828K 15348K ufs 0:00 0.00% perl
34824 cpansand 1 20 0 21060K 1960K ufs 0:00 0.00% procstat
34557 cpansand 1 52 10 13180K 2112K ufs 0:00 0.00% sh
$ ps xuawww | egrep 73278
root 73278 0.0 0.2 8364 2340 - D 03:01 0:02.25 find -sx
/ /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x
-or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
root 73667 0.0 0.0 412 312 8 R+ 03:08 0:00.00 egrep
73278
$ ps xuawww | egrep 34548
cpansand 34548 0.0 1.5 43828 15348 4 DN+ 15:29 0:00.10
/usr/perl5.26.1p/bin/perl t/test_command.t
Another example:
$ lsof | egrep 12710
perl5.26. 12710 cpansand cwd VDIR
18446735278340920664 0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710 cpansand rtd VDIR
7507057954083258955 0 / (/dev/vtbd0p3)
perl5.26. 12710 cpansand 0u VCHR 0,90
0t20972 90 /dev/pts/0
perl5.26. 12710 cpansand 1u PIPE 0xfffff8000399f168
0 ->0xfffff8000399f000
perl5.26. 12710 cpansand 2u PIPE 0xfffff8001c30da50
0 ->0xfffff8001c30d8e8
perl5.26. 12710 cpansand 3u PIPE 0xfffff8001c30ebe0
16384 ->0xfffff8001c30ed48
perl5.26. 12710 cpansand 5u PIPE 0xfffff8000399f168
0 ->0xfffff8000399f000
perl5.26. 12710 cpansand 6u PIPE 0xfffff8001c30da50
0 ->0xfffff8001c30d8e8
perl5.26. 12710 cpansand 7u PIPE 0xfffff8000399f168
0 ->0xfffff8000399f000
perl5.26. 12710 cpansand 8u PIPE 0xfffff8001c30da50
0 ->0xfffff8001c30d8e8
$ top -b 300 | egrep ufs
12358 cpansand 1 20 10 268M 12980K ufs 0:04 0.00% perl
12710 cpansand 1 20 10 21412K 13808K ufs 0:00 0.00% perl5.26.0
56111 root 1 20 0 7752K 3180K ufs 0:00 0.00% bash
56077 root 1 52 0 6332K 2312K ufs 0:00 0.00% find
12719 cpansand 1 20 10 7060K 1472K ufs 0:00 0.00% sh
$ lsof | egrep '12710|12358' | egrep home
perl 12358 cpansand cwd VDIR
18446735278340920664 0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710 cpansand cwd VDIR
18446735278340920664 0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
Perl often hangs in a directory during a build of a CPAN module. If I try to
switch to this directory, my current shell will hang too:
$ cd /home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
[hangs]
$ find /home/cpansand/
/home/cpansand/
/home/cpansand/.login_conf
/home/cpansand/.cshrc
/home/cpansand/.rhosts
/home/cpansand/.shrc
/home/cpansand/.mail_aliases
/home/cpansand/.profile
/home/cpansand/.mailrc
/home/cpansand/.login
/home/cpansand/.ssh
/home/cpansand/.ssh/authorized_keys
/home/cpansand/.ssh/known_hosts
/home/cpansand/.zshrc
/home/cpansand/bin
/home/cpansand/src
/home/cpansand/.cpan
/home/cpansand/.cpan/build
/home/cpansand/.cpan/build/2017112215
/home/cpansand/.cpan/build/2017112212
[ hangs]
$ top -b 300 | egrep ufs
12358 cpansand 1 20 10 268M 12980K ufs 0:04 0.00% perl
12710 cpansand 1 20 10 21412K 13808K ufs 0:00 0.00% perl5.26.0
56111 root 1 20 0 7752K 3180K ufs 0:00 0.00% bash
56077 root 1 52 0 6332K 2312K ufs 0:00 0.00% find
57160 root 1 20 0 6332K 1792K ufs 0:00 0.00% find
57169 root 1 20 0 6332K 1792K ufs 0:00 0.00% find
12719 cpansand 1 20 10 7060K 1472K ufs 0:00 0.00% sh
$ pstree
|-+= 00675 root /usr/sbin/cron -s
| \-+- 55939 root cron: running job (cron)
| \-+= 55941 root /bin/sh - /usr/sbin/periodic daily
| \-+- 55943 root lockf -t 0 /var/run/periodic.daily.lock /bin/sh
/usr/sbin/periodic LOCKED daily
| \-+- 55944 root /bin/sh /usr/sbin/periodic LOCKED daily
| |-+- 55953 root /bin/sh /usr/sbin/periodic LOCKED daily
| | \-+- 56059 root /bin/sh /etc/periodic/daily/450.status-security
| | \-+- 56060 root /bin/sh - /usr/sbin/periodic security
| | \-+- 56062 root lockf -t 0 /var/run/periodic.security.lock
/bin/sh /usr/sbin/periodic LOCKED security
| | \-+- 56063 root /bin/sh /usr/sbin/periodic LOCKED security
| | |-+- 56071 root /bin/sh /usr/sbin/periodic LOCKED security
| | | \-+- 56073 root /bin/sh -
/etc/periodic/security/100.chksetuid
| | | |--- 56077 root find -sx / /dev/null ( ! -fstype local
) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s
-or -perm -g+s ) -exec ls -liTd {} +
| | | \-+- 56078 root /bin/sh -
/etc/periodic/security/100.chksetuid
| | | \--- 56080 root cat
\--= 12358 cpansand /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test
J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)
|-+- 12710 cpansand /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
| \--- 12719 cpansand sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in
TEST/cgi_to_db_ex.in-2 < TEST/cgi_to_db_ex.in 2>&1
root 56077 0.0 0.2 6332 1936 - D 03:01 0:00.02 find -sx
/ /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x
-or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
cpansand 12710 0.0 1.4 21412 13780 0 DN+ 14:11 0:00.10
/usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
$ ps -o wchan -auxgww | grep ufs
- root 19 0.0 0.0 0 16 - DL Wed02 0:03.12
[bufspacedaemon]
ufs root 56077 0.0 0.2 6332 1936 - D 03:01 0:00.02
find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or
-perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
ufs cpansand 12358 0.0 0.7 274408 7280 0 DNs+ 14:11 0:03.89
/usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz)
(perl)
ufs cpansand 12710 0.0 1.4 21412 13780 0 DN+ 14:11 0:00.10
/usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
ufs cpansand 12719 0.0 0.1 7060 1444 0 DN+ 14:11 0:00.00
sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2 <
TEST/cgi_to_db_ex.in 2>&1
- root 78756 0.0 0.0 412 304 7 R+ 04:18 0:00.00
grep ufs
ufs root 56111 0.0 0.3 7752 2688 8 Ds+ 03:03 0:00.02
-bash (bash)
ufs root 57160 0.0 0.1 6332 1416 10 D+ 03:15 0:00.00
find .
ufs root 57169 0.0 0.1 6332 1416 11 D+ 03:15 0:00.00
find /home/cpansand/
The perl interpreter usually crashed while testing the CPAN module
J/JO/JOHNH/Fsdb-2.64.tar.gz in test t/test_command.t
But I think this is a red herring. The module just runs fine if you test only
this module. It must be a combination of other, earlier test which finally
bring Fsdb-2.64.tar.gz down.
We saw this error on FreeBSD 9.x, 10.3, 11.1 Release, 11-stable, 12-current.
My gut feeling is that we have a deadlock or forgotten lock of a directory/file
in the ufs code. Some kernel thread forgot to release a lock. Any other process
which need to open this directory will hang, and soon or later (2-4 hours) the
machine crashes hard.
Everytime the machine crashed we had to run fsck twice, repair inodes and the
superblock on the console manually. This is not fun.
We saw the errors at different cloud providers. I’m pretty sure it is not a bad
setup, it is a general problem of the FreeBSD kernel/ufs filesystem.
--
You are receiving this mail because:
You are the assignee for the bug.
More information about the freebsd-bugs
mailing list