[Bug 224292] processes are hanging in state ufs / possible deadlock in file system

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Tue Dec 12 19:02:31 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292

            Bug ID: 224292
           Summary: processes are hanging in state ufs / possible deadlock
                    in file system
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: wosch at FreeBSD.org

A friend and I are testing perl modules on FreeBSD, on all the major versions
(12-current, 11-stable, 11-RELEASE). The tests are done in a cheap VM (1GB RAM,
40GB SSD, 1 core), apparently running on virtio/kvm.

While testing we noticed that the VM randomly crashes every 1-3 days, and we
had to run a hard fsck on the console. I first thought it is a resource
problem, but munin doesn’t show any shortage of RAM, disk or file descriptors.

We find out that the crashes occurred shortly after processes hanging in state
“ufs”, as shown in top(1) output:

$ top -b 300 | egrep ufs
73278 root          1  31    0  8364K  2340K ufs      0:02  10.35% find
34548 cpansand      1  52   10 43828K 15348K ufs      0:00   0.00% perl
34824 cpansand      1  20    0 21060K  1960K ufs      0:00   0.00% procstat
34557 cpansand      1  52   10 13180K  2112K ufs      0:00   0.00% sh


$ ps xuawww | egrep 73278  
root       73278   0.0  0.2   8364   2340  -  D    03:01       0:02.25 find -sx
/ /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x
-or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
root       73667   0.0  0.0    412    312  8  R+   03:08       0:00.00 egrep
73278

$ ps xuawww | egrep 34548  
cpansand   34548   0.0  1.5  43828  15348  4  DN+  15:29       0:00.10
/usr/perl5.26.1p/bin/perl t/test_command.t


Another example:

$ lsof  | egrep 12710
perl5.26. 12710   cpansand  cwd     VDIR                   
18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710   cpansand  rtd     VDIR                    
7507057954083258955    0 / (/dev/vtbd0p3)
perl5.26. 12710   cpansand    0u    VCHR               0,90             
0t20972   90 /dev/pts/0
perl5.26. 12710   cpansand    1u    PIPE 0xfffff8000399f168                   
0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    2u    PIPE 0xfffff8001c30da50                   
0      ->0xfffff8001c30d8e8
perl5.26. 12710   cpansand    3u    PIPE 0xfffff8001c30ebe0               
16384      ->0xfffff8001c30ed48
perl5.26. 12710   cpansand    5u    PIPE 0xfffff8000399f168                   
0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    6u    PIPE 0xfffff8001c30da50                   
0      ->0xfffff8001c30d8e8
perl5.26. 12710   cpansand    7u    PIPE 0xfffff8000399f168                   
0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    8u    PIPE 0xfffff8001c30da50                   
0      ->0xfffff8001c30d8e8


$ top -b 300 | egrep ufs
12358 cpansand      1  20   10   268M 12980K ufs      0:04   0.00% perl
12710 cpansand      1  20   10 21412K 13808K ufs      0:00   0.00% perl5.26.0
56111 root          1  20    0  7752K  3180K ufs      0:00   0.00% bash
56077 root          1  52    0  6332K  2312K ufs      0:00   0.00% find
12719 cpansand      1  20   10  7060K  1472K ufs      0:00   0.00% sh


$ lsof | egrep '12710|12358' | egrep home
perl      12358   cpansand  cwd     VDIR                   
18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710   cpansand  cwd     VDIR                   
18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1


Perl often hangs in a directory during a build of a CPAN module. If I try to
switch to this directory, my current shell will hang too:

$ cd /home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
[hangs]

$ find /home/cpansand/
/home/cpansand/
/home/cpansand/.login_conf
/home/cpansand/.cshrc
/home/cpansand/.rhosts
/home/cpansand/.shrc
/home/cpansand/.mail_aliases
/home/cpansand/.profile
/home/cpansand/.mailrc
/home/cpansand/.login
/home/cpansand/.ssh
/home/cpansand/.ssh/authorized_keys
/home/cpansand/.ssh/known_hosts
/home/cpansand/.zshrc
/home/cpansand/bin
/home/cpansand/src
/home/cpansand/.cpan
/home/cpansand/.cpan/build
/home/cpansand/.cpan/build/2017112215
/home/cpansand/.cpan/build/2017112212
[ hangs]


$ top -b 300 | egrep ufs
12358 cpansand      1  20   10   268M 12980K ufs      0:04   0.00% perl
12710 cpansand      1  20   10 21412K 13808K ufs      0:00   0.00% perl5.26.0
56111 root          1  20    0  7752K  3180K ufs      0:00   0.00% bash
56077 root          1  52    0  6332K  2312K ufs      0:00   0.00% find
57160 root          1  20    0  6332K  1792K ufs      0:00   0.00% find
57169 root          1  20    0  6332K  1792K ufs      0:00   0.00% find
12719 cpansand      1  20   10  7060K  1472K ufs      0:00   0.00% sh


$ pstree
|-+= 00675 root /usr/sbin/cron -s
 | \-+- 55939 root cron: running job (cron)
 |   \-+= 55941 root /bin/sh - /usr/sbin/periodic daily
 |     \-+- 55943 root lockf -t 0 /var/run/periodic.daily.lock /bin/sh
/usr/sbin/periodic LOCKED daily
 |       \-+- 55944 root /bin/sh /usr/sbin/periodic LOCKED daily
 |         |-+- 55953 root /bin/sh /usr/sbin/periodic LOCKED daily
 |         | \-+- 56059 root /bin/sh /etc/periodic/daily/450.status-security
 |         |   \-+- 56060 root /bin/sh - /usr/sbin/periodic security
 |         |     \-+- 56062 root lockf -t 0 /var/run/periodic.security.lock
/bin/sh /usr/sbin/periodic LOCKED security
 |         |       \-+- 56063 root /bin/sh /usr/sbin/periodic LOCKED security
 |         |         |-+- 56071 root /bin/sh /usr/sbin/periodic LOCKED security
 |         |         | \-+- 56073 root /bin/sh -
/etc/periodic/security/100.chksetuid
 |         |         |   |--- 56077 root find -sx / /dev/null ( ! -fstype local
) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s
-or -perm -g+s ) -exec ls -liTd {} +
 |         |         |   \-+- 56078 root /bin/sh -
/etc/periodic/security/100.chksetuid
 |         |         |     \--- 56080 root cat


     \--= 12358 cpansand /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test
J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)

 |-+- 12710 cpansand /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
 | \--- 12719 cpansand sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in
TEST/cgi_to_db_ex.in-2  < TEST/cgi_to_db_ex.in 2>&1



root       56077   0.0  0.2   6332   1936  -  D    03:01      0:00.02 find -sx
/ /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x
-or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +

cpansand   12710   0.0  1.4  21412 13780  0  DN+  14:11      0:00.10
/usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t


$ ps -o wchan -auxgww | grep ufs    
-        root          19   0.0  0.0      0    16  -  DL   Wed02      0:03.12
[bufspacedaemon]
ufs      root       56077   0.0  0.2   6332  1936  -  D    03:01      0:00.02
find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or
-perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
ufs      cpansand   12358   0.0  0.7 274408  7280  0  DNs+ 14:11      0:03.89
/usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz)
(perl)
ufs      cpansand   12710   0.0  1.4  21412 13780  0  DN+  14:11      0:00.10
/usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
ufs      cpansand   12719   0.0  0.1   7060  1444  0  DN+  14:11      0:00.00
sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2  <
TEST/cgi_to_db_ex.in 2>&1
-        root       78756   0.0  0.0    412   304  7  R+   04:18      0:00.00
grep ufs
ufs      root       56111   0.0  0.3   7752  2688  8  Ds+  03:03      0:00.02
-bash (bash)
ufs      root       57160   0.0  0.1   6332  1416 10  D+   03:15      0:00.00
find .
ufs      root       57169   0.0  0.1   6332  1416 11  D+   03:15      0:00.00
find /home/cpansand/


The perl interpreter usually crashed while testing the CPAN module
J/JO/JOHNH/Fsdb-2.64.tar.gz in test t/test_command.t
But I think this is a red herring. The module just runs fine if you test only
this module. It must be a combination of other, earlier test which finally
bring Fsdb-2.64.tar.gz down.

We saw this error on FreeBSD 9.x, 10.3, 11.1 Release, 11-stable, 12-current. 

My gut feeling is that we have a deadlock or forgotten lock of a directory/file
in the ufs code. Some kernel thread forgot to release a lock. Any other process
which need to open this directory will hang, and soon or later (2-4 hours) the
machine crashes hard.

Everytime the machine crashed we had to run fsck twice, repair inodes and the
superblock on the console manually. This is not fun.

We saw the errors at different cloud providers. I’m pretty sure it is not a bad
setup, it is a general problem of the FreeBSD kernel/ufs filesystem.

-- 
You are receiving this mail because:
You are the assignee for the bug.


More information about the freebsd-bugs mailing list