[Bug 193457] New: Enabling RACCT/RCTL causes system hang
bugzilla-noreply at freebsd.org
Mon Sep 8 08:34:43 UTC 2014
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193457
Bug ID: 193457
Summary: Enabling RACCT/RCTL causes system hang
Product: Base System
Version: 9.3-RELEASE
Hardware: amd64
OS: Any
Status: Needs Triage
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: j.david.lists at gmail.com
When a 9.3 kernel is compiled with the RACCT and RCTL options (needed for rctl), the
resulting system hangs reproducibly.
To reproduce this, create a persistent jail and run some processes in it.
No rctl limits need to be configured.
Once the jail has no running processes left, run rctl against it:
# rctl -hu jail:jailname
The entire system will hang.
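For reference, the kernel options and the triggering command look roughly like this (the jail name is a placeholder, not from the report):

```
# kernel config additions (e.g. in sys/amd64/conf/MYKERNEL)
options RACCT
options RCTL

# with a persistent jail "jailname" whose processes have all exited:
# rctl -hu jail:jailname
# -> entire system hangs
```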
db> show threads
100285 (0xfffffe0040a8a900) (stack 0xffffff812beaa000) sched_switch() at
sched_switch+0x250/frame 0xffffff812bead770
... probably over 100 more sched_switch() lines
100046 (0xfffffe0002d03000) (stack 0xffffff81222a8000) sched_switch() at
sched_switch+0x250/frame 0xffffff81222abb60
100045 (0xfffffe0002d03480) (stack 0xffffff81222a3000) fork_trampoline() at
fork_trampoline
100044 (0xfffffe0002d03900) (stack 0xffffff812229e000) sched_switch() at
sched_switch+0x250/frame 0xffffff81222a1b60
100041 (0xfffffe0002d05900) (stack 0xffffff812228f000) sched_switch() at
sched_switch+0x250/frame 0xffffff8122292b60
100040 (0xfffffe0002d06000) (stack 0xffffff812228a000) sched_switch() at
sched_switch+0x250/frame 0xffffff812228db60
100039 (0xfffffe0002c01000) (stack 0xffffff8122285000) fork_trampoline() at
fork_trampoline
100036 (0xfffffe0002c0b000) (stack 0xffffff80f77ec000) sched_switch() at
sched_switch+0x250/frame 0xffffff80f77efb60
100035 (0xfffffe0002c0b480) (stack 0xffffff80f77e7000) sched_switch() at
sched_switch+0x250/frame 0xffffff80f77eab60
100034 (0xfffffe0002c0b900) (stack 0xffffff80f77e2000) fork_trampoline() at
fork_trampoline
100033 (0xfffffe0002c0c000) (stack 0xffffff80f77d5000) sched_switch() at
sched_switch+0x250/frame 0xffffff80f77d8b60
100032 (0xfffffe0002c0c480) (stack 0xffffff80f77d0000) fork_trampoline() at
fork_trampoline
100026 (0xfffffe0002b28900) (stack 0xffffff80f77ac000) fork_trampoline() at
fork_trampoline
100025 (0xfffffe0002b29000) (stack 0xffffff80002f5000) fork_trampoline() at
fork_trampoline
100024 (0xfffffe0002b29480) (stack 0xffffff80002cd000) fork_trampoline() at
fork_trampoline
100023 (0xfffffe0002b29900) (stack 0xffffff80002a5000) sched_switch() at
sched_switch+0x250/frame 0xffffff80002a8b60
100022 (0xfffffe0002b23000) (stack 0xffffff80002a0000) sched_switch() at
sched_switch+0x250/frame 0xffffff80002a3b60
100016 (0xfffffe0002b26000) (stack 0xffffff8000282000) fork_trampoline() at
fork_trampoline
100014 (0xfffffe000296f480) (stack 0xffffff8000278000) sched_switch() at
sched_switch+0x250/frame 0xffffff800027bb60
100008 (0xfffffe000295c900) (stack 0xffffff800025a000) fork_trampoline() at
fork_trampoline
100007 (0xfffffe000296c000) (stack 0xffffff8000255000) sched_switch() at
sched_switch+0x250/frame 0xffffff8000258b60
100006 (0xfffffe000296c480) (stack 0xffffff8000250000) sched_switch() at
sched_switch+0x250/frame 0xffffff8000253b60
100005 (0xfffffe000296c900) (stack 0xffffff800024b000) sched_switch() at
sched_switch+0x250/frame 0xffffff800024eb60
100004 (0xfffffe000295b000) (stack 0xffffff8000246000) cpustop_handler() at
cpustop_handler+0x27/frame 0xffffff8000233cf0
100003 (0xfffffe000295b480) (stack 0xffffff8000241000) kdb_break() at
kdb_break+0x55/frame 0xffffff8000244990
100002 (0xfffffe000295b900) (stack 0xffffff800023c000) sched_switch() at
sched_switch+0x250/frame 0xffffff800023f720
... another dozen or so sched_switch() threads
100000 (0xffffffff8152eb80) (stack 0xffffffff817ff000) sched_switch() at
sched_switch+0x250/frame 0xffffffff81802bb0
db> show all procs
pid ppid pgrp uid state wmesg wchan cmd
7255 645 7255 106 DsJ allpriso 0xffffffff8152f340 jail_start
7252 7233 7233 0 D allproc 0xffffffff815326d0 php
7233 7231 7233 0 Ss wait 0xfffffe004070f000 sh
7231 709 709 0 S piperd 0xfffffe0002f312d8 cron
1634 1616 1634 1000 Ds+ allproc 0xffffffff815326d0 bash
1616 1527 1527 1000 S select 0xfffffe0040a60dc0 sshd
1527 689 1527 0 Ss select 0xfffffe00405656c0 sshd
937 645 937 25000 SsJ select 0xfffffe0002f1c740 python2.7
931 632 627 0 S select 0xfffffe00409a7740 php
929 0 0 0 DL proctree 0xffffffff81532658 [newnfs 1]
928 0 0 0 DL proctree 0xffffffff81532658 [newnfs 0]
925 924 925 1000 Ss+ ttyin 0xfffffe0002cf4ca8 bash
924 921 921 1000 S select 0xfffffe0002ded640 sshd
921 689 921 0 Ss select 0xfffffe00403f12c0 sshd
860 643 627 25000 S (threaded) httpd
100260 S accept 0xfffffe004041e30e httpd
100259 S uwait 0xfffffe00409a9280 httpd
.... lots of httpd's waiting around, probably not relevant ....
100132 S uwait 0xfffffe0040567500 httpd
100130 S piperd 0xfffffe0040757888 httpd
855 643 627 0 S piperd 0xfffffe0002f08b60 apache_log_client
854 643 627 65534 S piperd 0xfffffe0040757b60 clogd
852 634 627 59 S kqread 0xfffffe00408a9300 unbound
751 1 751 0 Ss+ ttyin 0xfffffe0002cf78a8 getty
750 1 750 0 Ss+ ttyin 0xfffffe0002d0d4a8 getty
749 1 749 0 Ss+ ttyin 0xfffffe0002d0d8a8 getty
748 1 748 0 Ss+ ttyin 0xfffffe0002d0f0a8 getty
747 1 747 0 Ss+ ttyin 0xfffffe0002d0f4a8 getty
746 1 746 0 Ss+ ttyin 0xfffffe0002d0fca8 getty
745 1 745 0 Ss+ ttyin 0xfffffe0002d104a8 getty
744 1 744 0 Ss+ ttyin 0xfffffe0002d100a8 getty
743 1 743 0 Ss+ ttyin 0xfffffe0002d0f8a8 getty
709 1 709 0 Ss nanslp 0xffffffff81499470 cron
705 1 705 25 Ss pause 0xfffffe004043c548 sendmail
701 1 701 0 Ss select 0xfffffe0040565cc0 sendmail
689 1 689 0 Ss select 0xfffffe00402581c0 sshd
656 1 656 181 Ss select 0xfffffe0040258ac0 nrpe2
643 637 627 0 D proctree 0xffffffff81532658 httpd
639 633 627 0 S select 0xfffffe00409a8340 ntpd
638 629 627 0 S select 0xfffffe0002f1cdc0 supervise
637 629 627 0 S select 0xfffffe0002d857c0 supervise
636 629 627 0 S select 0xfffffe00403f15c0 supervise
634 629 627 0 S select 0xfffffe0002f1a4c0 supervise
633 629 627 0 S select 0xfffffe0002f1bb40 supervise
632 629 627 0 S select 0xfffffe0002f1a240 supervise
630 1 627 0 S piperd 0xfffffe00401cf000 readproctitle
629 1 627 0 D proctree 0xffffffff81532658 svscan
571 1 571 0 Ss rpcsvc 0xfffffe0002f1c6a8 NLM: master
568 1 568 0 Ss select 0xfffffe0002f1c3c0 rpc.statd
545 0 0 0 DL mdwait 0xfffffe004020d800 [md0]
534 1 534 0 Ss select 0xfffffe0002f1a6c0 amd
523 1 523 0 Ss select 0xfffffe0002dee4c0 rpcbind
497 1 497 0 Ss select 0xfffffe0002f1bbc0 syslogd
379 1 379 0 Ss select 0xfffffe0002f1a840 devd
20 0 0 0 DL - 0xffffffff81498320 [schedcpu]
19 0 0 0 DL - 0xffffffff81498320 [racctd]
18 0 0 0 DL sdflush 0xffffffff814c9594 [softdepflush]
17 0 0 0 DL syncer 0xffffffff814c2998 [syncer]
16 0 0 0 DL vlruwt 0xfffffe0002d8c000 [vnlru]
9 0 0 0 DL psleep 0xffffffff814c25bc [bufdaemon]
8 0 0 0 DL pgzero 0xffffffff814cae5c [pagezero]
7 0 0 0 DL pollid 0xffffffff81497e70 [idlepoll]
6 0 0 0 DL psleep 0xffffffff814ca0a0 [vmdaemon]
5 0 0 0 DL psleep 0xffffffff81541e84 [pagedaemon]
4 0 0 0 DL - 0xffffffff81460130 [xpt_thrd]
3 0 0 0 DL waiting_ 0xffffffff81535d48 [sctp_iterator]
2 0 0 0 DL pftm 0xffffffff80336bc0 [pfpurge]
15 0 0 0 DL (threaded) [usb]
100030 D - 0xffffff8000765ef0 [usbus0]
100029 D - 0xffffff8000765e98 [usbus0]
100028 D - 0xffffff8000765e40 [usbus0]
100027 D - 0xffffff8000765de8 [usbus0]
14 0 0 0 DL - 0xffffffff81498320 [yarrow]
13 0 0 0 DL (threaded) [geom]
100011 D - 0xffffffff8152e5d8 [g_down]
100010 D - 0xffffffff8152e5d0 [g_up]
100009 D - 0xffffffff8152e5c8 [g_event]
12 0 0 0 WL (threaded) [intr]
100046 I [swi0: uart]
100045 I [irq12: psm0]
100044 I [irq1: atkbd0]
100041 I [irq263: virtio_pci3]
100040 I [irq262: virtio_pci3]
100039 I [irq261: virtio_pci3]
100036 I [irq260: virtio_pci2]
100035 I [irq259: virtio_pci2]
100034 I [irq258: virtio_pci2]
100033 I [irq257: virtio_pci1]
100032 I [irq256: virtio_pci1]
100026 I [irq11: uhci0+]
100025 I [irq15: ata1]
100024 I [irq14: ata0]
100023 I [swi6: task queue]
100022 I [swi2: cambio]
100016 I [swi5: fast taskq]
100014 I [swi6: Giant taskq]
100008 I [swi3: vm]
100007 I [swi1: netisr 0]
100006 I [swi4: clock]
100005 I [swi4: clock]
11 0 0 0 RL (threaded) [idle]
100004 Run CPU 1 [idle: cpu1]
100003 Run CPU 0 [idle: cpu0]
1 0 1 0 SLs wait 0xfffffe0002958950 [init]
10 0 0 0 DL audit_wo 0xffffffff81539480 [audit]
0 0 0 0 DLs (threaded) [kernel]
100047 D - 0xfffffe0002d1dc00 [mca taskq]
100043 D - 0xfffffe0002ca8000 [vtnet1 txq 0]
100042 D - 0xfffffe0002ca8080 [vtnet1 rxq 0]
100038 D - 0xfffffe0002c0aa80 [vtnet0 txq 0]
100037 D - 0xfffffe0002c0ab00 [vtnet0 rxq 0]
100031 D vtbslp 0xfffffe0002c00080 [virtio_balloon]
100021 D - 0xfffffe0002b0b380 [acpi_task_2]
100020 D - 0xfffffe0002b0b380 [acpi_task_1]
100019 D - 0xfffffe0002b0b380 [acpi_task_0]
100018 D - 0xfffffe0002b0b400 [kqueue taskq]
100017 D - 0xfffffe0002b0b480 [ffs_trim taskq]
100015 D - 0xfffffe0002b0c080 [thread taskq]
100012 D - 0xfffffe0002970980 [firmware taskq]
100000 D swapin 0xffffffff8152e6c8 [swapper]
The stuck processes above are those in state "D" with wait channel allproc, proctree,
or allpriso.
Here is the info about racctd:
db> show proc 19
Process 19 (racctd) at 0xfffffe0002d89000:
state: NORMAL
uid: 0 gids: 0
parent: pid 0 at 0xffffffff8152e6c8
ABI: null
threads: 1
100059 D - 0xffffffff81498320 [racctd]
And here are the stack traces for some stuck processes:
db> trace 1634
Tracing pid 1634 tid 100074 td 0xfffffe0002ea4480
sched_switch() at sched_switch+0x250/frame 0xffffff812bab1930
mi_switch() at mi_switch+0xe1/frame 0xffffff812bab1970
sleepq_wait() at sleepq_wait+0x3a/frame 0xffffff812bab19a0
_sx_xlock_hard() at _sx_xlock_hard+0x4b9/frame 0xffffff812bab1a50
fork1() at fork1+0x31e/frame 0xffffff812bab1b00
sys_fork() at sys_fork+0x19/frame 0xffffff812bab1b20
amd64_syscall() at amd64_syscall+0x338/frame 0xffffff812bab1c30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff812bab1c30
--- syscall (2, FreeBSD ELF64, sys_fork), rip = 0x800bb308c, rsp =
0x7fffffffe4e8, rbp = 0x7fffffffe530 ---
db> trace 7255
Tracing pid 7255 tid 100285 td 0xfffffe0040a8a900
sched_switch() at sched_switch+0x250/frame 0xffffff812bead770
mi_switch() at mi_switch+0xe1/frame 0xffffff812bead7b0
sleepq_wait() at sleepq_wait+0x3a/frame 0xffffff812bead7e0
_sx_xlock_hard() at _sx_xlock_hard+0x4b9/frame 0xffffff812bead890
kern_jail_set() at kern_jail_set+0x3eb8/frame 0xffffff812beadaf0
sys_jail_set() at sys_jail_set+0x41/frame 0xffffff812beadb20
amd64_syscall() at amd64_syscall+0x338/frame 0xffffff812beadc30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff812beadc30
--- syscall (507, FreeBSD ELF64, sys_jail_set), rip = 0x80191ab1c, rsp =
0x7fffffffe268, rbp = 0x7fffffffe310 ---
So it looks like racctd is holding a lock (apparently the one behind the allproc and
proctree wait channels) that blocks every other thread or process from touching the
process or jail tables, deadlocking the system. The underlying cause is probably
racctd mishandling the now-empty jail.
This issue is extremely reproducible for us, so if any other information would
be helpful, please let me know.
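To illustrate the suspected pattern in userland: a hypothetical sketch (plain Python threading, not kernel code) of an accounting thread that takes a shared lock and never releases it, so every would-be forker wedges on that same lock, as the traces above suggest:

```python
import threading

# Stand-in for the kernel's process-table lock that every fork must take.
allproc_lock = threading.Lock()
results = []

def racctd():
    # Buggy accounting thread: acquires the shared lock and never
    # releases it (e.g. stuck handling an empty jail).
    allproc_lock.acquire()

def fork1():
    # Any would-be forker blocks on the same lock; here we time out
    # instead of hanging forever.
    got = allproc_lock.acquire(timeout=0.2)
    results.append(got)
    if got:
        allproc_lock.release()

holder = threading.Thread(target=racctd)
holder.start()
holder.join()

forker = threading.Thread(target=fork1)
forker.start()
forker.join()

print(results[0])  # False: the "fork" never got the lock
```

In the real hang there is no timeout, so fork(), jail_set(), and everything else touching the process tables simply sleeps forever.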