kern/104406: Processes get stuck in "ufs" state under persistent
CPU load
Sergey Zaharchenko
doublef-ctm at yandex.ru
Sat Oct 14 06:00:32 PDT 2006
>Number: 104406
>Category: kern
>Synopsis: Processes get stuck in "ufs" state under persistent CPU load
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Oct 14 13:00:30 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator: Sergey Zaharchenko
>Release: FreeBSD 7.0-CURRENT i386
>Organization:
Volgograd State Technical University
>Environment:
System: FreeBSD shark.localdomain 7.0-CURRENT FreeBSD 7.0-CURRENT #5: Fri Oct 13 22:03:33 MSD 2006 root at shark.localdomain:/var/obj/src/usr.src/sys/GENERIC i386
The problem has also been observed on 7.0-CURRENT of August 2006.
FWIW 4.8-RELEASE didn't have the problem.
CPU: AMD Sempron(tm) 2500+ (1753.99-MHz 686-class CPU)
A UP system, GENERIC kernel, no RAID, etc.
>Description:
When a single process keeps the CPU busy for a long(*) time, other
processes that try to access the filesystem get stuck in the "ufs" state.
Processes that do not need to access the filesystem (like top, etc.)
proceed normally.
The owner UID and the nice and idprio settings of the offending (or
affected) process do not matter.
It seems essential that a single process is running the whole time (e.g.
two hours of compilation do not trigger any hangs, because many processes
are involved).
Example top outputs for this situation:
last pid: 9798; load averages: 2.00, 1.96, 1.72 up 0+02:29:59 11:02:17
113 processes: 3 running, 109 sleeping, 1 zombie
CPU states: 0.0% user, 96.2% nice, 0.4% system, 3.4% interrupt, 0.0% idle
Mem: 134M Active, 193M Inact, 88M Wired, 316K Cache, 59M Buf, 70M Free
Swap: 4097M Total, 64M Used, 4033M Free, 1% Inuse
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
2661 df 1 139 20 63532K 58468K RUN 107:21 97.46% generic_slave
8035 df 1 96 0 7608K 2752K select 0:00 0.05% mc
702 root 1 96 0 5200K 792K select 0:41 0.00% syslogd
1245 root 1 -4 0 5152K 572K ufs 0:35 0.00% tail
912 squid 1 -4 0 13928K 4288K ufs 0:31 0.00% squid
1163 df 1 -4 0 9716K 2512K ufs 0:27 0.00% fetchmail
2600 root 1 -32 0 5516K 2052K RUN 0:13 0.00% top
1042 mysql 6 20 0 59128K 1956K kserel 0:06 0.00% mysqld
9338 df 1 -4 0 15988K 9812K ufs 0:06 0.00% links
1179 root 1 96 0 5200K 112K select 0:02 0.00% moused
last pid: 2739; load averages: 2.00, 1.95, 1.57 up 0+00:31:39 16:46:48
91 processes: 3 running, 87 sleeping, 1 zombie
CPU states: 97.8% user, 0.0% nice, 0.4% system, 1.9% interrupt, 0.0% idle
Mem: 120M Active, 29M Inact, 32M Wired, 17M Cache, 15M Buf, 287M Free
Swap: 4097M Total, 4097M Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1352 df 1 132 0 2248K 896K RUN 25:58 97.41% testcase
1580 df 1 96 0 7608K 2696K select 0:09 0.00% mc
970 squid 1 -4 0 13928K 8108K ufs 0:03 0.00% squid
1316 root 1 -32 0 5516K 1608K RUN 0:03 0.00% top
1238 df 1 -4 0 7668K 1884K ufs 0:02 0.00% fetchmail
1110 mysql 6 20 0 59128K 51600K kserel 0:01 0.00% mysqld
754 root 1 96 0 5200K 1088K select 0:01 0.00% syslogd
1322 root 1 96 0 22656K 16968K select 0:01 0.00% Xorg
868 root 1 8 0 9232K 4748K nanslp 0:00 0.00% httpd
1126 news 1 8 0 5484K 1308K wait 0:00 0.00% sh
1317 root 1 -4 0 5152K 684K ufs 0:00 0.00% tail
1144 news 1 4 4 7816K 3396K sbwait 0:00 0.00% perl5.8.8
1120 news 1 -4 0 150M 12828K ufs 0:00 0.00% innd
1351 df 1 20 0 4032K 1784K pause 0:00 0.00% csh
993 squid 1 -4 0 5640K 824K msgwai 0:00 0.00% diskd
1699 df 1 8 0 5244K 3104K ppwait 0:00 0.00% csh
1182 root 1 8 0 5200K 1100K nanslp 0:00 0.00% cron
(*) `long' has ranged from 10 minutes to 2 hours for me.
>How-To-Repeat:
A program to generate the necessary load can be quite simple, for example:

int
main(void)
{
    /* Crunch some numbers (really meaningless) */
    unsigned u = 1;

    while (1) {
        u *= 0x8088405;
    }
}

Compile and run it, run `top', and wait for a long (see above) time.
Browse directories with `ls' from time to time on a different terminal.
At some point `ls' will hang; then check the `top' terminal.
>Fix:
I don't know the fix, but an offending process can be stopped with 'kill
-STOP' and continued with 'kill -CONT', which allows other processes to
access the filesystem (until another such failure occurs). Periodic
stopping and starting processes might count as a lousy workaround.
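The periodic stop/continue workaround might be sketched as follows. This is an illustration only: throttle() and its parameters are hypothetical names, and the run and pause intervals are assumptions that would need tuning. It uses the same SIGSTOP and SIGCONT signals as the 'kill' commands above.

```c
#include <sys/types.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the original report): let `pid' run for
 * run_sec seconds, then hold it stopped for pause_sec seconds, repeated
 * `cycles' times. Returns 0 on success, or -1 if `pid' is invalid or a
 * signal could not be delivered.
 */
int
throttle(pid_t pid, unsigned run_sec, unsigned pause_sec, unsigned cycles)
{
    unsigned i;

    /* pid 0 or a negative pid would signal a process group or more. */
    if (pid <= 0)
        return (-1);
    for (i = 0; i < cycles; i++) {
        sleep(run_sec);
        if (kill(pid, SIGSTOP) == -1)
            return (-1);
        sleep(pause_sec);
        if (kill(pid, SIGCONT) == -1)
            return (-1);
    }
    return (0);
}
```

For example, throttle(pid, 300, 1, 12) would interrupt the process for one second every five minutes over roughly an hour. Whether such a short pause is enough to release the stuck processes is untested.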
>Release-Note:
>Audit-Trail:
>Unformatted: