sleep(3) sometimes too sleepy on FreeBSD 8.0?
John Marshall
john.marshall at riverwillow.com.au
Tue Feb 23 01:35:29 UTC 2010
Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2
Since upgrading a few local servers to FreeBSD 8.0-RELEASE (and
subsequently 8.0-RELEASE-p2), I have been seeing VERY intermittent
problems with sendmail persistent queue runners. One or more queue
runners will fail to wake up (having been told to sleep for either 1 or
5 seconds) and mail accumulates in their queue group queues.
I have only seen this about 4 times, but at least once on each of the
three 8.0 servers. I've been seeing something like one occurrence per
fortnight overall. The first few times I restarted sendmail. On
Saturday I spent longer looking at it.
- attached to each of the stuck queue runner processes via gdb to
try to see where they were stuck
- backtraces from both processes were identical and looked sane
- attached to a happy queue runner process and got an identical
backtrace
- exited gdb and discovered that the stuck queue runners had woken
up and flushed their queues!
The stuck queue runner processes had been stuck for several hours
(judging by the timestamps on the queued mail messages) but the gdb
attach apparently woke them up!
PROCESS STATES BEFORE DEBUG (stuck runners are in 'I' state, i.e. idle for more than about 20 seconds)
PID TT STAT TIME COMMAND
80298 ?? Ss 0:17.68 sendmail: accepting connections (sendmail)
80299 ?? I 0:46.62 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
80300 ?? I 0:08.83 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
80301 ?? S 0:31.58 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
80302 ?? S 0:30.71 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
80303 ?? S 0:33.29 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
80304 ?? S 0:30.55 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)
BACKTRACE OF STUCK PROCESS 80299
(gdb) bt
#0 0x28346547 in sigsuspend () from /lib/libc.so.7
#1 0x28344e98 in sigpause () from /lib/libc.so.7
#2 0x2833be3e in pause () from /lib/libc.so.7
#3 0x080cc7c8 in sleep ()
#4 0x08099c51 in run_work_group ()
#5 0x08099ebf in runqueue ()
#6 0x0805538d in main ()
BACKTRACE OF HAPPY PROCESS 80301
(gdb) bt
#0 0x28346547 in sigsuspend () from /lib/libc.so.7
#1 0x28344e98 in sigpause () from /lib/libc.so.7
#2 0x2833be3e in pause () from /lib/libc.so.7
#3 0x080cc7c8 in sleep ()
#4 0x08099c51 in run_work_group ()
#5 0x08099ebf in runqueue ()
#6 0x0805538d in main ()
PROCESS STATES AFTER DEBUG
PID TT STAT TIME COMMAND
80298 ?? Ss 0:17.69 sendmail: accepting connections (sendmail)
80299 ?? S 0:46.66 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
80300 ?? S 0:08.85 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
80301 ?? S 0:31.60 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
80302 ?? S 0:30.73 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
80303 ?? S 0:33.32 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
80304 ?? S 0:30.58 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)
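Because FreeBSD's ps marks a process 'I' only once it has been sleeping for more than about 20 seconds, a runner that should sleep 1 or 5 seconds ought never to show 'I'. A minimal C sketch of a check for that condition follows. It assumes input lines in the "PID TT STAT TIME COMMAND" layout shown in the tables above; the function name is hypothetical and not part of sendmail or ps.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical check: given one line of ps output in the
 * "PID TT STAT TIME COMMAND" layout, return 1 if it is a sendmail
 * queue runner whose STAT begins with 'I' (idle > ~20 s), else 0.
 * Header lines and lines that do not parse also return 0.
 */
int is_idle_queue_runner(const char *line)
{
    long pid;
    char tt[16], state[16], cputime[16];
    int off = 0;

    assert(line != NULL);
    /* %n records where the COMMAND column starts */
    if (sscanf(line, "%ld %15s %15s %15s %n",
               &pid, tt, state, cputime, &off) != 4)
        return 0;                       /* e.g. the PID TT STAT header */
    if (state[0] != 'I')
        return 0;                       /* S, Ss, R, ... are not idle */
    return strstr(line + off, "running queue:") != NULL;
}
```

Applied to the "before" table this flags 80299 and 80300; applied to the "after" table it flags nothing, since every runner is back in 'S'.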
SENDMAIL DETAILS
Version 8.14.4
Compiled with: DNSMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8 MIME8TO7
NAMED_BIND NETINET NETUNIX NEWDB NIS PIPELINING SASLv2 SCANF
STARTTLS USERDB XDEBUG
/usr/sbin/sendmail:
libsasl2.so.2 => /usr/local/lib/libsasl2.so.2 (0x28154000)
libssl.so.7 => /usr/local/lib/libssl.so.7 (0x2816a000)
libcrypto.so.7 => /usr/local/lib/libcrypto.so.7 (0x281ad000)
libutil.so.8 => /lib/libutil.so.8 (0x282f2000)
libc.so.7 => /lib/libc.so.7 (0x28300000)
libz.so.5 => /lib/libz.so.5 (0x2840c000)
I posted about this in comp.mail.sendmail and was told...
> sleep() should be one of these calls:
>
> if (njobs == 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TIME)
> sleep(MIN_SLEEP_TIME);
> else if (WorkGrp[wgrp].wg_lowqintvl <= 0)
> sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
> else
> sleep(WorkGrp[wgrp].wg_lowqintvl);
>
> Unless you have a really large value for one of these, the process
> should continue after a while.
The above code snippet is from sendmail/queue.c, which defines
MIN_SLEEP_TIME as 5. QueueIntvl defaults to 1. wg_lowqintvl defaults
to 0. I have not set any configuration or runtime options to override
these defaults, so my persistent queue runners should be sleeping for
either 1s or 5s only (not hours!).
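To make the arithmetic explicit, here is a standalone C model of the quoted branch, using the default values stated above (MIN_SLEEP_TIME 5, QueueIntvl 1, wg_lowqintvl 0). It is a sketch for reasoning only: the function name and its parameters are hypothetical stand-ins for sendmail's njobs, WorkGrp[wgrp].wg_lowqintvl and QueueIntvl, not sendmail's actual code.

```c
#include <assert.h>
#include <time.h>

/* Value quoted from sendmail/queue.c */
#define MIN_SLEEP_TIME 5

/*
 * Hypothetical model of the quoted branch: returns the number of
 * seconds a persistent queue runner would pass to sleep().
 *   njobs      - jobs found on the last queue run
 *   lowqintvl  - stands in for WorkGrp[wgrp].wg_lowqintvl (default 0)
 *   queueintvl - stands in for QueueIntvl (default 1, as stated above)
 */
time_t runner_sleep_secs(int njobs, time_t lowqintvl, time_t queueintvl)
{
    assert(njobs >= 0);
    if (njobs == 0 && lowqintvl < MIN_SLEEP_TIME)
        return MIN_SLEEP_TIME;          /* empty queue: minimum back-off */
    else if (lowqintvl <= 0)
        return queueintvl > 0 ? queueintvl : MIN_SLEEP_TIME;
    else
        return lowqintvl;               /* explicit per-group interval */
}
```

With the defaults, runner_sleep_secs(0, 0, 1) is 5 and runner_sleep_secs(n, 0, 1) is 1 for any n > 0, so no path through this logic asks for a sleep measured in hours.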
--
John Marshall