sleep(3) sometimes too sleepy on FreeBSD 8.0?

John Marshall john.marshall at riverwillow.com.au
Tue Feb 23 01:35:29 UTC 2010


Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2

Since upgrading a few local servers to FreeBSD 8.0-RELEASE (and
subsequently 8.0-RELEASE-p2), I have been seeing VERY intermittent
problems with sendmail persistent queue runners.  One or more queue
runners will fail to wake up (having been told to sleep for either 1 or
5 seconds) and mail accumulates in their queue group queues.

I have seen this only about four times, but at least once on each of
the three 8.0 servers: roughly one occurrence per fortnight overall.
The first few times I simply restarted sendmail.  On Saturday I spent
longer looking at it:

 - attached to each of the stuck queue runner processes via gdb to
   try to see where they were stuck
 - backtraces from both processes were identical and looked sane
 - attached to a happy queue runner process and got an identical
   backtrace
 - exited gdb and discovered that the stuck queue runners had woken
   up and flushed their queues!

The stuck queue runner processes had been stuck for several hours
(judging by the timestamps on the queued mail messages) but the gdb
attach apparently woke them up!

PROCESS STATES BEFORE DEBUG (stuck runners are in 'I' state)

  PID  TT  STAT      TIME COMMAND
80298  ??  Ss     0:17.68 sendmail: accepting connections (sendmail)
80299  ??  I      0:46.62 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
80300  ??  I      0:08.83 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
80301  ??  S      0:31.58 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
80302  ??  S      0:30.71 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
80303  ??  S      0:33.29 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
80304  ??  S      0:30.55 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)

BACKTRACE OF STUCK PROCESS 80299

(gdb) bt
#0  0x28346547 in sigsuspend () from /lib/libc.so.7
#1  0x28344e98 in sigpause () from /lib/libc.so.7
#2  0x2833be3e in pause () from /lib/libc.so.7
#3  0x080cc7c8 in sleep ()
#4  0x08099c51 in run_work_group ()
#5  0x08099ebf in runqueue ()
#6  0x0805538d in main ()

BACKTRACE OF HAPPY PROCESS 80301

(gdb) bt
#0  0x28346547 in sigsuspend () from /lib/libc.so.7
#1  0x28344e98 in sigpause () from /lib/libc.so.7
#2  0x2833be3e in pause () from /lib/libc.so.7
#3  0x080cc7c8 in sleep ()
#4  0x08099c51 in run_work_group ()
#5  0x08099ebf in runqueue ()
#6  0x0805538d in main ()

PROCESS STATES AFTER DEBUG

  PID  TT  STAT      TIME COMMAND
80298  ??  Ss     0:17.69 sendmail: accepting connections (sendmail)
80299  ??  S      0:46.66 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
80300  ??  S      0:08.85 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
80301  ??  S      0:31.60 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
80302  ??  S      0:30.73 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
80303  ??  S      0:33.32 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
80304  ??  S      0:30.58 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)

SENDMAIL DETAILS

Version 8.14.4
 Compiled with: DNSMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8 MIME8TO7
		NAMED_BIND NETINET NETUNIX NEWDB NIS PIPELINING SASLv2 SCANF
		STARTTLS USERDB XDEBUG

/usr/sbin/sendmail:
	libsasl2.so.2 => /usr/local/lib/libsasl2.so.2 (0x28154000)
	libssl.so.7 => /usr/local/lib/libssl.so.7 (0x2816a000)
	libcrypto.so.7 => /usr/local/lib/libcrypto.so.7 (0x281ad000)
	libutil.so.8 => /lib/libutil.so.8 (0x282f2000)
	libc.so.7 => /lib/libc.so.7 (0x28300000)
	libz.so.5 => /lib/libz.so.5 (0x2840c000)

I posted about this in comp.mail.sendmail and was told...

> sleep() should be one of these calls:
> 
>         if (njobs == 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TIME)
>                 sleep(MIN_SLEEP_TIME);
>         else if (WorkGrp[wgrp].wg_lowqintvl <= 0)
>                 sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
>         else
>                 sleep(WorkGrp[wgrp].wg_lowqintvl);
> 
> Unless you have a really large value for one of these, the process
> should continue after a while.

The above code snippet is from sendmail/queue.c, which fixes
MIN_SLEEP_TIME at 5.  QueueIntvl defaults to 1.  wg_lowqintvl defaults
to 0.  I have not set any configuration or runtime options to override
these defaults, so my persistent queue runners should be sleeping for
either 1s or 5s only (not hours!).
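
To make the arithmetic explicit, here is a small sketch that copies the
three quoted branches into a helper and returns the value that would be
passed to sleep().  The helper name pick_sleep_seconds and its
parameters are hypothetical, and the meaning of njobs (presumably the
number of jobs found in the last run) is an assumption.

```c
#define MIN_SLEEP_TIME 5    /* value fixed in sendmail/queue.c, per the text above */

/*
 * Hypothetical helper mirroring the quoted branches.
 * wg_lowqintvl stands in for WorkGrp[wgrp].wg_lowqintvl and
 * queue_intvl for QueueIntvl.
 */
static int pick_sleep_seconds(int njobs, int wg_lowqintvl, int queue_intvl)
{
    if (njobs == 0 && wg_lowqintvl < MIN_SLEEP_TIME)
        return MIN_SLEEP_TIME;
    else if (wg_lowqintvl <= 0)
        return queue_intvl > 0 ? queue_intvl : MIN_SLEEP_TIME;
    else
        return wg_lowqintvl;
}
```

With the defaults described above (wg_lowqintvl 0, QueueIntvl 1),
pick_sleep_seconds(0, 0, 1) gives 5 and pick_sleep_seconds(3, 0, 1)
gives 1, so no branch should request more than 5 seconds.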

-- 
John Marshall