misc/166340: Process under FreeBSD 9.0 hangs in uninterruptable sleep with apparently no syscall (empty wchan)
Konstantin Belousov
kostikbel at gmail.com
Tue Mar 27 17:50:17 UTC 2012
The following reply was made to PR kern/166340; it has been noted by GNATS.
From: Konstantin Belousov <kostikbel at gmail.com>
To: Christian Esken <Christian.Esken at trivago.com>
Cc: bug-followup at freebsd.org, avg at freebsd.org
Subject: Re: misc/166340: Process under FreeBSD 9.0 hangs in uninterruptable sleep with apparently no syscall (empty wchan)
Date: Tue, 27 Mar 2012 20:46:26 +0300
On Tue, Mar 27, 2012 at 05:30:48PM +0200, Christian Esken wrote:
> Konstantin Belousov wrote:
> > Thank you for the data. Semi-obviously, the callout_stop() call in
> > sleepq_check_timeout() has to return 0, otherwise we would not call
> > mi_switch() there. But I do not see how this can happen, because
> > the callout state, printed from kgdb, still indicates that the callout
> > is pending. The callout cannot be reset while in the sleepq code.
> >
> > So there are two possible routes forward: the preferable one is for
> > you to extract a self-contained C program that illustrates
> > the issue and send this sample to me. The second is to recompile your
> > kernel with INVARIANTS/WITNESS and possibly KTR and see what happens.
>
> I repeated the test with INVARIANTS/WITNESS and KTR compiled in
> (actually WITNESS was already included during the last test).
>
> I ran KTR with nothing filtered out, and formatted the dump with
> "ktrdump -cftH -i ktr.out". The whole log is excessive (1GB), so
> I have extracted two short sections (see attachment).
>
> The first section shows the last action of the application, namely a
> successful sendto() to a TCP socket, and then waiting for an answer via
> recvfrom().
> The second section illustrates the lock/unlock sequence of the sleep
> mutex for the recvfrom(). It goes like LOCK, LOCK, UNLOCK.
>
> This time the signal status is different. We have a pending signal:
> USER    PID PPID PENDING   CAUGHT  IGNORED BLOCKED STAT WCHAN
> nobody 9163    1    4000 80005006 79f88010       0 D    -
>=20
> Looks like SIGPROF (27). Just wondering where it comes from.
>=20
This is irrelevant, and probably a red herring. The issue here is the failing
callout_stop() while the callout seems to be still pending. Also, the mask 0x4000
in the pending signals indicates that SIGTERM is pending, not SIGPROF.
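As a quick check of that arithmetic: in the hex signal masks printed by ps(1), signal number N occupies bit N - 1, so SIGTERM (15) is 0x4000, while SIGPROF (27) would be 0x4000000. The sketch below assumes that only the first 32-bit word of the signal set is shown, as in the ps output quoted above; the helper names sig_to_mask() and mask_to_lowest_signal() are hypothetical and not part of any FreeBSD API.

```c
#include <assert.h>
#include <signal.h>

/* Assumption: in a ps(1) mask word, signal number N is bit N - 1. */
static unsigned long sig_to_mask(int signo)
{
    assert(signo >= 1 && signo <= 32);
    return 1UL << (signo - 1);
}

/* Return the lowest signal number set in mask, or 0 if none is set. */
static int mask_to_lowest_signal(unsigned long mask)
{
    for (int signo = 1; signo <= 32; signo++) {
        if (mask & sig_to_mask(signo))
            return signo;
    }
    return 0;
}
```

For example, mask_to_lowest_signal(0x4000) yields 15, i.e. SIGTERM, which matches the reading above rather than SIGPROF.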
I probably want the data from your ktr dump: either all entries for
the stuck process and all entries for the CALLOUT facility, or just the
whole dump.
The last entries of your log excerpt do not make much sense, since the process
must enter the _sleep() function, which logs this fact right after locking
the sleepq. But the log ends on the so_rcv mutex lock.
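The contradiction described in this thread can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: struct model_callout, model_callout_stop(), model_check_timeout_blocks() and model_observed_state_reachable() are invented names that only encode the behaviour stated above, namely that callout_stop() cancels a pending callout and reports success, and that sleepq_check_timeout() reaches mi_switch() only when callout_stop() returns 0. Under those assumptions, a thread that blocked in the timeout path while its callout is still pending is an unreachable state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model only: these names are illustrative and are not the
 * FreeBSD kernel's callout or sleepqueue API.
 */
struct model_callout {
    bool pending; /* armed and not yet fired */
};

/*
 * Assumed callout_stop() contract: cancel a pending callout and return 1;
 * return 0 if it was not pending (already fired or running elsewhere).
 */
static int model_callout_stop(struct model_callout *c)
{
    if (c->pending) {
        c->pending = false;
        return 1;
    }
    return 0;
}

/*
 * Models the decision in sleepq_check_timeout() as described in the
 * thread: the thread blocks again (mi_switch()) only if callout_stop()
 * returned 0.
 */
static bool model_check_timeout_blocks(struct model_callout *c)
{
    bool blocks = (model_callout_stop(c) == 0);

    /* Whatever callout_stop() returned, the callout is no longer pending. */
    assert(!c->pending);
    return blocks;
}

/*
 * Is "blocked in the timeout path AND callout still pending" reachable
 * from the given initial callout state? In this model it never is.
 */
static bool model_observed_state_reachable(bool initially_pending)
{
    struct model_callout c = { .pending = initially_pending };
    bool blocked = model_check_timeout_blocks(&c);

    return blocked && c.pending;
}
```

model_observed_state_reachable() returns false for both initial states, which is exactly the inconsistency that the requested KTR, procstat and kgdb data would need to explain.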
Please, when collecting the data, collect the whole set, i.e.
include the procstat -kk <pid> output together with the ktr dump, as well as
the kgdb output, so that I can be sure that we are chasing one bug, and not N bugs.
More information about the freebsd-bugs mailing list