Stuck CLOSED sockets / sshd / zombies...
Ian Lepore
ian at FreeBSD.org
Wed Apr 2 14:41:16 UTC 2014
On Wed, 2014-04-02 at 15:30 +0100, Karl Pielorz wrote:
> Hi All,
>
> This issue started in -xen (subject: *Stuck sshd in urdlck), moved to
> -stable (subject: sshd with zombie process on FreeBSD 10.0-STABLE), and
> -net (subject: Server sockets staying in CLOSED for extended), but seems to
> have died a death in all of them.
>
> It's affecting a number of people - predominately with sshd.
>
> Does anyone know how I can troubleshoot this further, what the cause / fix
> is, or if it's already actually fixed?
>
> "
> # ps ax | grep 4344
> ps axl | grep 4344
> 0 4344 895 0 20 0 84868 6944 urdlck Is - 0:00.01 sshd: unknown
> [priv] (sshd)
> 22 4345 4344 0 20 0 0 0 - Z - 0:00.00 <defunct>
> 0 4346 4344 0 21 0 84868 6952 sbwait I - 0:00.00 sshd: unknown
> [pam] (sshd)
>
> #ps axd
> ...
> 895 - Is 0:00.05 |-- /usr/sbin/sshd
> 3933 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 3934 - Z 0:00.00 | | |-- <defunct>
> 3935 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4338 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4339 - Z 0:00.00 | | |-- <defunct>
> 4340 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4341 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4342 - Z 0:00.00 | | |-- <defunct>
> 4343 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4344 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4345 - Z 0:00.00 | | |-- <defunct>
> 4346 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> ...
>
> #netstat -a -n | grep CLOSED | wc -l
> 59
>
> #netstat -a | grep 54544
> tcp4 0 0 192.168.0.138.22 192.168.0.45.54544 CLOSED
>
> #sockstat | grep 4343
> root sshd 4343 3 tcp4 192.168.0.138:22 192.168.0.45:54544
> root sshd 4343 6 stream (not connected)
> root sshd 4343 8 stream -> ??
>
> #uname -a
> FreeBSD host 10.0-STABLE FreeBSD 10.0-STABLE #0 r261289M: Thu Jan 30
> 13:33:35 UTC 2014 x at domain.com:/usr/src/sys/amd64/compile/GENERIC amd64
> "
>
> For a box that's doing nothing (apart from people ssh'ing in occasionally)
> - there's obviously something wrong.
>
> What would be next to try and figure out why this is happening? - as I'd
> dearly like to know what's causing it / a fix (or if it's already fixed in
> -STABLE, and at which revision)
>
> Thanks,
>
> -Karl

I don't know anything about the underlying cause of the stuck sockets or
zombies, but I suspect that what triggered the problem was the import of
a newer openssh in which the default for the UsePrivilegeSeparation
option changed to "sandbox" (or maybe that value was simply new in that
version). I think of this possibility because the extra child forked off
with that option exposed some kernel memory-management problems on the
arm platform a few months ago.

If so, setting "UsePrivilegeSeparation no" in sshd_config could be a
workaround for anyone having severe problems with this on a production
server, but it should in no way become mythology that doing this somehow
"fixes" the problem -- it would be purely a workaround, and we should
keep pursuing the actual cause.
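
As a rough sketch of how one might check whether the workaround changes
anything: the two helpers below count CLOSED sockets and zombie processes
from the same netstat and ps output shown in the report. The function
names (count_closed, count_zombies) are made up for illustration, and the
/etc/ssh/sshd_config path is assumed to be the stock FreeBSD location.

```shell
# Workaround under test (assumed location: /etc/ssh/sshd_config):
#   UsePrivilegeSeparation no
# Restart sshd after changing it so new connections pick it up.

# Count sockets in the CLOSED state; reads `netstat -a -n` output on stdin.
count_closed() {
    awk '$NF == "CLOSED" { n++ } END { print n + 0 }'
}

# Count zombie processes; reads one process state per line on stdin,
# e.g. the output of `ps ax -o stat=`.
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Typical use on the affected host, before and after the change:
#   netstat -a -n | count_closed
#   ps ax -o stat= | count_zombies
```

If the counts stop growing with the option set to "no", that would be
consistent with the sandbox child being the trigger, though it says
nothing about the underlying kernel-side cause.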
-- Ian