sshd with zombie process on FreeBSD 10.0-STABLE - workaround
Marcelo Gondim
gondim at bsdinfo.com.br
Sat Mar 22 11:54:30 UTC 2014
On 22/03/14 04:18, Kevin Oberman wrote:
> On Fri, Mar 21, 2014 at 11:06 PM, Marcelo Gondim
> <gondim at bsdinfo.com.br> wrote:
>
> On 22/03/14 02:02, Kevin Oberman wrote:
>
> On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim
> <gondim at bsdinfo.com.br> wrote:
>
> On 20/03/14 11:58, John Baldwin wrote:
>
> On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
>
> On 19/03/14 13:01, Kevin Oberman wrote:
>
> On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim
> <gondim at bsdinfo.com.br> wrote:
> Hi all,
>
> Until a proper fix appears, I wrote the script below and put it in
> crontab to automatically kill zombie sshd processes.
>
> the_walking_dead.sh:
>
> #!/bin/sh
> kill -9 `ps afx|grep sshd|grep unknown|awk '{print $1}'`
>
>
> Put this in /etc/crontab:
>
> 00 1 * * * root the_walking_dead.sh
>
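A slightly more defensive variant of the script above could look like the sketch below. The function names `stuck_sshd_pids` and `kill_stuck_sshd` are made up for illustration and are not part of the original script. It assumes the same `ps afx` output format shown in this thread, where the PID is the first column and the stuck processes carry the title `sshd: unknown`. The `[u]` in the pattern keeps awk from matching its own command line in the ps listing, and `kill` is only called when at least one PID was found.

```shell
#!/bin/sh
# Sketch only: names and structure are assumptions, not the original script.

# Read ps-style lines on stdin and print the PID (first column) of every
# process whose title contains "sshd: unknown".
# The [u] bracket stops the pattern from matching this awk command itself.
stuck_sshd_pids() {
    awk '$1 ~ /^[0-9]+$/ && /sshd: [u]nknown/ { print $1 }'
}

# Kill the matching processes, but only if there is at least one,
# because "kill -9" with no PID argument is an error.
kill_stuck_sshd() {
    pids=$(ps afx | stuck_sshd_pids)
    if [ -n "$pids" ]; then
        # Intentionally unquoted so each PID becomes its own argument.
        kill -9 $pids
    fi
}
```

To use this from cron, the script would end with a call to `kill_stuck_sshd`. Because the crontab entry names the script without a directory, it is also worth checking that the script lives somewhere on cron's PATH, or giving the full path in the entry.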
>
> If 'kill -9' works, the process is not really a zombie. It simply still has
> a socket open and is waiting for it to be closed before exiting.
>
> You might take a look at network sockets with sockstat(1) and see if you
> can get any indication of why these sockets are not being closed. It may be
> that the issue is not sshd but some other issue in the OS leaving sockets
> open.
>
> Hi Kevin,
>
> My ps -afx below:
>
> [...]
> 42139 - Is 0:00.01 sshd: unknown [priv] (sshd)
> 42140 - Z 0:00.01 <defunct>
> 42141 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 58445 - Is 0:00.01 sshd: unknown [priv] (sshd)
> 58446 - Z 0:00.02 <defunct>
> 58447 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 65635 - Is 0:00.01 sshd: vinicius [priv] (sshd)
> 65636 - Z 0:00.01 <defunct>
> [...]
>
> # sockstat | grep 42140
> #
>
> # sockstat | grep 58446
> #
>
> # sockstat | grep 65636
> #
>
> No associated socket with zombie process.
>
> Do a pstree. I bet the zombies are children of the
> other processes that
> are stuck on a socket as Kevin described.
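One way to check the parent/child relationship without a tree view is to list each zombie next to its parent PID. The sketch below assumes the column order produced by `ps -axo pid,ppid,stat,command`. The helper name `zombie_parents` is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes input columns are PID, PPID, STAT, COMMAND.

# Read ps-style lines on stdin and print "zombie_pid parent_pid" for
# every process whose state starts with Z.
zombie_parents() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^Z/ { print $1, $2 }'
}

# Example invocation (not run here):
#   ps -axo pid,ppid,stat,command | zombie_parents
```

Each parent PID printed can then be inspected with `procstat -f` to see which descriptors it still holds open.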
>
> # ps afx|grep sshd |grep unk
>
> 10948 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 10955 - IW 0:00.00 sshd: unknown [pam] (sshd) <====
> 11701 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 11704 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 25450 - Is 0:00.01 sshd: unknown [priv] (sshd)
> 25452 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 41193 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 41196 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 42193 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 42195 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 80638 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 80640 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 81484 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 81486 - IW 0:00.00 sshd: unknown [pam] (sshd)
>
> With procstat I could see the socket as follows:
>
> # procstat -f 10955
> PID   COMM FD   T V FLAGS    REF OFFSET PRO NAME
> 10955 sshd text v r r-------   -      - -   /usr/sbin/sshd
> 10955 sshd cwd  v d r-------   -      - -   /
> 10955 sshd root v d r-------   -      - -   /
> 10955 sshd 0    v c rw------   6      0 -   /dev/null
> 10955 sshd 1    v c rw------   6      0 -   /dev/null
> 10955 sshd 2    v c rw------   6      0 -   /dev/null
> 10955 sshd 3    s - rw---n--   2      0 TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
> 10955 sshd 5    p - rw------   2      0 -   -
> 10955 sshd 6    s - rw------   2      0 UDS -
> 10955 sshd 7    p - rw------   1      0 -   -
> 10955 sshd 8    s - rw------   2      0 UDS -
>
> I do not understand why these connections remain locked in FreeBSD 10.0.
>
> I'll try this sysctl: net.inet.tcp.delayed_ack=0
>
> If the problem is still showing up, can you see what is going on with the
> socket? What is the state of the connection? Try "netstat -f inet -p tcp"
> and see what state the connection is in. I'm wondering if there is some
> sort of race going on where the socket hangs.
>
> Ideally I'd try to capture the packets at the end of the session.
> Can you do something to trigger this reliably? If so, use the "standard"
> "tcpdump -pw file.bpf host HOST". I seem to recall that these connections
> are scheduled. If so, you can put the packet capture in a crontab to run at
> the same time. If you feed this to a tool like wireshark, you should get a
> good idea of what is happening, if not why. I understand that the timing of
> this might be very tricky.
>
> Hi Kevin,
>
> Thanks for your help.
>
> I ran netstat, and the state of the connection is CLOSED, as you can
> see below:
>
> # procstat -f 26177
> PID   COMM FD   T V FLAGS    REF OFFSET PRO NAME
> 26177 sshd text v r r-------   -      - -   /usr/sbin/sshd
> 26177 sshd cwd  v d r-------   -      - -   /
> 26177 sshd root v d r-------   -      - -   /
> 26177 sshd 0    v c rw------   6      0 -   /dev/null
> 26177 sshd 1    v c rw------   6      0 -   /dev/null
> 26177 sshd 2    v c rw------   6      0 -   /dev/null
> 26177 sshd 3    s - rw---n--   2      0 TCP 186.193.48.10:4321 186.193.48.8:50094
> 26177 sshd 4    s - rw------   1      0 UDS -
> 26177 sshd 5    p - rw------   2      0 -   -
> 26177 sshd 6    s - rw------   2      0 UDS -
>
> # procstat -f 10110
> PID   COMM FD   T V FLAGS    REF OFFSET PRO NAME
> 10110 sshd text v r r-------   -      - -   /usr/sbin/sshd
> 10110 sshd cwd  v d r-------   -      - -   /
> 10110 sshd root v d r-------   -      - -   /
> 10110 sshd 0    v c rw------   6      0 -   /dev/null
> 10110 sshd 1    v c rw------   6      0 -   /dev/null
> 10110 sshd 2    v c rw------   6      0 -   /dev/null
> 10110 sshd 3    s - rw---n--   2      0 TCP 186.193.48.10:4321 186.193.48.8:63048
> 10110 sshd 4    s - rw------   1      0 UDS -
> 10110 sshd 5    p - rw------   2      0 -   -
> 10110 sshd 6    s - rw------   2      0 UDS -
>
> # netstat -f inet -p tcp
> Active Internet connections
> Proto Recv-Q Send-Q Local Address Foreign Address (state)
> tcp4 0 0 bart.24173 pppoe17250.8728 ESTABLISHED
> tcp4 0 0 bart.53795 pppoe17249.8728 TIME_WAIT
> tcp4 0 0 bart.54191 pppoe149.8728 TIME_WAIT
> tcp4 0 0 bart.12476 pppoe148.8728 TIME_WAIT
> tcp4 0 0 bart.36846 pppoe142.8728 TIME_WAIT
> tcp4 0 0 bart.39944 186.193.48.22.8728 TIME_WAIT
> tcp4 0 0 bart.60233 186.193.48.25.8728 TIME_WAIT
> tcp4 0 0 bart.50946 186.193.48.9.8728 TIME_WAIT
> tcp4 0 0 bart.13403 186.193.48.19.8728 TIME_WAIT
> tcp4 0 0 bart.36982 zeus.linuxinfo.c.8728 TIME_WAIT
> tcp4 0 0 bart.rwhois pppoe769.49896 ESTABLISHED
> tcp4 0 0 bart.mysql mail.15711 ESTABLISHED
> tcp4 0 0 bart.mysql mail.16087 ESTABLISHED
> tcp4 0 0 bart.mysql mail.25051 ESTABLISHED
> tcp4 0 0 bart.mysql mail.59126 ESTABLISHED
> tcp4 0 0 bart.mysql mail.59051 ESTABLISHED
> tcp4 0 0 bart.mysql mail.29446 ESTABLISHED
> tcp4 0 0 bart.mysql mail.45453 ESTABLISHED
> tcp4 0 0 bart.mysql mail.14938 ESTABLISHED
> tcp4 0 0 bart.mysql mail.46230 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.16930 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.28074 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.53686 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.14448 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.52487 ESTABLISHED
> tcp4 0 0 bart.rwhois 186.193.48.8.50094 CLOSED <====
> tcp4 0 0 bart.mysql mail.38286 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.32387 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.52219 ESTABLISHED
> tcp4 0 0 bart.mysql mail.52144 ESTABLISHED
> tcp4 0 0 bart.mysql mail.18862 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.52636 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.51607 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.62581 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.23071 ESTABLISHED
> tcp4 0 0 bart.mysql mail.22862 FIN_WAIT_2
> tcp4 0 0 bart.rwhois 186.193.48.8.63048 CLOSED <====
> tcp4 0 0 bart.mysql mail.42479 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.18146 ESTABLISHED
> tcp4 0 0 bart.mysql mail.46731 FIN_WAIT_2
> tcp4 0 0 bart.mysql mail.20498 ESTABLISHED
> tcp4 0 0 bart.62869 186.193.48.2.1190 ESTABLISHED
> tcp4 0 0 bart.mysql mail.55353 ESTABLISHED
>
>
> I'm sorry. I am now even more confused. Maybe I need to re-read the
> entire thread.
>
> I thought that the hung processes were sshd. These are rwhois. Or is
> there an ssh tunnel carrying the rwhois connections? (I see no sshd
> connections in this list.)
> --
> R. Kevin Oberman, Network Engineer, Retired
> E-mail: rkoberman at gmail.com
Hi Kevin,
Nope, I run sshd on port 4321/tcp, not on port 22/tcp. When I ran
netstat I did not pass the -n option, so it displayed port 4321 as rwhois.
# cat /etc/services |grep rwhois
rwhois 4321/tcp #Remote Who Is
rwhois 4321/udp #Remote Who Is
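
Running `netstat -n -f inet -p tcp` prints numeric ports and avoids this kind of confusion. To check in advance which name a port will be shown under, a small lookup over the services file is enough. The sketch below is only illustrative: the function name `service_for_port` is made up, and it assumes the usual services(5) layout of `name port/proto` in the first two columns.

```shell
#!/bin/sh
# Sketch only: assumes "name port/proto ..." lines as in services(5).

# Usage: service_for_port PORT PROTO [FILE]
# Prints the service name(s) registered for PORT/PROTO.
# FILE defaults to /etc/services.
service_for_port() {
    awk -v want="$1/$2" '$1 !~ /^#/ && $2 == want { print $1 }' \
        "${3:-/etc/services}"
}

# Example invocation (not run here):
#   service_for_port 4321 tcp
```

With the two entries shown above, a lookup for 4321 and tcp yields rwhois, which is the name netstat substituted for the sshd port.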