send() returns error even though data is sent, TCP connection still alive

Jeff Davis freebsd at j-davis.com
Wed Jan 31 19:39:48 UTC 2007


I am on FreeBSD 6.1, and I'm seeing write() return EHOSTDOWN even though
the connection stays alive.

I wrote a simple C client on the affected FreeBSD box to write a series
of integers to a server program on another machine. When the client's
write() fails with EHOSTDOWN, the data it tried to send arrives at the
server program anyway. Moreover, when I write() again on the same socket,
the data goes through as if nothing had happened, with no further errors.
The connection is not broken by the EHOSTDOWN, and the client never
knows the difference. In fact, if the application just ignores the error
from write(), everything appears fine after that.
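A minimal sketch of the kind of client loop described above, for illustration only: the original client is not shown, so send_ints() is a hypothetical name and the wire format (one 4-byte big-endian integer per write()) is an assumption. The loop ignores EHOSTDOWN and deliberately does not resend the value, because in the experiment the data reached the server despite the error; resending could duplicate it.

```c
#include <arpa/inet.h>   /* htonl */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>      /* write, ssize_t */

/*
 * Hypothetical helper (not from the original client): write the
 * integers 0..count-1 to a connected stream socket, one 4-byte value
 * per write(), in network byte order (assumed wire format).
 *
 * EHOSTDOWN is logged and otherwise ignored. The value is NOT resent,
 * on the assumption (from the observations above) that the bytes were
 * queued anyway; retrying could duplicate them.
 *
 * Returns the number of EHOSTDOWN errors ignored, or -1 on any other
 * error.
 */
int send_ints(int fd, int count)
{
    int ignored = 0;

    assert(fd >= 0 && count >= 0);
    for (int i = 0; i < count; i++) {
        uint32_t v = htonl((uint32_t)i);
        const char *p = (const char *)&v;
        size_t left = sizeof v;

        while (left > 0) {
            ssize_t n = write(fd, p, left);
            if (n > 0) {
                /* handle short writes by advancing through the buffer */
                p += n;
                left -= (size_t)n;
            } else if (n < 0 && errno == EINTR) {
                continue;                 /* interrupted; try again */
            } else if (n < 0 && errno == EHOSTDOWN) {
                fprintf(stderr, "write: EHOSTDOWN on value %d, ignoring\n", i);
                ignored++;
                break;                    /* assume the data was queued */
            } else {
                return -1;                /* any other error is fatal */
            }
        }
    }
    return ignored;
}
```

On a connected TCP socket, send_ints(fd, 1000) would return how many EHOSTDOWN errors were skipped; a nonzero count while the server still receives every integer in order would reproduce the behavior described here.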

The simplest way to see the problem is with SSH. Machine A is a FreeBSD
box, and machine B is another box on the same switch.

(1) ssh from A to B
(2) see on A that "arp -a" shows the entry for B
(3) on A do "arp -d B"
(4) pull network cable
(5) type <return> to try to send data over the SSH session (of course
nothing will happen; the network cable is still out)
(6) after the network cable has been unplugged for about 8 seconds, plug
it back in
(7) type in the SSH session again

You should see something like "write failed: host is down" and the
session will terminate. Of course, when ssh exits, the TCP connection
closes. The only way to see that it's still open and active is by
writing (or using) an application that ignores EHOSTDOWN errors from
write(). I think some scripting languages do not generate an exception
in that case.
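Instead of relying on a later write() to prove the connection survived, a client can also ask the kernel directly. The sketch below is a hypothetical helper (still_connected() is not part of any library); it uses only the standard getpeername() call, which fails with ENOTCONN once a socket is no longer connected, and getsockopt() with SO_ERROR to fetch and clear any pending socket error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>      /* NULL */
#include <sys/socket.h>

/*
 * Hypothetical diagnostic helper: report whether a socket still has a
 * connected peer, e.g. right after write() has failed with EHOSTDOWN.
 *
 * If pending_err is non-NULL, any pending socket error is fetched (and
 * cleared) with SO_ERROR and stored there.
 *
 * Returns 1 if connected, 0 if not connected (ENOTCONN), -1 if a call
 * failed for some other reason.
 */
int still_connected(int fd, int *pending_err)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof peer;

    assert(fd >= 0);
    if (pending_err != NULL) {
        socklen_t elen = sizeof *pending_err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, pending_err, &elen) != 0)
            return -1;
    }
    if (getpeername(fd, (struct sockaddr *)&peer, &len) == 0)
        return 1;                         /* kernel still has a peer */
    return errno == ENOTCONN ? 0 : -1;
}
```

Calling still_connected(fd, &err) right after a write() that failed with EHOSTDOWN should return 1 if, as observed above, the connection survives the error.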

This is very strange behavior, and it's causing all kinds of problems on
our network. Does anyone have an explanation for this? Why would a TCP
write return an error, leave the connection open, and send the data
anyway? This behavior has existed for a long time.

I believe this is related to:
http://www.freebsd.org/cgi/query-pr.cgi?pr=100172
which is related to:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/if_ether.c?only_with_tag=RELENG_6#rev1.137.2.5

I tried the patch here:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/if_ether.c?f=h#rev1.158
(rev 1.158)

but I can still generate the error I mentioned.

What's even stranger is that I made the ARP entries static on the
production machine, and I am still getting EHOSTDOWN errors.

Regards,
        Jeff Davis



More information about the freebsd-stable mailing list