cvs commit: src/sys/netinet tcp_syncache.c

Andre Oppermann andre at freebsd.org
Mon Jan 28 01:46:32 PST 2008


Maxim Konovalov wrote:
>>> http://maxim.int.ru/stuff/adsl.dmp.gz
>>>
>>> 192.168.1.1 -- adsl box
>>> 192.168.1.250 -- FreeBSD
>> The trace looks perfectly fine with regard to timestamps.  They are
>> sent and properly reflected by the adsl box.  Everything else looks
>> fine too. No anomalies seen.
>>
>> The syncache check for timestamps wouldn't be triggered anyway
>> because it only applies to incoming connections.  Not segments in
>> general. Connections initiated by the FreeBSD box never go through
>> syncache.
>>
>> To track down the problem you saw back then, which is very probably
>> unrelated to the syncache issue, I would need a trace of a failed
>> connection.  For that you need to downgrade.  If you can find time
>> for the downgrade I'm happy to find the root cause.
>>
> I find a kernel from September sitting in /boot.  There are two new
> dumps now:
> 
> http://maxim.int.ru/stuff/adsl.failed.dmp.gz
> 
> The adsl router displays login page but never returns the second page.

Timestamps are fine.  The problem seems to be related to the window scale
option.  The adsl router advertises support of the window scale option
but doesn't make use of it (wscale=1) itself.  FreeBSD sends a wscale of
8 (multiply by 256).  Two things seem to go wrong on the adsl router.
First it *seems* to divide the value in the TCP header's window size field
by 256 instead of multiplying it (could be a byte order issue).  That's
why it stalls.  It thinks the socket buffer on FreeBSD is full and
has insufficient space for the next segment it wants to send.
Second it then *seems* to try to do window probes (which are correctly
answered by FreeBSD).  The window probes aren't correct either as they
do not contain the one byte payload that should accompany them.  On top
of that, the sequence number of the pseudo window probe is one below
snd_nxt (a retransmit of an already ACK'd byte).
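
To put rough numbers on the first problem, here is a small sketch of the
window scale arithmetic.  The 64 KiB of free buffer space is an assumed
value for illustration and is not taken from the trace; the shift of 8 is
the wscale FreeBSD sent.  It compares what a correct peer would compute
with what a peer that divides instead of multiplying would end up with.

```python
# Illustrative values only: the free buffer size is assumed, not read
# from the trace.  WSCALE is the shift FreeBSD announced in its SYN.
WSCALE = 8
free_buffer = 65536  # bytes of free receive space (assumed)

# Sender side: the 16-bit window field carries the window shifted right.
window_field = free_buffer >> WSCALE

# A correct peer shifts the field left by the scale it saw in the SYN.
correct_window = window_field << WSCALE

# The suspected bug: shifting right (dividing by 256) instead.
broken_window = window_field >> WSCALE

print(window_field, correct_window, broken_window)  # 256 65536 1
```

With these numbers the broken peer would see a one byte window where
64 KiB is really available, which would be consistent with the stall.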

The TCP implementation of the adsl router (Asustek by the MAC address)
looks like it is really broken and incomplete in multiple ways.  I'd say
its socket buffers work segment based and it can't split up an already
created segment when the target window is too small.

The fault really lies with the adsl router choking on the larger window
scale.  It'll fall over with Windows Vista too, which also started using
larger wscale values.  It would have been better if the router didn't
even advertise wscale in its SYN, as it doesn't use it itself and implements
it completely wrong.

Newer FreeBSD kernels work again because Silby changed the way our wscale
is computed.  Previously it was scaled as high as possible while retaining
the smallest allowable MSS as smallest granularity.  Now it is scaled as
high as necessary to fit the largest allowed socket buffer as defined by
kern.ipc.maxsockbuf.  The scale factor is now 3 (multiply by 8) and doesn't
throw the adsl modem as far off as 8 does.
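
The two ways of picking the scale factor can be sketched roughly as below.
This is a model of the description above, not the kernel source: the
function names are made up, and the inputs 216 (minimum MSS) and 262144
(kern.ipc.maxsockbuf) are assumed values chosen for illustration.  The
constants 65535 and 14 are the largest unscaled window and the largest
shift RFC 1323 allows.

```python
TCP_MAXWIN = 65535       # largest value of the 16-bit window field
TCP_MAX_WINSHIFT = 14    # largest shift allowed by RFC 1323


def wscale_old(min_mss):
    """Old rule (hypothetical helper): raise the shift until the window
    granularity (1 << wscale) reaches the smallest allowable MSS."""
    wscale = 0
    while wscale < TCP_MAX_WINSHIFT and (1 << wscale) < min_mss:
        wscale += 1
    return wscale


def wscale_new(max_sockbuf):
    """New rule (hypothetical helper): raise the shift only until the
    largest allowed socket buffer fits into the scaled window."""
    wscale = 0
    while wscale < TCP_MAX_WINSHIFT and (TCP_MAXWIN << wscale) < max_sockbuf:
        wscale += 1
    return wscale


# Assumed inputs, see the note above.
print(wscale_old(216), wscale_new(262144))  # 8 3
```

Under these assumptions the sketch reproduces the shift of 8 seen in the
failed trace and the shift of 3 used by newer kernels.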

Whose fault is it?  Clearly the adsl modem's.  Its TCP is utterly broken.
Should FreeBSD work around it?  In this case I don't think so.  Normally
yes, if it is an edge case in a specification or some commonly made mistake.
This is not the case with the adsl modem.  It's really broken and in complete
disregard of even the basic standards.  The vendor should fix it; we should
not work around it.

> http://maxim.int.ru/stuff/adsl.rfc1323=0.dmp.gz
> 
> This one works.

-- 
Andre

