kern/68011: [patch] Isochronous delays in PPPoE
Sergio de Souza Prallon
prallon at uol.com.br
Wed Jun 16 14:50:56 GMT 2004
>Number: 68011
>Category: kern
>Synopsis: [patch] Isochronous delays in PPPoE
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Jun 16 14:50:26 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator: Sergio de Souza Prallon
>Release: FreeBSD 4.10-STABLE i386
>Organization:
>Environment:
>Description:
I use clockspeed (ports/sysutils/clockspeed) to keep my clock
in sync. A couple of months ago I noticed it no longer was able
to get the time reliably. When run from the command line it produced
error messages and sometimes failed to set the clock. Pinging the
NTP server, I saw the RTT was too high (~500-1000ms). Even more
annoying was the fact that most of the ICMP replies had exactly
the same RTT (to 1ms precision). Pinging other sites and
servers gave the same results, as did pinging the PPPoE terminator.
TCP connections were normal except for a "lag" in interactive
SSH sessions to remote hosts. HTTP downloads were acceptable.
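One way to quantify the "same RTT" observation above is to count how many
ping samples fall on the same whole millisecond. The following is only a
minimal sketch: the function name largest_rtt_cluster is hypothetical, and
the RTT values are assumed to have been collected beforehand (for example
by copying them from ping output). It is not part of the annex file.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the size of the largest group of RTT
 * samples (in milliseconds, assumed non-negative) that round to the
 * same whole millisecond. Quadratic, which is fine for a few hundred
 * ping samples.
 */
static size_t
largest_rtt_cluster(const double *rtt_ms, size_t n)
{
	size_t best = 0;

	for (size_t i = 0; i < n; i++) {
		/* Round to the nearest millisecond. */
		long bucket = (long)(rtt_ms[i] + 0.5);
		size_t count = 0;

		for (size_t j = 0; j < n; j++) {
			if ((long)(rtt_ms[j] + 0.5) == bucket)
				count++;
		}
		if (count > best)
			best = count;
	}
	return best;
}
```

On a healthy link the largest cluster should be a small fraction of the
samples; if most samples land in one bucket, the delays are being
quantized somewhere along the path.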
At first, I thought it was a problem with my access provider, but
they assured me everything was just fine on their side (no alarms,
no abnormal error rates, etc). Not that I really trust them, but
I decided to investigate my side. My HW configuration hadn't
changed in months, so the problem had to be software
related. A week or two before, I had cvsup'ed and rebuilt my
system. To check this, I cvsup'ed again, this time to 4.9-REL.
The problem vanished.
After diffing 4.9-REL against 4.10-ST, I began trying to
pinpoint the change(s) that caused the problem. Eventually
I narrowed it down to 3 diffs that were committed at the same
time with the same CVS comment:
----8<--------8<--------8<--------8<--------8<--------8<----
MFC:
speedup stream socket recv handling by tracking the tail of the
mbuf chain instead of walking the list for each append. This has
been pretty well tested at Yahoo!
Obtained from: netbsd (jason thorpe)
Reviewed by: silby
----8<--------8<--------8<--------8<--------8<--------8<----
I fail to understand how such a change could slow down (or
synchronize) my traffic. I don't see any time dependency (spin
loops or sleeps) in it, but it does trigger the problem.
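For readers unfamiliar with the technique named in the commit message,
here is a minimal sketch of "tracking the tail instead of walking the
list for each append". It is NOT the committed kernel code: struct buf,
struct bufq and the three functions are hypothetical, simplified
stand-ins for the mbuf and socket buffer structures, and it makes no
claim about where the actual problem lies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a kernel mbuf. */
struct buf {
	struct buf *next;
	int len;
};

/* Hypothetical queue: head plus a cached pointer to the last element. */
struct bufq {
	struct buf *head;
	struct buf *tail;	/* NULL when the queue is empty */
	int cc;			/* total bytes queued */
};

/* Old style: walk the whole chain to find its end, O(n) per append. */
static void
append_walk(struct bufq *q, struct buf *b)
{
	struct buf **pp = &q->head;

	assert(b != NULL);
	b->next = NULL;
	while (*pp != NULL)
		pp = &(*pp)->next;
	*pp = b;
	q->tail = b;		/* keep the cached tail consistent */
	q->cc += b->len;
}

/* New style: use the cached tail pointer, O(1) per append. */
static void
append_tail(struct bufq *q, struct buf *b)
{
	assert(b != NULL);
	b->next = NULL;
	if (q->tail == NULL)
		q->head = b;	/* queue was empty */
	else
		q->tail->next = b;
	q->tail = b;
	q->cc += b->len;
}

/* Remove the first element; the tail must be reset when the queue drains. */
static struct buf *
dequeue(struct bufq *q)
{
	struct buf *b = q->head;

	if (b == NULL)
		return NULL;
	q->head = b->next;
	if (q->head == NULL)
		q->tail = NULL;
	q->cc -= b->len;
	b->next = NULL;
	return b;
}
```

The cost of the O(1) append is an extra invariant: every code path that
modifies the chain has to keep the cached tail pointer up to date.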
To document it, I produced a screen (ports/misc/screen) session
where I show:
1) The problem occurring on an up to date system.
2) That a 4.9-REL does not have it.
3) That a patched 4.9-REL kernel has it (with both userlands).
The screenlog plus (possibly) relevant syslog and config info
(including the diff that causes the bug) are in an attached file.
I don't know if it affects other types of connections. I only
have ADSL here.
>How-To-Repeat:
Start with a 4.9-REL system. Apply the patch and build a new
kernel. It should exhibit the problem. Based on what the patch
changes, I don't think it's platform specific, but I can't
prove it.
>Fix:
I'm currently running a 4.9-REL kernel with a 4.10-ST userland
just fine. I believe that undoing the change should fix(?) the
problem. I have not tested it, because the patch fails to reverse
cleanly due to other changes in the code after this one. Of course, the
correct solution is to understand what's going on and rewrite
the change.
>Release-Note:
>Audit-Trail:
>Unformatted:
>System:
FreeBSD ethshar 4.10-STABLE FreeBSD 4.10-STABLE #0:
Sun Jun 13 13:05:35 BRT 2004
root at ethshar:/aux/src/sys/compile/TEST i386
Machine is an Intel Seattle II (SE440BX-2) + PIII 600E
+ 256MB RAM + 20GB HD.
The Internet connection is ADSL (256Kbps).
It uses a VIA Rhine III ethernet + USR 9001 ADSL modem.
I don't know the brand of the DSLAM but the tunnel terminator
is probably a Cisco 6400.