improving transport over lossy links?

Fri May 19 20:11:16 UTC 2006

On Fri, 19 May 2006 at 12:38:31 -0400, Mike Tancsa wrote:

 > At 12:06 PM 19/05/2006, Ian Smith wrote:
 > 
 > >Assuming that V.42 error correction is working properly - forced if need
 > >be - there shouldn't =be= any data loss, however slow getting through,
 > >this side of protocol timeouts of course.  I can't guess your mystery
 > >application, but often slower connections are better than dropped ones,
 > >or even ones that spend half their time trying to retrain at high rates.
 > 
 > Hi,
 >          Thanks for the reply.  Even at 28.8 I am seeing loss with 
 > the connection dropping and seeing dropped packets (e.g.
 > May 19 12:04:43 soekris4801 ppp[3404]: tun0: Phase: 1: HDLC errors -> 
 > FCS: 1, ADDR: 0, COMD: 0, PROTO: 0)

A lot of those, Mike, or just 1 FCS error per connection?  I'm using a
modem right now that often reports just 1 FCS upon linkup, then works
fine for the day.  Ah, 'often' includes this call I'm on now since:

 May 19 22:31:33 paqi ppp[2559]: tun0: Phase: deflink: HDLC errors ->
 FCS: 1, ADDR: 0, COMD: 0, PROTO: 0

.. so on its own that mightn't indicate much beyond maybe a chipset HDLC
bug.  Bunches of these, though, are most often seen (inbound) where a
caller negotiates a non-EC connection at 14400 or more; such calls rarely
last long.
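
To tell "1 per connection" from "bunches", one option is to total the FCS
counts per ppp process and link from the log.  A minimal Python sketch,
assuming each HDLC summary sits on a single line in the log file (they are
only wrapped in this mail); the names HDLC_RE and fcs_by_link are made up
for illustration:

```python
import re
from collections import Counter

# Pattern for the HDLC summary line as quoted in this thread.
# Assumption: in the real log file the whole summary is on one line.
HDLC_RE = re.compile(
    r"ppp\[(?P<pid>\d+)\]: (?P<tun>\w+): Phase: (?P<link>[^:]+): "
    r"HDLC errors -> FCS: (?P<fcs>\d+), ADDR: (?P<addr>\d+), "
    r"COMD: (?P<comd>\d+), PROTO: (?P<proto>\d+)"
)

def fcs_by_link(lines):
    """Sum the reported FCS counts per (ppp pid, link name)."""
    totals = Counter()
    for line in lines:
        m = HDLC_RE.search(line)
        if m:
            totals[(m.group("pid"), m.group("link"))] += int(m.group("fcs"))
    return totals

# The two log lines quoted in this thread, unwrapped.
sample = [
    "May 19 12:04:43 soekris4801 ppp[3404]: tun0: Phase: 1: "
    "HDLC errors -> FCS: 1, ADDR: 0, COMD: 0, PROTO: 0",
    "May 19 22:31:33 paqi ppp[2559]: tun0: Phase: deflink: "
    "HDLC errors -> FCS: 1, ADDR: 0, COMD: 0, PROTO: 0",
]

for (pid, link), fcs in fcs_by_link(sample).items():
    print(f"ppp[{pid}] link {link}: {fcs} FCS error(s)")
```

Feeding it the real log (e.g. the lines of an open file) instead of
`sample` should show quickly whether any one call stands out.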

Even then, FCS errors indicate link-level packet retries, not drops. 
Line loss is more likely physical: poor signal to noise for too long a
time to hold the desired line quality.  Those factors may well be tunable
at either end.  I suppose the calling modems are a variety of types, and
you've really only any control over the inbound modem config?

 > Error correction is on and negotiated, from the terminal server's 
 > perspective at least and I imagine the modem too

A lot depends on the calling modems of course .. some do, some don't.

 > Testing here at the office
 > 
 >          Card Type: LU1674 Chipset
 >              State: ACTIVE
 >        Active Port: S26
 >      Transmit Rate: 28800
 >       Receive Rate: 26400
 >    Connection Type: LAPM/V42BIS
 >         Chars Sent: 215666023
 >     Chars Received: 58090941
 >           Retrains: 0
 >     Renegotiations: 4

Depends how long a call that represents I guess; it wouldn't look too
shabby here, but we average less than 1GB/mth with ~3-5 redials/mth.
Retrains 0, Rate renegs 4 over ~215MB sent doesn't look problematic.

 > The application is TCP based and monitors remote machinery. (And no, 
 > there is no chance at this point to re-write the application).  The 
 > transport is over a VPN (either IPSEC or OpenVPN), whichever deals 
 > better with the lossy connection.  However, many of the sites have 
 > dynamic IP addresses which makes it a pain to use with IPSEC and FreeBSD.

Well if you redial a lost connection (-ddial etc) and regain the same IP
address, TCP should chug on fine, i.e. remain lossless, just delayed.  If
not, any TCP connections open over the link are shot, if that's the
'lossy' you mean?

 > One thing I observed with multi-link so far is that if I kill one of 
 > the connections, the modem does not tell ppp that it has lost carrier 
 > right away. Instead, I have to wait for the LCP echo timeout.  In the 
 > meantime, I get 50% packet loss for about 20 seconds.  I can reduce 
 > that by setting lqrperiod to a lower value, but I don't want it too 
 > low, otherwise it spends all its time chewing up the link with LCP 
 > traffic.

Mmm, 50% loss sounds about right for half the link :)  Sorry, I've only
a vague idea how multilink PPP works, getting way out of my depth here. 

20 seconds sounds about right for successful redial / negotiation, but
why isn't modem !DCD telling ppp right away?  Is that a config issue? 
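
To put rough numbers on the lqrperiod trade-off, here is a small Python
sketch.  It assumes ppp gives up after a fixed count of unanswered probes
(5 below; check ppp(8) for your version) and guesses 56 bytes per probe on
the wire.  Both figures, and the function names, are assumptions of mine:

```python
def approx_detection_time(period_s, missed_limit):
    """Rough seconds until a dead link is dropped: one probe per period,
    giving up once `missed_limit` probes have gone unanswered."""
    return period_s * missed_limit

def overhead_fraction(period_s, packet_bytes, link_bps):
    """Share of one direction's line rate spent on periodic probes."""
    probe_bps = packet_bytes * 8 / period_s
    return probe_bps / link_bps

MISSED_LIMIT = 5     # assumption: unanswered probes before ppp drops the link
PACKET_BYTES = 56    # assumption: rough size of one probe on the wire
LINK_BPS = 26400     # the receive rate from the port stats quoted earlier

for period in (30, 10, 4, 1):
    t = approx_detection_time(period, MISSED_LIMIT)
    o = overhead_fraction(period, PACKET_BYTES, LINK_BPS)
    print(f"lqrperiod {period:2d}s: ~{t:3d}s to notice, {o:.2%} of the link")
```

If those assumptions are anywhere near right, the probe traffic itself is
a small slice of the line even at short periods, so the period may be less
of a constraint than it first looks.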

 > I was going to look at the one2many ng module to see if I can send 
 > out the same packets on both links at the same time as a sort of "as 
 > long as one packet gets there" strategy.  Although the customer doesn't 
 > use wireless right now, we might have some sites that would need it 
 > in the future and this might be an approach. I imagine satellite users 
 > run into this as well, no?

ng_one2many looks great, but there's still its Link Failure Detection to 
satisfy, which still has to come back to noticing modem !carrier, no?
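
As a back-of-envelope check on the "as long as one packet gets there"
idea, this Python sketch compares sending every packet down both links
against alternating between them.  It assumes the two links lose packets
independently, which two lines from the same site may well not; the
function names and loss figures are illustrative only:

```python
def loss_with_duplication(p1, p2):
    """Chance that both copies of a packet are lost.
    Assumption: losses on the two links are independent."""
    return p1 * p2

def loss_round_robin(p1, p2):
    """Average per-packet loss when packets alternate between the links."""
    return (p1 + p2) / 2

# (loss on link 1, loss on link 2); the last pair is one link gone dead
cases = [(0.05, 0.05), (0.10, 0.02), (1.0, 0.0)]

for p1, p2 in cases:
    dup = loss_with_duplication(p1, p2)
    rr = loss_round_robin(p1, p2)
    print(f"p1={p1:.2f} p2={p2:.2f}  alternate: {rr:.2%}  duplicate: {dup:.2%}")
```

The dead-link case is the 50% loss described above when alternating,
and no loss when duplicating.  The price is that two modems then carry
only one modem's worth of distinct traffic.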

(Satellite users run into everything at some stage; weather, sunspots .. 
we replaced one with ADSL last year.  ssh over sat was pretty tedious,
even with 128k ISDN back.  Outside wifi can get soggy in the wet, too!)

cheers, Ian