suggestions?

Wed Jun 2 18:04:19 GMT 2004

Hello,

> 
> I see what you mean; you are talking at a higher level. When I
> mentioned Robust TCP/IP, I meant TCP connections at the level of the
> kernel network stack. The architecture you are talking about is more
> like middleware handling all TCP/IP connections from a client to
> multiple servers.
> 
> The mechanism is something like buffering data in the network stack as
> a precaution against an eventual connection problem. When that problem
> happens and is detected, the network stack will try to reconnect (while
> buffering the user data). Once the connection is re-established, the
> buffered data will be sent and the user won't notice anything (provided
> the outage is not too long, of course).
> 
> That may sound stupid, but that's what I'm thinking about.

The links that I sent earlier do exactly that. However, there are two
basic problems:

1) Synchronization of the stream
	This problem arises on a reconnection: neither the client nor the
server knows exactly where the stream was cut. Once the client receives
a byte of data and ACKs it, the server's TCP discards that byte from its
send buffer, transparently to the application, so it can no longer be
replayed from the server side.

	To keep the stream consistent, you need to keep a copy of every
byte sent on the connection and resynchronize on reconnect. This extra
copying kills performance.

FT-TCP, linked in the previous mail, takes this approach. Others, e.g.
TESLA[4] and ROCKS[5], provide user-level logging to create consistent
socket streams.
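To make the user-level logging idea concrete, here is a minimal Python sketch, assuming an in-memory stand-in for the connection. It is not how FT-TCP, TESLA or ROCKS are actually implemented; `LoggingSender`, `CountingReceiver`, `confirm` and `resync` are hypothetical names. The sender copies every byte it sends into a log, and on reconnect the receiver reports how many bytes it actually got, so the sender can replay the rest.

```python
# Hypothetical sketch: user-level logging of a byte stream so it can be
# resynchronized after a reconnect. The "connection" is simulated in memory.


class LoggingSender:
    def __init__(self) -> None:
        self.log = bytearray()  # copy of every byte the peer has not confirmed
        self.log_start = 0      # stream offset of log[0]

    def send(self, data: bytes) -> bytes:
        # The extra copy on every send is what makes this approach slow.
        self.log += data
        return data  # a real wrapper would write this to the socket

    def confirm(self, received: int) -> None:
        # The peer reports it has received `received` bytes in total;
        # bytes before that offset will never be needed again.
        drop = received - self.log_start
        if drop < 0 or drop > len(self.log):
            raise ValueError("peer reported an impossible stream offset")
        del self.log[:drop]
        self.log_start = received

    def resync(self, received: int) -> bytes:
        # On reconnect: trim what the peer already has, replay the rest.
        self.confirm(received)
        return bytes(self.log)


class CountingReceiver:
    def __init__(self) -> None:
        self.data = bytearray()

    def deliver(self, chunk: bytes) -> None:
        self.data += chunk

    @property
    def received(self) -> int:
        return len(self.data)  # total bytes received so far


sender, receiver = LoggingSender(), CountingReceiver()
receiver.deliver(sender.send(b"hello "))
sender.send(b"world")  # lost: the connection drops before it arrives
receiver.deliver(sender.resync(receiver.received))  # reconnect and replay
print(receiver.data)  # bytearray(b'hello world')
```

The log only shrinks when the peer confirms a byte count, which is why every byte has to be copied and kept around until then.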

	Another approach is to modify the server and keep a log in the
kernel (i.e., do not discard the socket buffer until the application
asks you to). This requires modifying the server, but the overhead is
low, since there is no extra copy and migration is lightweight.
Service Continuations[1] and M-TCP[3] take this approach.
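For contrast, a minimal Python sketch of the kernel-side variant, assuming a simplified send buffer modelled in user space. `RetainingSendBuffer`, `on_ack`, `app_release` and `migration_state` are hypothetical names, not the Service Continuations or M-TCP interface, and the real systems do this inside the kernel TCP stack. The only change from normal TCP behaviour is that an ACK no longer frees data; the application decides when bytes may be dropped, so the existing socket buffer doubles as the log.

```python
# Hypothetical sketch: a send buffer that keeps ACKed bytes until the
# application releases them, so the buffer itself serves as the log.


class RetainingSendBuffer:
    def __init__(self) -> None:
        self.buf = bytearray()  # models the socket send buffer the stack already keeps
        self.base = 0           # stream offset of buf[0]; earlier bytes were released
        self.acked = 0          # highest stream offset ACKed by the peer's TCP

    def write(self, data: bytes) -> None:
        self.buf += data

    def on_ack(self, ack: int) -> None:
        # Normal TCP would free everything below `ack` here; we only record it,
        # clamped to the data actually written.
        self.acked = max(self.acked, min(ack, self.base + len(self.buf)))

    def app_release(self, upto: int) -> None:
        # The application says bytes before `upto` are no longer needed.
        # Never free bytes the peer has not ACKed yet.
        upto = min(upto, self.acked)
        if upto > self.base:
            del self.buf[: upto - self.base]
            self.base = upto

    def migration_state(self) -> tuple[int, bytes]:
        # What a migrated connection would need: the offset where the retained
        # log starts and every byte from there on (ACKed-but-unreleased + unACKed).
        return self.base, bytes(self.buf)


buf = RetainingSendBuffer()
buf.write(b"AAAA")
buf.write(b"BBBB")
buf.on_ack(8)       # the peer's TCP has ACKed all 8 bytes...
buf.app_release(4)  # ...but the server application only released the first 4
print(buf.migration_state())  # (4, b'BBBB')
```

Nothing is copied beyond what the socket buffer already holds; the state to migrate is just an offset plus the retained bytes.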

You can find more details at http://discolab.rutgers.edu/sc.

2) Fault tolerance:
	If by fault tolerance you mean a machine crash, then recovering
state is not possible by traditional means. We handle that (in a
separate paper) by using a programmable NIC (Myrinet) to remotely read
the memory of the crashed machine. This allows us to recover connections
from a dead machine (e.g., after an OS crash) [2]
(http://discolab.rutgers.edu/bda).

	If you mean fault tolerance at the network level, TCP already
provides that, unless you want geographical separation. In that case,
the client-side TCP must be modified to redirect packets along the
alternate route. This is again what Service Continuations[1] and
M-TCP[3] do.

	I hope this helps.
Cheers
Aniruddha

[1] Service Continuations: An Operating System Mechanism for Dynamic
Migration of Internet Service Sessions.
F. Sultan, A. Bohra, L. Iftode.
http://discolab.rutgers.edu/sc/srds03.ps

[2] System Support for Nonintrusive Failure Detection and Recovery Using
Backdoors.
F. Sultan, A. Bohra, P. Gallard, I. Neamtiu, S. Smaldone, Y. Pan, and
L. Iftode.
http://discolab.rutgers.edu/bda/remrecov04.pdf

[3] Migratory TCP: Highly Available Internet Services Using Connection
Migration.
F. Sultan, K. Srinivasan, D. Iyer, L. Iftode.
http://discolab.rutgers.edu/mtcp/dcs-tr-462.ps

[4] TESLA: http://nms.lcs.mit.edu/projects/migrate/

[5] ROCKS: http://www.cs.wisc.edu/~zandy/rocks/