SCTP using questions (API etc.)

Wed Apr 16 14:26:29 UTC 2008

Vadim:

Sorry I have not chimed in earlier. I tend not to look
at some of my mailboxes very often :-)

Glad Michael helped out here. Thanks, Michael.

Vadim Goncharov wrote:
> Hi Michael Tuexen! 
> 
> On Thu, 6 Mar 2008 09:34:13 +0100; Michael Tuexen wrote about 'Re: SCTP using questions (API etc.)':
> 
>>>>> "substreams". SCTP can do it for me, it's wonderful, but in practice
>>>>> there are some questions.
>>>>>
>>>>> How long can be one particular SCTP message? Can I rely on the fact
>>>>> that it can be unbounded, e.g. I want to emulate a stream with transfer
>>>>> of 6 Gig-sized file?
>>>> Protocol wise there is no limitation of the message size. API wise, for
>>>> this size of a message you need to use the explicit EOR mode to be able
>>>> to pass this large message using multiple sequential send() calls.
>>> And how should I determine from my/remote stack an optimal size for
>>> message parts when entire message is guaranteed to not fit into
>>> buffers/windows of both peers?
>> If the sendbuffer is too small for the message to fit, the send call
>> will return -1 and errno being set to EMSGSIZE. Or you do it in the
>> application by inspecting the sendbuffer size. You do not have to deal
>> with the recv buffer of the peer.
> 
> So this means I need no subscription to unsent messages and simply can try
> to resend message in several steps without EOR, after getting EMSGSIZE ?

So, if you have put your socket into EEOR (explicit EOR) mode, then
you can issue multiple sends on the socket (to the same stream)
until you get back an EWOULDBLOCK. You would only get an
EMSGSIZE if the piece you are sending is larger than the entire
size of the socket send buffer.

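So let's say the send buffer for the socket is 100k.

You could do:

    while (ok) {
        ret = send(sd, piece[index], 1024, 0);  /* one 1k piece */
        if (ret == -1 && errno == EWOULDBLOCK) {
            /* hit a full buffer (100k is queued): go wait or do */
            /* something else, then retry send() of piece[index] */
        } else if (ret == -1) {
            ok = 0;                             /* some other error */
        } else {
            index++;
        }
    }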

Each send() is all-or-nothing: either all of the buffer you pass is
queued or none of it is.
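
The loop above can be fleshed out roughly as follows. This is only a sketch: the real send() on a non-blocking EEOR socket is replaced by a callback, and fake_send, struct fake_sock and send_pieces are made-up names that simulate a send buffer so the code runs without an SCTP stack. In a real program the callback would call send() or sctp_send() and mark the last piece as end-of-record.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback: stands in for one non-blocking send() of one piece.
 * 'last' is nonzero for the final piece (where EOR would be signalled). */
typedef ssize_t (*piece_send_fn)(void *sock, const char *piece, size_t len, int last);

/* Simulated socket: only tracks free room in the send buffer (assumption). */
struct fake_sock { size_t room; };

static ssize_t fake_send(void *sock, const char *piece, size_t len, int last)
{
    struct fake_sock *s = sock;
    (void)piece; (void)last;
    if (len > s->room) {        /* all-or-nothing: nothing is queued */
        errno = EWOULDBLOCK;
        return -1;
    }
    s->room -= len;
    return (ssize_t)len;
}

/* Send pieces of msg starting at *index.
 * Returns 0 when every piece is queued, 1 when the buffer is full
 * (call again later with the same *index), -1 on any other error. */
static int send_pieces(piece_send_fn fn, void *sock, const char *msg,
                       size_t msg_len, size_t piece_len, size_t *index)
{
    assert(piece_len > 0);
    size_t npieces = (msg_len + piece_len - 1) / piece_len;

    while (*index < npieces) {
        size_t off = *index * piece_len;
        size_t len = msg_len - off < piece_len ? msg_len - off : piece_len;
        int last = (*index + 1 == npieces);

        if (fn(sock, msg + off, len, last) == -1) {
            if (errno == EWOULDBLOCK || errno == EAGAIN)
                return 1;       /* buffer full: wait, then resume here */
            return -1;
        }
        (*index)++;             /* piece queued, move to the next one */
    }
    return 0;
}
```

The caller keeps index across calls, so after waiting for the socket to become writable it resumes with the same piece that was refused.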

> 
>>>>> Can a message be of zero-length data (only headers) ?
>>>> Empty user messages, i.e. a DATA chunk without payload, are not
>>>> allowed.
>>>> An empty SCTP message, i.e. only the common header without any chunks,
>>>> is allowed and processed by FreeBSD when received, but never sent (well,
>>>> I do not know a way to force the FreeBSD implementation to send it).
>>> OK, understood. So I should include at least 1 byte of my own headers
>>> into data and do receive into *iov with at least two parts to ensure
>>> good alignment for the non-header part?
>> What header are you talking about? An application header or any SCTP  
>> header?
>> You will never receive any SCTP header as part of a user message via
>> a recv() call. SCTP will give you as much of a message that fits into
>> the buffer you provide or it has, if the partial delivery API has been  
>> invoked.
> 
> My application-protocol header, of course. Does this mean also that I should
> always enable partial delivery on receiving? Or what will happen if a received
> msg is too big and doesn't fit into my buffers?

Well, you have no control over this per se. You can subscribe to partial
delivery events; there is only one, which tells you that a partial
delivery was aborted. You probably need it if you are going to use EEOR mode.

Basically, the kernel will start a partial delivery when half of the
receive buffer is in use. Note that there is a socket option to control
this threshold, so you can change it if you like.
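
On the application side, partial delivery means one message may arrive in several reads. A minimal sketch of putting the pieces back together, assuming each piece comes from recvmsg() and that MSG_EOR in msg_flags marks the last one (struct reasm, reasm_feed and reasm_reset are names invented here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>   /* MSG_EOR */

/* Hypothetical reassembly state for one message on one stream. */
struct reasm {
    char  *buf;    /* application-owned storage */
    size_t cap;    /* size of buf */
    size_t used;   /* bytes collected so far */
};

/* Append one received piece.
 * Returns 1 when the message is complete (MSG_EOR seen),
 * 0 when more pieces are expected, -1 if it does not fit. */
static int reasm_feed(struct reasm *r, const char *piece, size_t len, int msg_flags)
{
    assert(r != NULL && r->buf != NULL);
    if (len > r->cap - r->used)
        return -1;
    memcpy(r->buf + r->used, piece, len);
    r->used += len;
    return (msg_flags & MSG_EOR) ? 1 : 0;
}

/* Drop what was collected, e.g. after a complete message has been
 * handled or after a "partial delivery aborted" event. */
static void reasm_reset(struct reasm *r)
{
    r->used = 0;
}
```

I believe the sockets API draft names the threshold option SCTP_PARTIAL_DELIVERY_POINT, but check your stack's headers before relying on that.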

> 
>>>>> What is the relation between SCTP streams in both directions? Can
>>>>> streams be opened and closed on-demand, like SSH port forwarding (yet
>>>>> again multiplexing example) or they are preallocated at connection
>>>>> setup all together? What is the minimum number of streams application
>>>>> can rely upon (or it just one stream 0 in general case) ?
>>>> If you restrict to protocols being in RFC status, there is no way of
>>>> modifying the number of streams during the lifetime of an association.
>>>> The number of streams in each direction is negotiated during the
>>>> association setup. The streams in both directions are completely
>>>> independent. There is always at least one stream in each direction,
>>>> which is stream 0.
>>>> However, there is an extension (currently specified in an Internet
>>>> Draft) which allows the addition of streams during the lifetime of an
>>>> association.
>>>> The ID is at least partially supported by the FreeBSD implementation.
>>>> https://datatracker.ietf.org/drafts/draft-stewart-sctpstrrst/
>>> OK. Are there recommended defaults for various stacks about how many
>>> streams they are creating by default / what maximum of them application
>>> can ever request?
>> The maximum number to request is 2^16 - 1. It is controllable by the
>> applications via socket options. Defaults in OSes are in the order of
>> 10, 16, 32...
> 
> Can I be sure that every OS can give me maximum number of streams if I
> request it?

The ceiling for the number of streams is actually a defined constant
in the BSD stack (and in most others).

For BSD it's defined in
sctp_constants.h

and is

#define MAX_SCTP_STREAMS 2048

I probably should make this a configurable item... hmm.

Each outbound stream costs about 16 bytes.
Each inbound stream costs about 16 bytes.

Hence my desire to limit the resources used to some extent. I think
most kernels do this as well.

You can of course twiddle the define, and I think for 8.x I will see about
making it an option.
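
To put numbers on that, here is a small sketch of the arithmetic. The outbound count you end up with is the smaller of what you request and what the peer accepts inbound. That a stack clamps an oversized request to its ceiling (rather than rejecting it) is an assumption here, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SCTP_STREAMS        2048  /* ceiling quoted from sctp_constants.h */
#define APPROX_BYTES_PER_STREAM 16    /* "about 16 bytes" per stream, as above */

/* Outbound streams after association setup: min(our request, peer's
 * inbound limit). Clamping the request to the local ceiling is assumed. */
static uint16_t negotiated_out_streams(uint16_t requested_out, uint16_t peer_max_in)
{
    uint16_t ours = requested_out > MAX_SCTP_STREAMS
                        ? MAX_SCTP_STREAMS : requested_out;
    return ours < peer_max_in ? ours : peer_max_in;
}

/* Rough per-association memory for the stream tables. */
static unsigned long approx_stream_bytes(uint16_t in_streams, uint16_t out_streams)
{
    return (unsigned long)APPROX_BYTES_PER_STREAM *
           ((unsigned long)in_streams + out_streams);
}
```

With the ceiling in both directions that is roughly 64 KB per association, so an application should check the negotiated stream counts after setup instead of assuming it got what it asked for.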

> 
>>>>> How can I put request to kernel for a connect, for example, and then
>>>>> sleep until connect will complete or event in some another descriptor
>>>>> will occur?
>>>> If you use the 1-to-1 style API, it should be similar to using TCP.
>>>> Put the socket in non-blocking mode, enable notifications,
>>>> call connect() and wait until the fd becomes readable. You should get
>>>> an indication that the association has been established or could not
>>>> start.
>>> Yes, that's possible, as I see after reading
>>> draft-ietf-tsvwg-sctpsocket.
>>> But several more questions arise. What notifications do I really need
>>> on 1-to-1 non-blocking socket API mode? What use is 'context' in this
>>> 1-to-1 context and why after a failed send I must receive entire failed
>>> sent message (which can be very long) instead of just an error code?
>> The context is something you provide in the send call and is given
>> back to you. So you can use it to find some state/buffer/whatever again.
> 
> It was unclear from draft whether context is one per SCTP association or per
> send call. And what the hell are all that unsent messages, why I must
> retrieve entire unsent message - can I fire-and-forget a 2M msg and receive
> only context of it instead of all 2 megs? And on which condition such event
> can ever occur - with TCP it's simple, I either do write() a number of bytes
> successfully or receive an error from write() - be that EAGAIN for just
> blocking of peer's recv() or connection termination error. What concept is
> under unsent msgs?

The idea is that you can see the message that did not get sent, and
you can tell whether it was ever sent, i.e. put on the wire but
not yet acked, or never put on the wire at all.

We don't currently have a way to avoid passing the entire message up
(sorry, no one ever asked for that).

The context is kept per message, if I remember right. It's copied from
the sinfo_context field and then carried with the queued data
until it's acked and freed.

I believe you can set a default context as well.
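
One way to use the per-message context is as a key into a table of the application's own pending buffers. A minimal sketch, assuming the application chooses the value it puts in sinfo_context and gets the same value back with a failed message (pending_tab, pending_add, pending_lookup and pending_release are invented names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64
#define NO_CONTEXT  UINT32_MAX   /* returned when the table is full */

/* Hypothetical record of a message handed to the stack. */
struct pending {
    const void *data;
    size_t      len;
    int         in_use;
};

static struct pending pending_tab[MAX_PENDING];

/* Remember a message; the returned slot number is the value to pass
 * as the context on the send call. */
static uint32_t pending_add(const void *data, size_t len)
{
    assert(data != NULL);
    for (uint32_t i = 0; i < MAX_PENDING; i++) {
        if (!pending_tab[i].in_use) {
            pending_tab[i].data = data;
            pending_tab[i].len = len;
            pending_tab[i].in_use = 1;
            return i;
        }
    }
    return NO_CONTEXT;
}

/* Map a context handed back by the stack to the application's record. */
static struct pending *pending_lookup(uint32_t ctx)
{
    if (ctx >= MAX_PENDING || !pending_tab[ctx].in_use)
        return NULL;
    return &pending_tab[ctx];
}

/* Forget a message once it is known to be delivered or abandoned. */
static void pending_release(uint32_t ctx)
{
    struct pending *p = pending_lookup(ctx);
    if (p != NULL)
        p->in_use = 0;
}
```

With a table like this the application only needs the context from a failed-send event to find its own copy, even though the stack still passes the whole message up.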

> 
>>> In usual FSM I can use kqueue()/kevent() with arbitrary void* to my
>>> data, also telling me how many bytes I can read from or write to
>>> the socket (RCVLOWAT etc.), as well as indicating error/EOF conditions
>>> so I don't need to do additional syscalls. Is this working with SCTP?
>> Haven't tried it... Sounds like it would make sense to make sure that
>> it works.
> 
> Oh, can you please check it?.. Would be good to support all features
> described in kqueue(2).

I rather doubt this works, since we don't use socket buffers per se.

I will have to go take a look at it and will probably need to add
that to my TODO list.

Michael just finished getting it to work for INET only (no v6). Good work,
Michael :-D

> 
>>> If I can't write to TCP socket (due to window shortage from peer),
>>> I leave data in my own application buffers, but SCTP tells something
>>> about unsent messages delivered later, looks somewhat weird, do I really
>>> need this? Also, all that msg*/cmsg* stuff is too complex, and I see
>>> there are simpler sctp_send()/sctp_sendx() interfaces, will they be
>>> enough and really simpler for me?..
>> sctp_sendx() purpose is to use the multiple addresses provided during
>> the implicit setup of the association. So I think you are not looking for
> 
> Ok.
> 
>> this. sctp_send() can be used to provide the stream id, payload protocol
>> identifier and so on with using the CMSG stuff. So you might be looking
>> for this function.
> 
> With CMSG? May be you wanted to say 'without' ?..

Yep,

The sctp_xxx send calls are true function calls, so they do not
have the heavy overhead of the app encoding ancillary data
and the kernel decoding it. Much better :-)

> 
>>>>> How can I put each client to its fd and then do a kqueue()/kevent()
>>>>> on a set of those fd's (and other items) ? It is very handy to have
>>>>> this architecture as kevent() allows to store an arbitrary void* in
>>>>> its structure which I can later use to quickly dispatch events.
>>>>>
>>>>> And, of course, all this usual C10K-problem-solving-TCP-server
>>>>> tricks I want with basic SCTP SEQPACKET benefits: multiple streams and
>>>>> message record separation (I don't need other SCTP features
>>>>> currently). Where can I find answers to these questions, like it was
>>>>> with W.R.Stevens books for TCP ?..
>>>> Have you looked at the third edition of 'Unix Network Programming'?
>>>> Randall Stewart wrote a couple of sections covering SCTP...
>>> Unfortunately, I have only 2nd edition currently available here, though
>>> heard about 3rd, yes. May be several months later...
>> It is really worth buying if you are interested in SCTP socket
>> programming...
> 
> I know, but in my province it is currently unavailable for some months...
> you know, Siberia, bears walking on the streets :) but it is not critical
> for actual SCTP programming (TCP version will be debugged first), but I need
> to take architectural decisions now.
> 
> Also, are there some examples of real-world SCTP applications with source
> code available? May be something is getting to integrate into our base
> system?..
I could probably find some of my test code and send it to you.
I have a pretty intensive test app that we use, sctp_test_app, which
exercises just about every socket option etc. It's not pretty (it grew
organically), but it does cover lots of stuff.

R

> 

-- 
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 <or> 803-317-4952 (cell)