sendfile(2) SF_NOPUSH flag proposal
Igor Sysoev
is at rambler-co.ru
Tue May 27 10:46:57 PDT 2003
On Tue, 27 May 2003, Terry Lambert wrote:
> Igor Sysoev wrote:
> > I mean that if you have a 230-byte header, then sendfile() will send it
> > in a separate packet regardless of the size of the header and the file.
> > Something like this: 230, 1460, 1460, ...
>
> Again, see other post: this is arguably a sendfile(2) bug,
> though a really minor one; one which should be addressed in
> the sendfile(2) implementation, and doesn't need options
> added to the API in order to address it.
How do you suppose we coalesce the file pages? By wiring two or more pages
to mbufs at once?
BTW, I have not looked at how sendfile() works over jumbo-frame Ethernet. I
suspect that without TCP_NOPUSH it sometimes sends 4096- or 8192-byte packets
instead of 9000-byte ones.
> > > > it will return 230 bytes:
> > >
> > > The "HEAD" is atypical, compared to the "GET"; the full Google
> > > front page is larger than that, and consists of multiple files;
> > > assuming you support HTTP/1.1 and pipelining, it's going to be
> > > a back-to-back transfer involving multiple sendfile() calls.
> >
> > I used HEAD to show you the size of the HTTP header.
> > HEAD is atypical, but such a small HTTP header is typical.
>
> Here is my problem: you are arguing both amortized cost and
> total cost, depending on which is more supportive of your
> main thesis. These arguments are separate and orthogonal to
> each other: they don't support each other. You can argue
> tiny files, and a relatively high total cost, or you can argue
> large files and pipelining, and a relatively high amortized
> cost, but you can't argue both tiny and large files, and both
> many connections and one connection, at the same time.
Terry, I do not understand you.
My argument is simple: I want to avoid partial packets, because doing so
decreases the number of packets. That's all. There's nothing about
amortized cost or total cost. I do not even know what they are.
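The packet-count argument is simple arithmetic. A minimal sketch, assuming a fixed MSS and ignoring ACKs and TCP options; the function names are hypothetical. With the header pushed separately the count is ceil(H/MSS) + ceil(N/MSS); coalesced it is ceil((H+N)/MSS), so the saving is at most one packet per response.

```c
#include <stddef.h>

/* Ceiling division for non-negative sizes (b must be > 0). */
static size_t ceil_div(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/* Header sent in its own packet(s), file data segmented separately. */
size_t packets_split(size_t hdr, size_t file, size_t mss)
{
    return ceil_div(hdr, mss) + ceil_div(file, mss);
}

/* Header and file data coalesced into full-sized segments. */
size_t packets_coalesced(size_t hdr, size_t file, size_t mss)
{
    return ceil_div(hdr + file, mss);
}
```

With a 230-byte header and an MSS of 1460, an 8000-byte file takes 7 packets split versus 6 coalesced, and a 1000-byte file takes 2 versus 1. Because the saving is a fixed packet per response, it is relatively largest for small responses.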
> Personally, I'd step back and get the arguments straight,
> and get an implementation that demonstrates statistically
> significant performance differences, and then come back, if
> I wanted to press the case for additional option flags. I
> have done this several times in the past, e.g. with my soft
> interrupt coalescing implementation that's now part of most
> of the ethernet drivers people care about.
>
> Actually, in this case, I'd just try to fix sendfile(2) to
> do the packet coalescing I'd expect, given the relative
> state of the TCP_NODELAY and TCP_NOPUSH option flags.
Actually, sendfile() already honors the TCP_NOPUSH flag.
I do not know about TCP_NODELAY; I do not use it.
But if you turn TCP_NOPUSH on, then sendfile() will send full packets.
If you turn TCP_NOPUSH off, then sendfile() will send some partially
filled packets. That is the correct behavior.
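For reference, the sequence that works today takes three syscalls. A sketch, assuming FreeBSD's sendfile(2) with a struct sf_hdtr header iovec and the TCP_NOPUSH socket option; send_with_nopush() is a hypothetical helper, and the non-FreeBSD branch is only a stub so the file compiles elsewhere. It also assumes, as the message does, that clearing TCP_NOPUSH lets the stack send the final partial segment.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#if defined(__FreeBSD__)
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Turn TCP_NOPUSH on (1) or off (0) for socket s. */
static int set_nopush(int s, int on)
{
    return setsockopt(s, IPPROTO_TCP, TCP_NOPUSH, &on, sizeof(on));
}

/*
 * Hypothetical helper: send hdr followed by nbytes of file fd (0 = to EOF)
 * on socket s using setsockopt + sendfile + setsockopt.
 */
int send_with_nopush(int fd, int s, off_t offset, size_t nbytes,
                     const char *hdr, size_t hdrlen, off_t *sbytes)
{
    struct iovec iov;
    struct sf_hdtr hdtr;
    int rc;

    if (set_nopush(s, 1) == -1)             /* syscall 1 */
        return -1;

    iov.iov_base = (void *)hdr;
    iov.iov_len = hdrlen;
    hdtr.headers = &iov;
    hdtr.hdr_cnt = (hdr != NULL) ? 1 : 0;
    hdtr.trailers = NULL;
    hdtr.trl_cnt = 0;

    rc = sendfile(fd, s, offset, nbytes, &hdtr, sbytes, 0);   /* syscall 2 */

    /* Clear NOPUSH only after a complete send; on a partial send leave it set. */
    if (rc == 0 && set_nopush(s, 0) == -1)  /* syscall 3 */
        return -1;
    return rc;
}
#else
/* Stub: sendfile(2) with sf_hdtr and TCP_NOPUSH are FreeBSD-specific. */
int send_with_nopush(int fd, int s, off_t offset, size_t nbytes,
                     const char *hdr, size_t hdrlen, off_t *sbytes)
{
    (void)fd; (void)s; (void)offset; (void)nbytes;
    (void)hdr; (void)hdrlen; (void)sbytes;
    errno = ENOSYS;
    return -1;
}
#endif
```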
> BTW: I'm still wary of the initial fault on the file data, if
> it's not already in cache: arguably, it's better to start
> sending the headers, and avoid the startup latency of delaying
> sending the headers until the fault is satisfied: part of the
> thing that's going to be eating your PCI bandwidth is the
> disk I/O, and your disks are going to be the slowest data
> sources/sinks in the whole equation.
I agree, but after all it is only a delay of 20 ms or so.
> In any case, I expect that this should be handled in the
> context of TCP_NODELAY and TCP_NOPUSH, rather than by adding
> options to work around an arguably broken sendfile(2).
sendfile() already works nicely with TCP_NOPUSH. I propose only flags
that allow sendfile() to turn TCP_NOPUSH (actually TF_NOPUSH) on and off
internally. Then in one syscall you can turn TCP_NOPUSH on, send the HTTP
header and the file pages, and turn TCP_NOPUSH off if all the file pages
have been wired to mbufs. This TCP_NOPUSH state is not tied to sendfile()
internals; you can still control it via setsockopt/getsockopt(TCP_NOPUSH).
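The proposed one-call semantics can be modeled in userland. This is not kernel code and not a real API: MODEL_NOPUSH_ON, MODEL_NOPUSH_OFF, struct model_sock and model_sendfile() are hypothetical names, and the send buffer is reduced to a free-space counter. What it shows is the rule in the proposal: TF_NOPUSH is cleared at the end only if the whole file was queued (wired to mbufs), so a partial send on a full socket buffer leaves NOPUSH set and the next call keeps coalescing.

```c
#include <stddef.h>

/* Hypothetical flag values, for illustration only; not a real FreeBSD API. */
#define MODEL_NOPUSH_ON   0x1   /* set NOPUSH before queueing anything */
#define MODEL_NOPUSH_OFF  0x2   /* clear NOPUSH if everything was queued */

/* Hypothetical stand-in for the relevant socket/TCP state. */
struct model_sock {
    int    nopush;   /* models TF_NOPUSH */
    size_t sbspace;  /* free space in the send buffer, in bytes */
};

/*
 * Models one call of the proposed sendfile(): optionally set NOPUSH, queue
 * the header and as much file data as fits, and clear NOPUSH only if the
 * whole request was queued. Returns the number of bytes queued.
 */
size_t model_sendfile(struct model_sock *so, size_t hdrlen, size_t filelen,
                      int flags)
{
    size_t want = hdrlen + filelen;
    size_t queued = (want <= so->sbspace) ? want : so->sbspace;

    if (flags & MODEL_NOPUSH_ON)
        so->nopush = 1;

    so->sbspace -= queued;

    /* A partial send leaves NOPUSH set so the next call keeps coalescing. */
    if ((flags & MODEL_NOPUSH_OFF) && queued == want)
        so->nopush = 0;

    return queued;
}
```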
Igor Sysoev
http://sysoev.ru/en/