sendfile(2) SF_NOPUSH flag proposal
Igor Sysoev
is at rambler-co.ru
Tue May 27 11:36:28 PDT 2003
On Tue, 27 May 2003, Terry Lambert wrote:
> But this call should not be necessary. Internally, the
> sendfile(2) implementation should treat the headers +
> file contents + trailers as a single stream. Your problem
> is that the implementation of sendfile(2) sucks and is not
> doing this, not that you need to set TCP_NOPUSH to avoid
> separation of three back-to-back transmits: you don't *have*
> three back-to-back transmits here, you have only *one*
> transmit.
>
> Would you expect a writev(2) operation to break up each of
> the chunks described by the vector into separate back-to-back
> transmits? If not, why do you expect sendfile(2) to do it?
Yes, I agree that sendfile() should work like writev().
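The writev(2) analogy can be made concrete. The sketch below is a minimal illustration, not code from this thread: it gathers a header, a body and a trailer into a single writev() call, and a socketpair(2) stands in for the TCP connection. The function names send_hdr_body_trl() and roundtrip() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: hand a header, a body and a trailer to the
 * kernel in ONE writev(2) call, so the socket layer receives them as
 * a single contiguous byte stream rather than three separate writes.
 */
ssize_t send_hdr_body_trl(int s, const char *hdr, const char *body,
                          const char *trl)
{
    struct iovec iov[3];

    assert(hdr != NULL && body != NULL && trl != NULL);
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len = strlen(hdr);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = strlen(body);
    iov[2].iov_base = (void *)trl;
    iov[2].iov_len = strlen(trl);
    return writev(s, iov, 3);
}

/*
 * Hypothetical usage sketch: push the three parts through a socketpair
 * and read them back; the reader sees them back-to-back as one stream.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t roundtrip(char *out, size_t cap)
{
    int sv[2];
    size_t total = 0;
    ssize_t r = 0;

    assert(out != NULL && cap > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (send_hdr_body_trl(sv[0], "HDR|", "file data", "|TRL") == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);   /* writer is done: reader gets EOF after the data */
    while (total < cap - 1 &&
           (r = read(sv[1], out + total, cap - 1 - total)) > 0)
        total += (size_t)r;
    close(sv[1]);
    out[total] = '\0';
    return r < 0 ? -1 : (ssize_t)total;
}
```

Because the three buffers are submitted in one call, nothing in the interface asks the kernel to treat them as three transmits; that is the behavior being asked of sendfile(2) for its headers, file pages and trailers.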
> > Turning TF_NOPUSH off has almost the same overhead as
> > setsockopt(TCP_NOPUSH, 0) if you need to call tcp_output(tp) inside
> > sendfile(2) and has no overhead at all if you do not need to call it.
>
> The problem is that you need to break tcp_output() into a
> couple of routines, OR you need to not call it on the
> headers, file data, and trailers separately.
No, tcp_output() is called only once, and only if the amount of data in the
send buffer is less than the MSS:
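sendfile()
{
    /* SF_NOPUSH: hold back partial segments while everything is queued */
    if (flags & SF_NOPUSH) {
        tp->t_flags |= TF_NOPUSH;
    }

    writev(header);
    send file pages;
    writev(trailer);

    /*
     * SF_PUSH: release the hold; call tcp_output() explicitly only
     * if less than one MSS of data is left in the send buffer
     */
    if (error == 0 && (flags & SF_PUSH)) {
        tp->t_flags &= ~TF_NOPUSH;
        if (so->so_snd.sb_cc < tp->t_maxseg) {
            error = tcp_output(tp);
        }
    }
}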
> I think we can all make up our own stories, where the overhead
> could become important enough for a specific application that
> we wouldn't complain about you eliminating it so you could do
> your application, as long as it doesn't negatively impact the
> rest of us (say, by adding non-standard sendfile(2) flags that
> no one else supports, if that isn't the only possible way to
> solve the problem).
sendfile(2) is a completely non-standard interface. Among FreeBSD, Linux,
Solaris, HP-UX and AIX, the prototypes are not even similar, and the
functionality differs as well.
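The prototype mismatch shows up even between two of these systems. FreeBSD declares sendfile(int fd, int s, off_t offset, size_t nbytes, struct sf_hdtr *hdtr, off_t *sbytes, int flags), with the file first and the socket second, while Linux declares sendfile(int out_fd, int in_fd, off_t *offset, size_t count) in <sys/sendfile.h>, with the order reversed, the offset passed by pointer and no header, trailer or flags arguments. The wrapper below is a hypothetical sketch covering only those two systems (others get ENOSYS); the names send_file_portable() and demo_send_tmpfile() are invented for illustration, and the usage function was written with Linux's behavior in mind.

```c
#define _DEFAULT_SOURCE 1   /* expose POSIX/BSD declarations on glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/uio.h>
#elif defined(__linux__)
#include <sys/sendfile.h>
#endif

/* Hypothetical wrapper: send 'len' bytes of file 'fd' from 'off' to
   socket 's'. Returns bytes sent, or -1 with errno set. */
ssize_t send_file_portable(int s, int fd, off_t off, size_t len)
{
#if defined(__FreeBSD__)
    /* FreeBSD: file first, socket second, offset by value, then a
       header/trailer struct, an out-parameter for bytes sent, flags. */
    off_t sbytes = 0;
    if (sendfile(fd, s, off, len, NULL, &sbytes, 0) == -1 && sbytes == 0)
        return -1;
    return (ssize_t)sbytes;
#elif defined(__linux__)
    /* Linux: socket (out_fd) first, file (in_fd) second, offset by
       pointer, no header/trailer or flags arguments. */
    return sendfile(s, fd, &off, len);
#else
    (void)s; (void)fd; (void)off; (void)len;
    errno = ENOSYS;
    return -1;
#endif
}

/* Hypothetical usage sketch: copy a small temporary file into a
   socketpair and read it back into 'out'. */
ssize_t demo_send_tmpfile(char *out, size_t cap)
{
    static const char msg[] = "payload";
    const size_t len = strlen(msg);
    FILE *f;
    int sv[2];
    ssize_t n = -1;

    assert(out != NULL && cap > len);
    if ((f = tmpfile()) == NULL)
        return -1;
    if (fputs(msg, f) == EOF || fflush(f) != 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        fclose(f);
        return -1;
    }
    if (send_file_portable(sv[0], fileno(f), 0, len) == (ssize_t)len)
        n = read(sv[1], out, cap - 1);
    if (n >= 0)
        out[n] = '\0';
    close(sv[0]);
    close(sv[1]);
    fclose(f);
    return n;
}
```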
> I don't think overhead is the issue, at this point: say we agree
> with you on overhead, for your particular application, and we are
> not against you solving your overhead problem: why exactly does
> the API have to change to fix the root cause of the problem?
I am not proposing a change to the API; I am proposing a source- and
binary-compatible addition.
Igor Sysoev
http://sysoev.ru/en/
More information about the freebsd-arch
mailing list