Seeing EINVAL from writev on 8.0 to a non-blocking socket even though the data seems to hit the wire

Gleb Smirnoff glebius at FreeBSD.org
Mon May 6 08:13:03 UTC 2013


  [I'm adding Jack Vogel, maintainer of ixgbe, to cc]

On Fri, May 03, 2013 at 07:01:18PM -0700, Richard Sharpe wrote:
R> On Fri, May 3, 2013 at 10:18 AM, Richard Sharpe
R> <realrichardsharpe at gmail.com> wrote:
R> > On Fri, May 3, 2013 at 7:39 AM, Eric van Gyzen <eric at vangyzen.net> wrote:
R> >> On 05/02/2013 19:00, Richard Sharpe wrote:
R> >>> On Thu, May 2, 2013 at 7:52 AM, Eric van Gyzen <eric at vangyzen.net> wrote:
R> >>>> On 05/02/2013 08:48, Richard Sharpe wrote:
R> >>>>> On Wed, May 1, 2013 at 9:34 PM, Alfred Perlstein <bright at mu.org> wrote:
R> >>>>>> On 5/1/13 8:03 PM, Richard Sharpe wrote:
R> >>>>>>> Hi folks,
R> >>>>>>>
R> >>>>>>> I am checking to see if there are any known bugs with respect to this
R> >>>>>>> in FreeBSD 8.0.
R> >>>>>>>
R> >>>>>>> Situation is that Samba 3.6.6 uses writev to a non-blocking socket to
R> >>>>>>> get the SMB2 requests on the wire.
R> >>>>>>>
R> >>>>>>> Intermittently, we see the writev return EINVAL even though the data
R> >>>>>>> has gotten on the wire. This I have verified by grabbing a capture and
R> >>>>>>> comparing the SMB Sequence number in the last outgoing packet on the
R> >>>>>>> wire vs the in-memory contents when we get EINVAL.
R> >>>>>>>
R> >>>>>>> Sometimes it occurs on a four-element IOVEC, sometimes we get EAGAIN
R> >>>>>>> on the four-element IOVEC and then we get EINVAL when retrying on a
R> >>>>>>> smaller IOVEC.
R> >>>>>>>
R> >>>>>>> Where should I look to check if there is some path where this might be
R> >>>>>>> happening? Is this even the correct mailing list?
R> >>>>>>>
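
For reference, the "EAGAIN, then retry on a smaller IOVEC" pattern is usually implemented by advancing the iovec array past the bytes the kernel already accepted. Below is a minimal sketch in C of that adjustment. The helper name iov_advance is hypothetical and this is not Samba's code. It is included only because a mistake at this step (for example an iov_len that underflows) would make the next writev fail with EINVAL on its own, independent of any driver issue.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Skip `written` bytes at the front of an iovec array after a short writev().
 * Returns the number of elements still to be sent and stores a pointer to the
 * first of them in *iovp. The array is modified in place.
 * Hypothetical helper for illustration only.
 */
static int
iov_advance(struct iovec **iovp, int iovcnt, size_t written)
{
    struct iovec *iov = *iovp;

    assert(iov != NULL || iovcnt <= 0);

    /* Drop the elements that were sent completely. */
    while (iovcnt > 0 && written >= iov->iov_len) {
        written -= iov->iov_len;
        iov++;
        iovcnt--;
    }

    /* Trim the element that was sent only in part. */
    if (iovcnt > 0 && written > 0) {
        iov->iov_base = (char *)iov->iov_base + written;
        iov->iov_len -= written;
    }

    *iovp = iov;
    return iovcnt;
}
```

A caller would invoke this with the positive return value of writev and retry with the returned count once the socket is writable again.
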
R> >>>>>> What does the iovec look like when you get EINVAL? Can you sanity check
R> >>>>>> it? Is there anything special about it? (zero length vecs?)
R> >>>>>>
R> >>>>>> I think there are a few "maxvals" that, if overrun, cause EINVAL to be
R> >>>>>> returned, for example if your iovec is somehow huge or has many, many
R> >>>>>> elements.
R> >>>>> Can anyone tell me the call graph down to the TCP code?
R> >>>>>
R> >>>> writev in kern/sys_generic.c
R> >>>> kern_writev
R> >>>> dofilewrite
R> >>>> fo_write in sys/file.h
R> >>>> soo_write in kern/sys_socket.c
R> >>>> sosend in kern/uipc_socket.c
R> >>>> sosend_generic
R> >>>> tcp_usr_send in netinet/tcp_usrreq.c
R> >>> Is there a tool that generates call graphs?
R> >>
R> >> I'm not aware of one that works in the kernel--other than the kernel
R> >> itself, of course. With DDB compiled in, you could set a breakpoint on,
R> >> say, tcp_output, and show the call stack with bt.
R> >>
R> >> Also, take a look at stack(9).
R> >>
R> >>> I have been able to demonstrate that EINVAL is being returned through
R> >>> writev (kern/sys_generic.c), kern_writev, dofilewrite and soo_write,
R> >>> but when I add printfs to sosend/sosend_generic it becomes very hard
R> >>> to provoke this problem.
R> >>
R> >> So, either relocating code or changing the timing has changed the
R> >> behavior--a Heisenbug.
R> >>
R> >> If your code looks like this:
R> >>
R> >> if (error == EINVAL)
R> >>     printf("you are here\n");
R> >>
R> >> You might add __predict_false, like this:
R> >>
R> >> if (__predict_false(error == EINVAL))
R> >>     printf("you are here\n");
R> >>
R> >> That /might/ reduce the impact on runtime behavior.
R> >
R> > Thanks for that. The problem does not appear to be in the TCP or IP
R> > layers. Rather, it appears to be in the ixgbe driver.
R> >
R> > The problem takes a little more effort to provoke, but simple printfs
R> > are doing the job so far.
R> 
R> The version of the ixgbe driver we are using seems to set the max size
R> of a DMA element to 65535 (IXGBE_TSO_SIZE). Large numbers of iovecs
R> whose last element is 65536 bytes in size are sent without trouble, but
R> sometimes such a write causes EINVAL to be returned ...
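
The mismatch described here is off by one: an element of 65536 bytes is one byte over a 65535-byte limit. A small user-space sketch in C for flagging such elements before the write is below. ASSUMED_SEG_LIMIT simply repeats the value reported above for IXGBE_TSO_SIZE, and iov_first_oversized is a hypothetical name. Whether an iovec element actually ends up as a single DMA segment depends on how the stack builds the mbuf chain, so this only identifies candidates for correlation with the failures.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Assumption: the value reported in this thread for IXGBE_TSO_SIZE. */
#define ASSUMED_SEG_LIMIT 65535u

/*
 * Return the index of the first iovec element longer than `limit`,
 * or -1 if every element fits. Hypothetical helper for illustration only.
 */
static int
iov_first_oversized(const struct iovec *iov, int iovcnt, size_t limit)
{
    assert(iov != NULL || iovcnt <= 0);

    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > limit)
            return i;
    }
    return -1;
}
```

Logging the result of iov_first_oversized(iov, iovcnt, ASSUMED_SEG_LIMIT) next to each writev return value would show whether the EINVAL cases line up with the 65536-byte elements.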

Jack, can you look at this please?

-- 
Totus tuus, Glebius.

