Seeing EINVAL from writev on 8.0 to a non-blocking socket even though the data seems to hit the wire

Richard Sharpe realrichardsharpe at gmail.com
Sat May 4 02:01:20 UTC 2013


On Fri, May 3, 2013 at 10:18 AM, Richard Sharpe
<realrichardsharpe at gmail.com> wrote:
> On Fri, May 3, 2013 at 7:39 AM, Eric van Gyzen <eric at vangyzen.net> wrote:
>> On 05/02/2013 19:00, Richard Sharpe wrote:
>>> On Thu, May 2, 2013 at 7:52 AM, Eric van Gyzen <eric at vangyzen.net> wrote:
>>>> On 05/02/2013 08:48, Richard Sharpe wrote:
>>>>> On Wed, May 1, 2013 at 9:34 PM, Alfred Perlstein <bright at mu.org> wrote:
>>>>>> On 5/1/13 8:03 PM, Richard Sharpe wrote:
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I am checking to see if there are any known bugs with respect to this
>>>>>>> in FreeBSD 8.0.
>>>>>>>
>>>>>>> Situation is that Samba 3.6.6 uses writev to a non-blocking socket to
>>>>>>> get the SMB2 requests on the wire.
>>>>>>>
>>>>>>> Intermittently, we see the writev return EINVAL even though the data
>>>>>>> has gotten on the wire. This I have verified by grabbing a capture and
>>>>>>> comparing the SMB Sequence number in the last outgoing packet on the
>>>>>>> wire vs the in-memory contents when we get EINVAL.
>>>>>>>
>>>>>>> Sometimes it occurs on a four-element IOVEC, sometimes we get EAGAIN
>>>>>>> on the four-element IOVEC and then we get EINVAL when retrying on a
>>>>>>> smaller IOVEC.
>>>>>>>
>>>>>>> Where should I look to check if there is some path where this might be
>>>>>>> happening? Is this even the correct mailing list?
>>>>>>>
>>>>>> What does the iovec look like when you get EINVAL? Can you sanity check
>>>>>> it? Is there anything special about it? (zero length vecs?)
>>>>>>
>>>>>> I think there are a few "maxvals" that if overrun cause EINVAL to be
>>>>>> returned. example is if your iovec is somehow huge or has many, many
>>>>>> elements.
>>>>> Can anyone tell me the call graph down to the TCP code?
>>>>>
>>>> writev kern/sys_generic.c
>>>> kern_writev
>>>> dofilewrite
>>>> fo_write in sys/file.h
>>>> soo_write in kern/sys_socket.c
>>>> sosend in kern/uipc_socket.c
>>>> sosend_generic
>>>> tcp_usr_send in netinet/tcp_usrreq.c
>>> Is there a tool that generates call graphs?
>>
>> I'm not aware of one that works in the kernel--other than the kernel
>> itself, of course. With DDB compiled in, you could set a breakpoint on,
>> say, tcp_output, and show the call stack with bt.
>>
>> Also, take a look at stack(9).
>>
>>> I have been able to demonstrate that I am getting EINVAL returned from
>>> writev kern/sys_generic.c, kern_writev, dofilewrite and soo_write,
>>> but when I add printfs to sosend/sosend_generic it becomes very hard
>>> to provoke this problem.
>>
>> So, either relocating code or changing the timing has changed the
>> behavior--a Heisenbug.
>>
>> If your code looks like this:
>>
>> if (error == EINVAL)
>>     printf("you are here\n");
>>
>> You might add __predict_false, like this:
>>
>> if (__predict_false(error == EINVAL))
>>     printf("you are here\n");
>>
>> That /might/ reduce the impact on runtime behavior.
>
> Thanks for that. The problem does not appear to be in the TCP or IP
> layers. Rather, it appears to be in the ixgbe driver.
>
> The problem takes a little more effort to provoke, but simple printfs
> are doing the job so far.

The version of the ixgbe driver we are using seems to set the max size
of a DMA element to 65535 (IXGBE_TSO_SIZE). Even though large numbers
of iovecs whose last element is 65536 bytes in size are sent, sometimes
one of them causes EINVAL to be returned ...
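
For anyone who wants to check their own iovecs against that limit, here
is a rough userland sketch. The helper name first_oversized_iov and the
constant ASSUMED_MAX_SEG_SIZE are mine, not from the driver; the value
65535 is just the figure quoted above. It only looks at the
application-side buffers, and I am not claiming that iovec elements map
one-to-one onto DMA segments once the socket layer has copied them.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Assumption: per-element limit taken from the 65535 figure above. */
#define ASSUMED_MAX_SEG_SIZE 65535

/*
 * Hypothetical helper: return the index of the first iovec element
 * whose length exceeds max_seg, or -1 if every element fits.
 */
static int
first_oversized_iov(const struct iovec *iov, int iovcnt, size_t max_seg)
{
	int i;

	/* A NULL array is only acceptable when it is declared empty. */
	assert(iov != NULL || iovcnt == 0);

	for (i = 0; i < iovcnt; i++) {
		if (iov[i].iov_len > max_seg)
			return (i);
	}
	return (-1);
}
```

Calling first_oversized_iov(iov, iovcnt, ASSUMED_MAX_SEG_SIZE) just
before writev and logging any non-negative result might show whether
the EINVAL cases line up with a 65536-byte element.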

-- 
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)


More information about the freebsd-net mailing list