8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

Attila Nagy bra at fsn.hu
Tue Mar 30 10:18:01 UTC 2010


Pyun YongHyeon wrote:
> On Mon, Mar 29, 2010 at 09:21:42PM +0200, Attila Nagy wrote:
>   
>> Pyun YongHyeon wrote:
>>     
>>> On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
>>>   
>>>       
>>>> Hi,
>>>>
>>>> Michael Loftis wrote:
>>>>     
>>>>         
>>>>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <bra at fsn.hu>
>>>>> wrote:
>>>>>
>>>>> <...>
>>>>>       
>>>>>           
>>>>>> Both unbound and python accept DNS requests, and it seems that when
>>>>>> interrupt is at 25%, only unbound is in the *udp state; when it is at
>>>>>> 50%, both programs are in that state.
>>>>>>         
>>>>>>             
>>>>> Try turning off hardware TSO/checksum offload if it's available on your
>>>>> chipset?  ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
>>>>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>>>>> under high load.  We're pretty sure it's mostly the nfe driver, or the
>>>>> chips themselves, but have never ruled out some generic 8.x hardware
>>>>> offload issues.
>>>>>       
>>>>>           
>>>> Bingo, this solved the problem. The current uptime nears four days.
>>>> Previously I couldn't go further than a day.
>>>>
>>>> The machine gets very light TCP load (and other machines that do get
>>>> TCP load work well), so I guess it's UDP RX or TX checksum related.
>>>>
>>>>     
>>>>         
>>> Hmm, this is an unexpected result. Since you're using UDP, TSO is not
>>> involved in this issue. Since you disabled RX/TX checksum
>>> offloading, could you check how many 'bad checksum' and
>>> 'no checksum' counts you have from netstat(1)?
>>> To narrow down which side of checksum offloading causes the issue,
>>> would you disable just one side at a time? For instance, disable TX
>>> checksum offloading with RX checksum offloading enabled and see how
>>> bce(4) works.
>>> #ifconfig bce0 -txcsum rxcsum
>>> If that shows the same issue, try disabling RX checksum offloading
>>> but enabling TX checksum offloading.
>>> #ifconfig bce0 txcsum -rxcsum
>>>   
>>>       
>> It's interesting. During the day, I've disabled only HW checksumming and
>> left TSO enabled. It couldn't run more than a few hours.
>> I have disabled tso again to see what happens.
>>
>> BTW, of course there is TCP traffic on that interface (DNS is also
>> available on TCP), maybe this causes the problem.
>>     
>
> The only guess I can think of at this moment is incorrect use of
> bus_dma(9) in the TX path. But I'm not sure this is related to the
> issue you're seeing. Would you try the experimental patch at the
> following URL?
> http://people.freebsd.org/~yongari/bce/bce.20100305.diff
> Please make sure to back up your old bce(4) driver before applying
> the patch. I didn't see anything abnormal in testing, but it
> wasn't stressed much.
>   
With the default settings (rx, tx csum, tso) it froze in about an hour:
CPU:  0.0% user,  0.0% nice,  0.0% system, 25.0% interrupt, 75.0% idle
  714 bind         4 102    0  1200M  1182M *lle    3  17:24  0.00% unbound
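
For reference, the 'bad checksum' and 'no checksum' counters asked about
above can be pulled out of the UDP statistics with a small helper. This is
only a sketch: the function name udp_csum_counters is made up for
illustration, and it assumes the counter lines read "N with bad checksum"
and "N with no checksum" as in FreeBSD's `netstat -s -p udp` output.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): reads `netstat -s -p udp`
# output on stdin and prints the two checksum counters on one line.
# Assumes each counter line starts with the number, followed by
# "with bad checksum" or "with no checksum".
udp_csum_counters() {
    awk '
        /with bad checksum/ { bad = $1 }
        /with no checksum/  { none = $1 }
        END { printf "bad=%d none=%d\n", bad, none }
    '
}

# Intended use on the affected host (left commented out, needs bce0):
#   netstat -s -p udp | udp_csum_counters
#   ifconfig bce0 -txcsum rxcsum
#   ... wait under load ...
#   netstat -s -p udp | udp_csum_counters
```

Comparing the two readings taken with only one side of checksum offloading
disabled should show whether either counter grows while the problem builds
up.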


