read() returns ETIMEDOUT on steady TCP connection

Sat May 3 17:43:49 UTC 2008

Hi Andre,

Just to introduce myself: I am now helping Mark Hills with testing.
Thank you for your suggestion. Here are the results from a similar
system (RELENG-7) after increasing kern.ipc.nmbjumbop to 25600.

At 1600 streams, using approx. 340 Mbit/s, netstat -m was reporting:

12550/250/12800/12800 4k (page size) jumbo clusters in use

After the read() returns ETIMEDOUT,

3857/10551/14408/25600 4k (page size) jumbo clusters in use

I then raised kern.ipc.nmbjumbop again with sysctl, from 25600 to 51200.

After the read() returns ETIMEDOUT,

200/25400/25600/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
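
To make the three readings above easier to compare, here is a minimal Python sketch that turns a "4k (page size) jumbo clusters" line into percentages of the configured maximum. The function name jumbo_usage and the output field names are illustrative only and are not part of netstat; the sample lines are the ones quoted above.

```python
import re

# Matches e.g. "12550/250/12800/12800 4k (page size) jumbo clusters in use"
JUMBO_RE = re.compile(
    r"(\d+)/(\d+)/(\d+)/(\d+) 4k \(page size\) jumbo clusters in use"
)

def jumbo_usage(line):
    """Parse current/cache/total/max and express current and total as % of max."""
    m = JUMBO_RE.search(line)
    if m is None:
        raise ValueError("not a 4k jumbo cluster line: %r" % line)
    current, cache, total, limit = map(int, m.groups())
    return {
        "current": current,
        "cache": cache,
        "total": total,
        "max": limit,
        "in_use_pct": 100.0 * current / limit,    # clusters actually in use
        "allocated_pct": 100.0 * total / limit,   # in use plus cached
    }

if __name__ == "__main__":
    samples = [
        "12550/250/12800/12800 4k (page size) jumbo clusters in use",
        "3857/10551/14408/25600 4k (page size) jumbo clusters in use",
        "200/25400/25600/51200 4k (page size) jumbo clusters in use",
    ]
    for s in samples:
        u = jumbo_usage(s)
        print("max=%d in_use=%.1f%% allocated=%.1f%%"
              % (u["max"], u["in_use_pct"], u["allocated_pct"]))
```

The "allocated" figure (current plus cache) is the one that hits the limit in the first reading, which is why the cache column matters as well as the in-use count.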

netstat -m:

4140/26205/30345 mbufs in use (current/cache/total)
256/3482/3738/25600 mbuf clusters in use (current/cache/total/max)
256/3328 mbuf+clusters out of packet secondary zone in use (current/cache)
3882/21718/25600/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
17075K/100387K/117462K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/7/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
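
Since the open question is why the "denied" counters stay at zero, a small Python sketch like the following could be used to pull just those counters out of saved netstat -m output between test runs. It assumes the line format shown above; the names denied_counters and any_denied are illustrative, not an existing tool.

```python
import re

# Matches e.g. "0/0/0 requests for jumbo clusters denied (4k/9k/16k)"
DENIED_RE = re.compile(
    r"^(\d+)/(\d+)/(\d+) requests for (.+?) denied \(([^)]*)\)", re.M
)

def denied_counters(report):
    """Return {resource: {label: count}} for each 'requests for ... denied' line."""
    result = {}
    for m in DENIED_RE.finditer(report):
        counts = [int(m.group(i)) for i in (1, 2, 3)]
        labels = m.group(5).split("/")
        result[m.group(4)] = dict(zip(labels, counts))
    return result

def any_denied(report):
    """True if any parsed denied counter is non-zero."""
    return any(
        count > 0
        for group in denied_counters(report).values()
        for count in group.values()
    )

if __name__ == "__main__":
    report = "\n".join([
        "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)",
        "0/0/0 requests for jumbo clusters denied (4k/9k/16k)",
        "0 requests for sfbufs denied",
    ])
    print(denied_counters(report))
    print("any denied:", any_denied(report))
```

The single-number sfbufs lines are deliberately not matched, since only the three-way mbuf and jumbo cluster counters are relevant here.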

Do you think we need to adjust any further sysctls, and should I apply the
patch to see whether "tcp_output: error 55" is still occurring?

Thanks again, Tim

Andre Oppermann wrote:
> Mark Hills wrote:
>> On Wed, 23 Apr 2008, Andre Oppermann wrote:
>>
>>> http://people.freebsd.org/~andre/tcp_output-error-log.diff
>>>
>>> Please apply this patch and enable the sysctl net.inet.tcp.log_debug=1
>>> and report any output.  You likely get some (normal) noise from 
>>> syncache.
>>> What we are looking for is reports from tcp_output.
>>
>> Hi Andre, I've applied the patch and tested.
>>
>> Aside from syncache noise, I get a constant stream of 'error 55' 
>> (ENOBUFS?), once the number of connections gets to around 150 at 192kbps.
>>
>> TCP: [192.168.5.43]:52153 to [192.168.5.40]:8080; tcp_output: error 55 while sending
>>
>> 192.168.5.40 is the IP address of this host, running the server.
>>
>> I tried to correlate the point of the application receiving ETIMEDOUT 
>> with these messages, but that is tricky as it seems to be outputting 
>> a lot of messages, and multiple messages over each other (see below).
>>
>> Because of the mention of no buffer space available, I checked the 
>> values of net.inet.tcp.sendbuf* and recvbuf*, and increased the max 
>> values with no effect.
>>
>> When I get time I will modify the kernel to print errors which aren't 
>> ENOBUFS to see if there are any others. But in the meantime, this 
>> sounds like a problem to me. Is that correct?
>>
>> Mark
>>
>>
>> :8080; tcp_output: error 55 while sending
>> TCP: [192.168.5.42]:57384T CtPo:  
>> [[119922..116688..55..4402]]::85048400;1  ttoc p[_1o9u2t.p1u6t8:. 
>> 5e.r4r0o]r: 8080;5 5t cwp_hoiultep uste:n deirnrgor 55 while sending
>> TCP: [192.168.5.42]:57382 to [192.168.5.40]:8080; tcp_output: error 55 while sending
>> TCP: [192.168.5.42]:57381 to [192.168.5.40]:8080; tcp_output: error 55 while sending
>> TCP: [192.168.5.42]:57380 to [192.168.5.40]:8080; tcp_output: error 55 while sending
>
> After tracing through the code it seems you are indeed memory limited.
> Looking back at the netstat -m output:
>
>  12550/250/12800/12800 4k (page size) jumbo clusters in use
>  (current/cache/total/max)
>  0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>
> This shows that the supply of 4k jumbo clusters is pretty much exhausted.
> The cache may be allocated to different CPUs and the one making the request
> at a given point may be depleted and can't get any from the global pool.
> The big question is why the denied counter doesn't report anything.  I've
> looked at the code paths and don't see any obvious reason why it doesn't
> get counted.  Maybe Robert can give some insight here.
>
> Try doubling the amount of 4k page size jumbo mbufs.  They are the primary
> workhorse in the kernel right now:
>
>  sysctl kern.ipc.nmbjumbop=25600
>
> This should get further.  Still more may be necessary depending on workloads.
>