No buffer space available

Thu Mar 20 13:39:24 PDT 2008

Hello,

Since moving FTP traffic over to one of our machines running 6-STABLE 
(from 9/20/2007), we've been getting the above error in the logs.  When 
it happens, the machine becomes unresponsive on the network, and I need 
console access to log in and reboot.  I can generally fix this type of 
problem rather quickly (or thought I could), as I've handled it quite 
frequently in the past.  However, this particular machine is giving me a 
really hard time.  I have to reboot it roughly every two weeks because 
of the above.  My hope is that after reading through the output that 
follows, someone can point out a crucial piece that I am missing, 
because I am stumped.

With the above said, while looking through tons of output, I came 
across what I believe to be 'the gem'.  I'm hoping that this can be 
either confirmed or denied:

ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
mbuf_cluster: 2048, 64000, 1024, 10, 1024, 0          ## While machine is borked
mbuf_cluster: 2048, 64000, 1532, 246, 1823214, 0      ## While the machine is not borked

The above is output from vmstat -z, trimmed to show just the specific 
lines.  The first line quite frankly makes no sense to me whatsoever.  
It looks artificially stuck at 1024 for both the 'used' and 'requests' 
fields.  A formatting bug, or an integer overflow of some kind?  
Maybe, but it's a suspicious coincidence that the network is locking up 
at the same time.  That, and the values simply don't add up.
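
To make the comparison concrete, here is a minimal Python sketch that parses the two mbuf_cluster lines quoted above and reports how close each is to the limit. The helper names (parse_zone_line, describe) are hypothetical, the field names are taken from the header line, and treating REQUESTS as a cumulative counter is an assumption rather than something confirmed here.

```python
# Hypothetical helpers; field names follow the vmstat -z header quoted above.
FIELDS = ("size", "limit", "used", "free", "requests", "failures")

def parse_zone_line(line):
    """Split one 'name: a, b, c, ...' zone line into (name, dict of ints)."""
    line = line.split("##", 1)[0]        # drop the trailing annotation, if any
    name, _, rest = line.partition(":")
    values = [int(v) for v in rest.split(",")]
    return name.strip(), dict(zip(FIELDS, values))

def describe(line):
    """Summarize one zone line: usage against the limit plus a sanity flag."""
    name, z = parse_zone_line(line)
    return {
        "zone": name,
        "pct_of_limit": 100.0 * z["used"] / z["limit"],
        # Assumption: REQUESTS is cumulative, so a value no larger than
        # USED looks odd on a machine that has been up for days.
        "requests_look_reset": z["requests"] <= z["used"],
        "failures": z["failures"],
    }

borked = "mbuf_cluster: 2048, 64000, 1024, 10, 1024, 0   ## While machine is borked"
healthy = "mbuf_cluster: 2048, 64000, 1532, 246, 1823214, 0"

for sample in (borked, healthy):
    print(describe(sample))
```

In both samples the zone is using only a small percentage of the 64000 limit, so the limit itself does not appear to be what is being hit; only the 'borked' line trips the requests check.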

Additionally, the netstat -m output that follows again shows strange 
values for "requests for I/O initiated by sendfile" and "calls to 
protocol drain routines":
---------------------------------------------------------------------------
522/888/1410 mbufs in use (current/cache/total)
516/518/1034/64000 mbuf clusters in use (current/cache/total/max)
516/508 mbuf+clusters out of packet secondary zone in use (current/cache)
0/0/0/0 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
1162K/1258K/2420K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/5/8704 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
---------------------------------------------------------------------------

Zero??  The two items mentioned above are counters that increment 
extremely slowly, so I can't imagine this being an integer rollover type 
of problem.  Something is weird here as well.
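
As a cross-check on the rest of the netstat -m output, the following Python sketch verifies that current + cache equals total on every line labelled that way. The function name check_totals and the regular expression are hypothetical, and the sample text is copied from the output above.

```python
import re

# Leading slash-separated counters, a description, and a trailing (labels) group.
LINE_RE = re.compile(r"\s*([\dK/]+)\s+(.*?)\s*\(([^()]*)\)\s*$")

def check_totals(text):
    """Return {description: bool} for lines labelled current/cache/total[/max]."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue                      # plain counter lines have no labels
        labels = m.group(3).split("/")
        if labels[:3] != ["current", "cache", "total"]:
            continue                      # e.g. (current/peak/max) is skipped
        nums = [int(tok.rstrip("K")) for tok in m.group(1).split("/")]
        if len(nums) < 3:
            continue
        results[m.group(2)] = (nums[0] + nums[1] == nums[2])
    return results

sample = """\
522/888/1410 mbufs in use (current/cache/total)
516/518/1034/64000 mbuf clusters in use (current/cache/total/max)
516/508 mbuf+clusters out of packet secondary zone in use (current/cache)
1162K/1258K/2420K bytes allocated to network (current/cache/total)
0/5/8704 sfbufs in use (current/peak/max)
0 requests for I/O initiated by sendfile
"""

for desc, ok in check_totals(sample).items():
    print(f"{desc}: {'consistent' if ok else 'MISMATCH'}")
```

The three current/cache/total lines are internally consistent, which suggests the oddity is confined to the zero-valued counters rather than to the mbuf accounting as a whole.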

---------------------------------------------------------------------------

Lastly, em0 shows no errors to speak of:

Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
em0    1500 <Link#1>      00:0e:0c:b1:a7:0e    23104     0    27905     0     0
                          01:00:5e:00:00:01      744              0
---------------------------------------------------------------------------

I would certainly appreciate any help, whether in the form of links, 
patches, or other non-aggressive types of responses  ;)

Thanks,
Paul