Network memory allocation failures

Jeremy Chadwick freebsd at jdc.parodius.com
Tue Sep 7 22:24:06 UTC 2010


On Tue, Sep 07, 2010 at 02:08:13PM -0700, Mahlon E. Smith wrote:
> I picked up a couple of Dell R810 monsters a couple of months ago.  96G
> of RAM, 24 core.  With the aid of this list, got 8.1-RELEASE on there,
> and they are trucking along merrily as VirtualBox hosts.
> 
> I'm seeing memory allocation errors when sending data over the network.
> It is random at best, however I can reproduce it pretty reliably.
> 
> Sending 100M to a remote machine.  Note the 2nd scp attempt worked.
> Most small files can make it through unmolested.
> 
>     obb# dd if=/dev/random of=100M-test bs=1M count=100
>     100+0 records in
>     100+0 records out
>     104857600 bytes transferred in 2.881689 secs (36387551 bytes/sec)
>     obb# rsync -av 100M-test skin:/tmp/
>     sending incremental file list
>     100M-test
>     Write failed: Cannot allocate memory
>     rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
>     rsync: connection unexpectedly closed (28 bytes received so far) [sender]
>     rsync error: unexplained error (code 255) at io.c(601) [sender=3.0.7]
>     obb# scp 100M-test skin:/tmp/
>     100M-test        52%   52MB  52.1MB/s   00:00 ETAWrite failed: Cannot allocate memory
>     lost connection
>     obb# scp 100M-test skin:/tmp/
>     100M-test       100%  100MB  50.0MB/s   00:02    
>     obb# scp 100M-test skin:/tmp/
>     100M-test         0%    0     0.0KB/s   --:-- ETAWrite failed: Cannot allocate memory
>     lost connection
> 
> Fetching a file, however, works.
> 
>     obb# scp skin:/usr/local/tmp/100M-test .
>     100M-test    100%  100MB  20.0MB/s   00:05    
>     obb# scp skin:/usr/local/tmp/100M-test .
>     100M-test    100%  100MB  20.0MB/s   00:05    
>     obb# scp skin:/usr/local/tmp/100M-test .
>     100M-test    100%  100MB  20.0MB/s   00:05    
>     obb# scp skin:/usr/local/tmp/100M-test .
>     100M-test    100%  100MB  20.0MB/s   00:05    
>     ...
> 
> 
> I've ruled out bad hardware (mainly due to the behavior being
> *identical* on the sister machine, in a completely different data
> center.) It's a broadcom (bce) NIC.

This could be a bce(4) bug, meaning the "Cannot allocate memory"
message could be indicating a DMA failure or something else from the
card, and is not necessarily related to mbufs.

There are also changes/fixes to bce(4) that are in RELENG_8 (8.1-STABLE)
that aren't in 8.1-RELEASE, but I don't know if those are responsible
for your problem.

Please provide output from the following:

* uname -a        (if desired, XXX out hostname)
* vmstat -i
* ifconfig -a     (if desired, XXX out IPs and MACs)
* netstat -inbd   (if desired, XXX out MACs)
* pciconf -lvc    (only the bceX entry please)

Also check dmesg to see if there are any error messages that correlate
with when the problem occurs.
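
If it saves some typing, the commands above can be gathered into one
file with something like the sketch below. It is only a rough helper,
not an official tool: the output file name (bce-diag.txt) and the run()
function are made up for illustration, and it does not redact anything
or trim the pciconf output for you.

```shell
#!/bin/sh
# Rough sketch: collect the requested diagnostics into a single file.
# OUT is a hypothetical file name; override it in the environment if desired.
OUT="${OUT:-bce-diag.txt}"

run() {
    # Print a header line, then the command's output (stdout and stderr).
    # If the command is not installed, say so instead of failing.
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(command not found: $1)"
    fi
    echo
}

{
    run uname -a
    run vmstat -i
    run ifconfig -a
    run netstat -inbd
    run pciconf -lvc    # lists every device; keep only the bceX entry
    run dmesg           # look for messages around the time of a failure
} > "$OUT"

echo "wrote $OUT"
```

Review the file before posting: XXX out hostnames, IPs and MACs as
desired, and cut the pciconf section down to the bceX entry by hand.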

I'm also CC'ing Yong-Hyeon PYUN who might have some ideas.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


