Network memory allocation failures
Jeremy Chadwick
freebsd at jdc.parodius.com
Tue Sep 7 22:24:06 UTC 2010
On Tue, Sep 07, 2010 at 02:08:13PM -0700, Mahlon E. Smith wrote:
> I picked up a couple of Dell R810 monsters a couple of months ago. 96G
> of RAM, 24 core. With the aid of this list, got 8.1-RELEASE on there,
> and they are trucking along merrily as VirtualBox hosts.
>
> I'm seeing memory allocation errors when sending data over the network.
> It is random at best, however I can reproduce it pretty reliably.
>
> Sending 100M to a remote machine. Note the 2nd scp attempt worked.
> Most small files can make it through unmolested.
>
> obb# dd if=/dev/random of=100M-test bs=1M count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes transferred in 2.881689 secs (36387551 bytes/sec)
> obb# rsync -av 100M-test skin:/tmp/
> sending incremental file list
> 100M-test
> Write failed: Cannot allocate memory
> rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
> rsync: connection unexpectedly closed (28 bytes received so far) [sender]
> rsync error: unexplained error (code 255) at io.c(601) [sender=3.0.7]
> obb# scp 100M-test skin:/tmp/
> 100M-test 52% 52MB 52.1MB/s 00:00 ETAWrite failed: Cannot allocate memory
> lost connection
> obb# scp 100M-test skin:/tmp/
> 100M-test 100% 100MB 50.0MB/s 00:02
> obb# scp 100M-test skin:/tmp/
> 100M-test 0% 0 0.0KB/s --:-- ETAWrite failed: Cannot allocate memory
> lost connection
>
> Fetching a file, however, works.
>
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test 100% 100MB 20.0MB/s 00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test 100% 100MB 20.0MB/s 00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test 100% 100MB 20.0MB/s 00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test 100% 100MB 20.0MB/s 00:05
> ...
>
>
> I've ruled out bad hardware (mainly due to the behavior being
> *identical* on the sister machine, in a completely different data
> center.) It's a broadcom (bce) NIC.
This could be a bce(4) bug, meaning the "Cannot allocate memory"
message could indicate a DMA failure or some other error from the card,
and is not necessarily related to mbufs.
There are also changes/fixes to bce(4) that are in RELENG_8 (8.1-STABLE)
that aren't in 8.1-RELEASE, but I don't know if those are responsible
for your problem.
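One way to tell mbuf exhaustion apart from a driver-level failure is to look at the "denied" counters in `netstat -m` right after a failed transfer. The following is only a minimal sketch, assuming the usual FreeBSD `netstat -m` output format; the helper name `mbuf_denied` is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: keep only the "denied" counters from netstat -m output.
# It reads stdin, so it also works on output saved to a file earlier.
mbuf_denied() {
    grep -i 'denied'
}

# Run on the affected host right after a failed transfer.
# Non-zero counters point towards mbuf/cluster exhaustion; all zeros make
# a driver-level cause more likely, though they do not prove it.
if command -v netstat >/dev/null 2>&1; then
    netstat -m 2>/dev/null | mbuf_denied || true
fi
```

If the counters stay at zero while the transfers keep failing, that would be consistent with the error coming from somewhere other than the mbuf allocator.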
Please provide output from the following:
* uname -a (if desired, XXX out hostname)
* vmstat -i
* ifconfig -a (if desired, XXX out IPs and MACs)
* netstat -inbd (if desired, XXX out MACs)
* pciconf -lvc (only the bceX entry please)
Also check dmesg to see if there are any error messages that coincide
with the problem occurring.
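The commands listed above could be gathered into a single report along these lines. This is a rough sketch rather than a polished tool: the report path and the function names `collect` and `mask_macs` are assumptions for illustration. Only MAC addresses are masked automatically; hostnames and IPs still need hand-editing, and the pciconf output still needs trimming down to the bceX entry.

```shell
#!/bin/sh
# Sketch: gather the requested diagnostics into one report file.
# REPORT and the function names below are illustrative assumptions.
REPORT="${REPORT:-/tmp/bce-report.txt}"

# Replace anything that looks like an Ethernet MAC address.
mask_macs() {
    sed -E 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g'
}

# Run one command, label its output, and carry on if it fails.
collect() {
    echo "=== $* ==="
    "$@" 2>&1 || echo "(command failed or not available)"
    echo
}

{
    collect uname -a
    collect vmstat -i
    collect ifconfig -a
    collect netstat -inbd
    collect pciconf -lvc
    collect dmesg
} | mask_macs > "$REPORT"

echo "wrote $REPORT"
```

Running it immediately after a failed scp keeps any related dmesg lines near the end of the report.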
I'm also CC'ing Yong-Hyeon PYUN who might have some ideas.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |