Network memory allocation failures

Wed Sep 8 16:52:25 UTC 2010

On Wed, Sep 08, 2010 at 07:34:44AM -0700, Mahlon E. Smith wrote:
> On Tue, Sep 07, 2010, Jeremy Chadwick wrote:
> > 
> > I figured there might be memory exhaustion of some sort, possibly in the
> > bce(4) driver itself, that could cause the OP's problem.  bce(4) might
> > not be the problem at all, but the OP's issue seems to occur only when
> > transmitting data, not when receiving:
> > 
> > http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
> 
> More information:
> 
> Looks like 100M wasn't enough of a test burst to tickle the problem in
> my original message... 10G is, though.  It's definitely happening in
> both directions.
> 
> Upgraded to -STABLE on one of the two machines last night, running
> GENERIC.
> 
> FreeBSD obb 8.1-STABLE FreeBSD 8.1-STABLE #0: Tue Sep  7 19:48:55 PDT 2010     root@obb:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> 
> Outgoing:
> 
>     obb# scp testfile root@holp:/usr/local/tmp/
>     testfile                            8%  856MB  37.6MB/s   04:09 ETA
>     Write failed: Cannot allocate memory
>     lost connection
>     obb# scp testfile root@holp:/usr/local/tmp/
>     testfile                            0%   72MB  34.3MB/s   04:56 ETA
>     Write failed: Cannot allocate memory
>     lost connection
> 
> Incoming:
> 
>     obb# scp root@holp:/usr/local/tmp/testfile .
>     testfile                            6%  670MB  31.9MB/s   04:59 ETA
>     Write failed: Cannot allocate memory
>     lost connection
>     obb# scp root@holp:/usr/local/tmp/testfile .
>     testfile                            1%  118MB  39.3MB/s   04:17 ETA
>     Write failed: Cannot allocate memory
>     lost connection
>     obb# scp root@holp:/usr/local/tmp/testfile .
>     testfile                           15% 1613MB  29.0MB/s   04:57 ETA
>     Write failed: Cannot allocate memory
>     lost connection
> 
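Because the failure only shows up on large (10G) transfers, and in either direction, a repeatable harness helps when comparing configurations. The following is a minimal Python sketch, assuming the test file already exists both locally and on the remote host (as testfile did on holp) and that ssh writes its "Write failed" error to stderr, as its fatal errors normally do. The function names (`transfer_failed_with_enomem`, `run_scp`, `reproduce`) are hypothetical; the marker string is copied from the logs above.

```python
from __future__ import annotations

import subprocess

# Message printed by ssh/scp in the failing runs above.
ENOMEM_MARKER = "Write failed: Cannot allocate memory"


def transfer_failed_with_enomem(returncode: int, stderr: str) -> bool:
    """True if an scp run exited non-zero and reported the ENOMEM write failure."""
    return returncode != 0 and ENOMEM_MARKER in stderr


def run_scp(src: str, dst: str) -> tuple[int, str]:
    """Run one scp transfer; return (exit status, captured stderr).

    No -q flag: in quiet mode ssh may suppress the very error we look for.
    """
    proc = subprocess.run(["scp", src, dst], capture_output=True, text=True)
    return proc.returncode, proc.stderr


def reproduce(local_file: str, remote_file: str, rounds: int = 3) -> dict[str, int]:
    """Copy out and back `rounds` times, counting ENOMEM failures per direction.

    `remote_file` is an scp source such as "root@holp:/usr/local/tmp/testfile"
    and is assumed to exist already; outgoing copies go to a separate name
    so a failed push never truncates the file used for the incoming test.
    """
    failures = {"outgoing": 0, "incoming": 0}
    for _ in range(rounds):
        rc, err = run_scp(local_file, remote_file + ".out")
        if transfer_failed_with_enomem(rc, err):
            failures["outgoing"] += 1
        rc, err = run_scp(remote_file, local_file + ".in")
        if transfer_failed_with_enomem(rc, err):
            failures["incoming"] += 1
    return failures
```

For example, `reproduce("testfile", "root@holp:/usr/local/tmp/testfile")` returns a failure count per direction; going by the reports in this thread, one would expect non-zero counts with all memory enabled and zeros with memory capped at 16G.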

I don't think bce(4) is able to return ENOMEM to a userland process,
so I doubt this is a bce(4) issue. To rule out a driver problem, could
you try a different controller instead of bce(4)?
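For reference, "Cannot allocate memory" is just the C library's message for errno ENOMEM. What ssh reports is therefore a system call on its socket failing with ENOMEM: an error the kernel handed back to the process, not a message emitted by the NIC driver itself. A minimal Python sketch of that mapping follows; `ssh_style_write_error` and `is_enomem` are hypothetical helpers that mimic the "Write failed:" prefix seen in the logs, and the OSError is simulated rather than produced by a real failing socket.

```python
import errno
import os


def ssh_style_write_error(exc: OSError) -> str:
    """Format a failed socket write the way the scp/ssh output above does."""
    return "Write failed: " + os.strerror(exc.errno)


def is_enomem(exc: OSError) -> bool:
    """True if the failing system call returned ENOMEM."""
    return exc.errno == errno.ENOMEM


# Simulate what the ssh process sees when write(2) on its socket fails with ENOMEM.
err = OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))
print(ssh_style_write_error(err))
```

On FreeBSD and glibc-based systems the printed line matches the logs above exactly.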

> 
> 
> > The 2nd-to-last paragraph there is worth noting, specifically how
> > limiting maximum addressable memory to 32GB via loader.conf seems to
> > work around the issue.
> 
> I'd no longer consider this a coincidence: limiting the memory to 16G
> eliminates the issue completely.  I'll retest with 32G today.
> 

Again, this type of change has nothing to do with driver operation.
bce(4) may have some issues on PAE, but I don't think those would
trigger problems on amd64 systems.
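The memory cap discussed above is presumably set with the `hw.physmem` loader tunable in /boot/loader.conf (for example `hw.physmem="16G"`); that is an assumption, since the messages don't name the tunable. To confirm after a reboot that a cap actually took effect, one can read `hw.physmem` back with sysctl(8). A minimal Python sketch; `parse_physmem`, `read_physmem`, and `cap_in_effect` are hypothetical helpers.

```python
import subprocess

GIB = 1024 ** 3


def parse_physmem(sysctl_output: str) -> int:
    """Parse the byte count printed by `sysctl -n hw.physmem`."""
    return int(sysctl_output.strip())


def read_physmem() -> int:
    """Return the physical memory size (bytes) reported by the running FreeBSD kernel."""
    out = subprocess.run(
        ["sysctl", "-n", "hw.physmem"], capture_output=True, text=True, check=True
    )
    return parse_physmem(out.stdout)


def cap_in_effect(physmem_bytes: int, cap_gib: int) -> bool:
    """True if the kernel reports no more memory than the configured cap."""
    return physmem_bytes <= cap_gib * GIB
```

On the 16G configuration, `cap_in_effect(read_physmem(), 16)` should return True; the reported value may sit slightly below the cap.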

> Incoming:
> 
>     obb# scp root@holp:/usr/local/tmp/testfile testfile2
>     testfile                    100%   10GB  17.8MB/s   09:35
>     obb# scp root@holp:/usr/local/tmp/testfile testfile2
>     testfile                    100%   10GB  17.0MB/s   10:02
> 
> Outgoing:
> 
>     obb# scp testfile root@holp:/usr/local/tmp/testfile2
>     testfile                    100%   10GB  35.7MB/s   04:47
>     obb# scp testfile root@holp:/usr/local/tmp/testfile2
>     testfile                    100%   10GB  35.4MB/s   04:49
> 
>  
> > There were other problems with the systems in question back in July, it
> > seems.  I assume these got hammered out somehow:
> > 
> > http://www.mail-archive.com/freebsd-stable@freebsd.org/msg111408.html
> 
> To a degree -- the initial install and CPU count problems are all fixed
> up, thanks to help from the list.  The Intel 10G panics were stifled
> with a newer driver from Intel's site, but I ran out of time to do
> any serious testing with it and just ended up using the Broadcoms to
> meet my deadline.
> 
> --
> Mahlon E. Smith  
> http://www.martini.nu/contact.html