scp: Write Failed: Cannot allocate memory

Peter Ross Peter.Ross at bogen.in-berlin.de
Mon Jul 11 01:59:51 UTC 2011


Quoting "Scott Sipe" <cscotts at gmail.com>:

> On Wed, Jul 6, 2011 at 4:21 AM, Peter Ross
> <Peter.Ross at bogen.in-berlin.de> wrote:
>
>> Quoting "Peter Ross" <Peter.Ross at bogen.in-berlin.de>:
>>
>>  Quoting "Peter Ross" <Peter.Ross at bogen.in-berlin.de>:
>>>
>>>  Quoting "Jeremy Chadwick" <freebsd at jdc.parodius.com>:
>>>>
>>>>  On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote:
>>>>>
>>>>>> Quoting "Jeremy Chadwick" <freebsd at jdc.parodius.com>:
>>>>>>
>>>>>>  On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>>>>>>
>>>>>>>> Quoting "Jeremy Chadwick" <freebsd at jdc.parodius.com>:
>>>>>>>>
>>>>>>>>  On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>>>>>>
>>>>>>>>>> Quoting "Jeremy Chadwick" <freebsd at jdc.parodius.com>:
>>>>>>>>>>
>>>>>>>>>>  On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with
>>>>>>>>>>>> it.
>>>>>>>>>>>>
>>>>>>>>>>>> sysctl vfs.zfs.arc_max: 6200000000
>>>>>>>>>>>>
>>>>>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is
>>>>>>>>>>>> hovering right around that value, sometimes above, sometimes
>>>>>>>>>>>> below (that's as it should be, right?). I don't think that it
>>>>>>>>>>>> dies when crossing over arc_max. I can run the same scp 10 times
>>>>>>>>>>>> and it might fail 1-3 times, with no correlation to the
>>>>>>>>>>>> arcstats.size being above/below arc_max that I can see.
>>>>>>>>>>>>
>>>>>>>>>>>> Scott
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>  Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> just as an addition: an upgrade to last Friday's
>>>>>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the
>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will experiment a bit more tomorrow after hours and grab
>>>>>>>>>>>>> some statistics.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> Quoting "Peter Ross" <Peter.Ross at bogen.in-berlin.de>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I noticed a similar problem last week. It is also very
>>>>>>>>>>>>>> similar to one reported last year:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My server is a Dell T410 server with the same bge card (the
>>>>>>>>>>>>>> same pciconf -lvc output as described by Mahlon:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yours, Scott, is an em(4)..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I
>>>>>>>>>>>>>> just want to mention it, in case it matters. I am still
>>>>>>>>>>>>>> running VirtualBox 3.2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching
>>>>>>>>>>>>>> vfs.zfs.arc_max then, but I could catch one or two cases
>>>>>>>>>>>>>> when the value was still below.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it
>>>>>>>>>>>>>> does not help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BTW: It looks as if ARC only gives back the memory when I
>>>>>>>>>>>>>> destroy the ZFS (a cloned snapshot containing virtual
>>>>>>>>>>>>>> machines). Even if nothing happens for hours the buffer
>>>>>>>>>>>>>> isn't released..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am upgrading.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am happy to give information gathered on old/new kernel if it
>>>>>>>>>>>>>> helps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Quoting "Scott Sipe" <cscotts at gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm running 8.2-RELEASE and am having new problems
>>>>>>>>>>>>>>>>>> with scp. When scping
>>>>>>>>>>>>>>>>>> files to a ZFS directory on the FreeBSD server --
>>>>>>>>>>>>>>>>>> most notably large files
>>>>>>>>>>>>>>>>>> -- the transfer frequently dies after just a few
>>>>>>>>>>>>>>>>>> seconds. In my last test, I
>>>>>>>>>>>>>>>>>> tried to scp an 800mb file to the FreeBSD system and
>>>>>>>>>>>>>>>>>> the transfer died after
>>>>>>>>>>>>>>>>>> 200mb. It completely copied the next 4 times I
>>>>>>>>>>>>>>>>>> tried, and then died again on
>>>>>>>>>>>>>>>>>> the next attempt.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On the client side:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "Connection to home closed by remote host.
>>>>>>>>>>>>>>>>>> lost connection"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In /var/log/auth.log:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jul  1 14:54:42 freebsd sshd[18955]: fatal: Write
>>>>>>>>>>>>>>>>>> failed: Cannot allocate
>>>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've never seen this before and have used scp before
>>>>>>>>>>>>>>>>>> to transfer large files
>>>>>>>>>>>>>>>>>> without problems. This computer has been used in
>>>>>>>>>>>>>>>>>> production for months and
>>>>>>>>>>>>>>>>>> has a current uptime of 36 days. I have not been
>>>>>>>>>>>>>>>>>> able to notice any problems
>>>>>>>>>>>>>>>>>> copying files to the server via samba or netatalk, or
>>>>>>>>>>>>>>>>>> any problems in apache.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Uname:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat
>>>>>>>>>>>>>>>>>> Feb 19 01:02:54 EST
>>>>>>>>>>>>>>>>>> 2011     root at xeon:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've attached my dmesg and output of vmstat -z.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have not restarted the sshd daemon or rebooted the
>>>>>>>>>>>>>>>>>> computer.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am glad to provide any other information or test anything
>>>>>>>>>>>>>>>>>> else.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> {snip vmstat -z and dmesg}
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You didn't provide details about your networking setup
>>>>>>>>>>>>>>>>> (rc.conf,
>>>>>>>>>>>>>>>>> ifconfig -a, etc.).  netstat -m would be useful too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Next, please see this thread circa September 2010, titled
>>>>>>>>>>>>>>>>> "Network
>>>>>>>>>>>>>>>>> memory allocation failures":
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The user in that thread is using rsync, which relies on
>>>>>>>>>>>>>>>>> scp by default.  I believe this problem is similar, if not
>>>>>>>>>>>>>>>>> identical, to yours.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please also provide your output of ( /usr/bin/limits -a )
>>>>>>>>>>>>>>>> for the server end and the client.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not quite sure I agree with the need for ifconfig -a, but
>>>>>>>>>>>>>>>> some information about the networking driver you're using for
>>>>>>>>>>>>>>>> the interface would be helpful, as would the uptime of the boxes
>>>>>>>>>>>>>>>> and the configuration of the pool, e.g. ( zpool status -a ; zfs
>>>>>>>>>>>>>>>> get all <poolname> ).  You should probably prop this information
>>>>>>>>>>>>>>>> up somewhere so you can reference it by URL whenever needed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rsync(1) does not rely on scp(1) whatsoever, but rsync(1) can
>>>>>>>>>>>>>>>> be made to use ssh(1) instead of rsh(1), and I believe that is
>>>>>>>>>>>>>>>> what Jeremy is stating here, but correct me if I am wrong.  It
>>>>>>>>>>>>>>>> does use ssh(1) by default.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's a possibility as well that, if you are using tmpfs(5) or
>>>>>>>>>>>>>>>> mdmfs(8) for /tmp-type filesystems, rsync(1) may just be
>>>>>>>>>>>>>>>> filling up your temp RAM area and causing the connection
>>>>>>>>>>>>>>>> abort, which would be expected.  ( df -h ) would help here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday
>>>>>>>>>>>>>>> were 3 different OSX computers (over gigabit). The FreeBSD
>>>>>>>>>>>>>>> server has 12GB of RAM and no bce adapter. For what it's
>>>>>>>>>>>>>>> worth, the server is backed up remotely every night with
>>>>>>>>>>>>>>> rsync (remote FreeBSD uses rsync to pull) to an offsite
>>>>>>>>>>>>>>> (slow cable connection) FreeBSD computer, and I have not
>>>>>>>>>>>>>>> seen any errors in the nightly rsync.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry for the omission of networking info, here's the
>>>>>>>>>>>>>>> output of the requested commands and some that popped up
>>>>>>>>>>>>>>> in the other thread:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://www.cap-press.com/misc/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In rc.conf:  ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Scott
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>> Just to make it crystal clear to everyone:
>>>>>>>>>>>
>>>>>>>>>>> There is no correlation between this problem and use of ZFS.
>>>>>>>>>>>  People are
>>>>>>>>>>> attempting to correlate "cannot allocate memory" messages with
>>>>>>>>>>> "anything
>>>>>>>>>>> on the system that uses memory".  The VM is much more complex than
>>>>>>>>>>> that.
>>>>>>>>>>>
>>>>>>>>>>> Given the nature of this problem, it's much more likely the issue
>>>>>>>>>>> is
>>>>>>>>>>> "somewhere" within a networking layer within FreeBSD, whether it
>>>>>>>>>>> be
>>>>>>>>>>> driver-level or some sort of intermediary layer.
>>>>>>>>>>>
>>>>>>>>>>> Two people who have this issue in this thread are both using
>>>>>>>>>>> VirtualBox.
>>>>>>>>>>> Can one, or both, of you remove VirtualBox from the configuration
>>>>>>>>>>> entirely (kernel, etc. -- not sure what is required) and then see
>>>>>>>>>>> if the
>>>>>>>>>>> issue goes away?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On the machine in question I can only do it after hours, so I will
>>>>>>>>>> do it tonight.
>>>>>>>>>>
>>>>>>>>>> I was _successfully_ sending the file over the loopback interface
>>>>>>>>>> using
>>>>>>>>>>
>>>>>>>>>> cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat >
>>>>>>>>>> /dev/null"
>>>>>>>>>>
>>>>>>>>>> I did it, btw, with the IPv6 localhost address first (accidentally),
>>>>>>>>>> and then using IPv4. Both worked.
>>>>>>>>>>
>>>>>>>>>> It always fails if I am sending it through the bce(4) interface,
>>>>>>>>>> even if my target is the VirtualBox bridged to the bce card (so it
>>>>>>>>>> does not "leave" the computer physically).
>>>>>>>>>>
>>>>>>>>>> Below the uname -a, ifconfig -a, netstat -rn, pciconf -lv and
>>>>>>>>>> kldstat output.
>>>>>>>>>>
>>>>>>>>>> I have another box where I do not see that problem. It copies files
>>>>>>>>>> happily over the net using ssh.
>>>>>>>>>>
>>>>>>>>>> It is an older HP ML 150 with only 3GB RAM but with a bge(4)
>>>>>>>>>> driver instead. It runs the same RELENG_8 from last week. I installed
>>>>>>>>>> VirtualBox and enabled vboxnet (so it loads the kernel modules), but
>>>>>>>>>> I do not run VirtualBox on it (because it does not have enough RAM).
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> DellT410one# uname -a
>>>>>>>>>> FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu
>>>>>>>>>> Jun
>>>>>>>>>> 30 17:07:18 EST 2011
>>>>>>>>>> root at DellT410one.vv.fda:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>>>> DellT410one# ifconfig -a
>>>>>>>>>> bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>        options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>>>>        ether 84:2b:2b:68:64:e4
>>>>>>>>>>        inet 192.168.50.220 netmask 0xffffff00 broadcast
>>>>>>>>>> 192.168.50.255
>>>>>>>>>>        inet 192.168.50.221 netmask 0xffffff00 broadcast
>>>>>>>>>> 192.168.50.255
>>>>>>>>>>        inet 192.168.50.223 netmask 0xffffff00 broadcast
>>>>>>>>>> 192.168.50.255
>>>>>>>>>>        inet 192.168.50.224 netmask 0xffffff00 broadcast
>>>>>>>>>> 192.168.50.255
>>>>>>>>>>        inet 192.168.50.225 netmask 0xffffff00 broadcast
>>>>>>>>>> 192.168.50.255
>>>>>>>>>>        inet 192.168.50.226 netmask 0xffffff00 broadcast
>>>>>>>>>> 192.168.50.255
>>>>>>>>>>        inet 192.168.50.227 netmask 0xffffff00 broadcast
>>>>>>>>>> 192.168.50.255
>>>>>>>>>>        inet 192.168.50.219 netmask 0xffffff00 broadcast
>>>>>>>>>> 192.168.50.255
>>>>>>>>>>        media: Ethernet autoselect (1000baseT <full-duplex>)
>>>>>>>>>>        status: active
>>>>>>>>>> bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>        options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>>>>        ether 84:2b:2b:68:64:e5
>>>>>>>>>>        media: Ethernet autoselect
>>>>>>>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>>>>>>>>        options=3<RXCSUM,TXCSUM>
>>>>>>>>>>        inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
>>>>>>>>>>        inet6 ::1 prefixlen 128
>>>>>>>>>>        inet 127.0.0.1 netmask 0xff000000
>>>>>>>>>>        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>>>>>>>>>> vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>        ether 0a:00:27:00:00:00
>>>>>>>>>> DellT410one# netstat -rn
>>>>>>>>>> Routing tables
>>>>>>>>>>
>>>>>>>>>> Internet:
>>>>>>>>>> Destination        Gateway            Flags    Refs      Use  Netif
>>>>>>>>>> Expire
>>>>>>>>>> default            192.168.50.201     UGS         0    52195   bce0
>>>>>>>>>> 127.0.0.1          link#11            UH          0        6    lo0
>>>>>>>>>> 192.168.50.0/24    link#1             U           0  1118212   bce0
>>>>>>>>>> 192.168.50.219     link#1             UHS         0     9670    lo0
>>>>>>>>>> 192.168.50.220     link#1             UHS         0     8347    lo0
>>>>>>>>>> 192.168.50.221     link#1             UHS         0   103024    lo0
>>>>>>>>>> 192.168.50.223     link#1             UHS         0    43614    lo0
>>>>>>>>>> 192.168.50.224     link#1             UHS         0     8358    lo0
>>>>>>>>>> 192.168.50.225     link#1             UHS         0     8438    lo0
>>>>>>>>>> 192.168.50.226     link#1             UHS         0     8338    lo0
>>>>>>>>>> 192.168.50.227     link#1             UHS         0     8333    lo0
>>>>>>>>>> 192.168.165.0/24   192.168.50.200     UGS         0     3311   bce0
>>>>>>>>>> 192.168.166.0/24   192.168.50.200     UGS         0      699   bce0
>>>>>>>>>> 192.168.167.0/24   192.168.50.200     UGS         0     3012   bce0
>>>>>>>>>> 192.168.168.0/24   192.168.50.200     UGS         0      552   bce0
>>>>>>>>>>
>>>>>>>>>> Internet6:
>>>>>>>>>> Destination                       Gateway                       Flags      Netif Expire
>>>>>>>>>> ::1                               ::1                           UH         lo0
>>>>>>>>>> fe80::%lo0/64                     link#11                       U          lo0
>>>>>>>>>> fe80::1%lo0                       link#11                       UHS        lo0
>>>>>>>>>> ff01::%lo0/32                     fe80::1%lo0                   U          lo0
>>>>>>>>>> ff02::%lo0/32                     fe80::1%lo0                   U          lo0
>>>>>>>>>> DellT410one# kldstat
>>>>>>>>>> Id Refs Address            Size     Name
>>>>>>>>>> 1   19 0xffffffff80100000 dbf5d0   kernel
>>>>>>>>>> 2    3 0xffffffff80ec0000 4c358    vboxdrv.ko
>>>>>>>>>> 3    1 0xffffffff81012000 131998   zfs.ko
>>>>>>>>>> 4    1 0xffffffff81144000 1ff1     opensolaris.ko
>>>>>>>>>> 5    2 0xffffffff81146000 2940     vboxnetflt.ko
>>>>>>>>>> 6    2 0xffffffff81149000 8e38     netgraph.ko
>>>>>>>>>> 7    1 0xffffffff81152000 153c     ng_ether.ko
>>>>>>>>>> 8    1 0xffffffff81154000 e70      vboxnetadp.ko
>>>>>>>>>> DellT410one# pciconf -lv
>>>>>>>>>> ..
>>>>>>>>>> bce0 at pci0:1:0:0:        class=0x020000 card=0x028d1028
>>>>>>>>>> chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>>>  class      = network
>>>>>>>>>>  subclass   = ethernet
>>>>>>>>>> bce1 at pci0:1:0:1:        class=0x020000 card=0x028d1028
>>>>>>>>>> chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>>>  class      = network
>>>>>>>>>>  subclass   = ethernet
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could you please provide "pciconf -lvcb" output instead, specific to
>>>>>>>>> the
>>>>>>>>> bce chips?  Thanks.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here it is:
>>>>>>>>
>>>>>>>> bce0 at pci0:1:0:0:        class=0x020000 card=0x028d1028
>>>>>>>> chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>  class      = network
>>>>>>>>  subclass   = ethernet
>>>>>>>>  bar   [10] = type Memory, range 64, base 0xda000000, size
>>>>>>>> 33554432, enabled
>>>>>>>>  cap 01[48] = powerspec 3  supports D0 D3  current D0
>>>>>>>>  cap 03[50] = VPD
>>>>>>>>  cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>>>>>>>>  cap 11[a0] = MSI-X supports 9 messages in map 0x10
>>>>>>>>  cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x4(x4)
>>>>>>>> ecap 0003[100] = Serial 1 842b2bfffe6864e4
>>>>>>>> ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>>>> ecap 0004[150] = unknown 1
>>>>>>>> ecap 0002[160] = VC 1 max VC0
>>>>>>>>
>>>>>>>
>>>>>>> Thanks Peter.
>>>>>>>
>>>>>>> Adding Yong-Hyeon and David to the discussion, since they've both
>>>>>>> worked
>>>>>>> on the bce(4) driver in recent months (most of the changes made
>>>>>>> recently
>>>>>>> are only in HEAD), and also adding Jack Vogel of Intel who maintains
>>>>>>> em(4).  Brief history for the devs:
>>>>>>>
>>>>>>> The issue is described "Network memory allocation failures" and was
>>>>>>> reported last year, but two users recently (Scott and Peter) have
>>>>>>> reported the issue again:
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>
>>>>>>> And was mentioned again by Scott here, which also contains some
>>>>>>> technical details:
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.html
>>>>>>>
>>>>>>> What's interesting is that Scott's issue is identical in form but he's
>>>>>>> using em(4), which isn't known to behave like this.  Both individuals
>>>>>>> are using VirtualBox, though we're not sure at this point if that is
>>>>>>> the
>>>>>>> piece which is causing the anomaly.
>>>>>>>
>>>>>>> Relevant details of Scott's system (em-based):
>>>>>>>
>>>>>>> http://www.cap-press.com/misc/
>>>>>>>
>>>>>>> Relevant details of Peter's system (bce-based):
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063221.html
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063223.html
>>>>>>>
>>>>>>> I think the biggest complexity right now is figuring out how/why scp
>>>>>>> fails intermittently in this manner.  The errno probably "trickles
>>>>>>> down" to userland from the kernel, but the condition regarding why it
>>>>>>> happens is unknown.
>>>>>>>
>>>>>>
>>>>>> BTW: I also saw 2 of the errors coming from a BIND9 running in a
>>>>>> jail on that box.
>>>>>>
>>>>>> DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/messages
>>>>>> Apr 13 05:17:41 bind named[23534]: internal_send:
>>>>>> 192.168.50.145#65176: Cannot allocate memory
>>>>>> Jun 21 23:30:44 bind named[39864]: internal_send:
>>>>>> 192.168.50.251#36155: Cannot allocate memory
>>>>>> Jun 24 15:28:00 bind named[39864]: internal_send:
>>>>>> 192.168.50.251#28651: Cannot allocate memory
>>>>>> Jun 28 12:57:52 bind named[2462]: internal_send:
>>>>>> 192.168.165.154#1201: Cannot allocate memory
>>>>>>
>>>>>> My initial guess: it happens sooner or later somehow - whether it is
>>>>>> a lot of traffic in one go (ssh/scp copies of virtual disks) or a
>>>>>> lot of traffic over a longer period (a nameserver gets asked again
>>>>>> and again).
>>>>>>
>>>>>
>>>>> Scott, are you also using jails?  If both of you are: is there any
>>>>> possibility you can remove use of those?  I'm not sure how VirtualBox
>>>>> fits into the picture (jails + VirtualBox that is), but I can imagine
>>>>> jails having different environmental constraints that might cause this.
>>>>>
>>>>> Basically the troubleshooting process here is to remove pieces of the
>>>>> puzzle until you figure out which piece is causing the issue.  I don't
>>>>> want to get the NIC driver devs all spun up for something that, for
>>>>> example, might be an issue with the jail implementation.
>>>>>
>>>>
>>>> I understand this. As said, I will do some after-hours debugging tonight.
>>>>
>>>> The scp/ssh problems are happening _outside_ the jails. The bind runs
>>>> _inside_ the jail.
>>>>
>>>> I wanted to use the _host_ system to send VirtualBox virtual disks and
>>>> filesystems used by jails, to archive them and/or to have them available
>>>> on other FreeBSD systems (as a cold standby solution).
>>>>
>>>
>>> I just switched off the VirtualBox (without removing the kernel modules).
>>>
>>> The copy succeeds now.
>>>
>>> Well, it could be a VirtualBox-related problem, or is the server just
>>> relieved to have 2GB more memory at hand now?
>>>
>>> Do you have a quick idea to "emulate" the 2GB memory load usually
>>> delivered by VirtualBox?
>>>
>>
>> Well, I managed that (using lookbusy).
>>
>> Interestingly, I could copy a large file (30GB) without problems as soon as
>> I switched off the VirtualBox. As said, the kernel modules weren't unloaded;
>> they are still there.
>>
>> The copy crashes seconds after I started the VirtualBox. According to
>> vmstat and top I had more free memory then (ca. 1.5GB) than I had without
>> VirtualBox but with lookbusy running (ca. 350MB).
>>
>> So, it looks (to me, at least) as if I have a VirtualBox-related problem,
>> somehow.
>>
>> Any ideas? I am happy to play a bit more to get it sorted, although there
>> are some limits (it is running the company mail server, after all).
>>
>> Regards
>> Peter
>>
>
> This is it -- I'm seeing the exact same thing.
>
> Scp dies reliably with VirtualBox running. After quitting VirtualBox I was
> able to scp about 30 large files with no errors. Once I started VirtualBox,
> an in-progress scp died within seconds.
>
> Ditto that merely having the kernel modules loaded doesn't seem to make a
> difference; it's VirtualBox actually running that matters.
>
> virtualbox-ose-3.2.12_1

Hi,

I wonder whether anyone has new ideas.

I am puzzled that it only happens while VirtualBox machines are running,
whereas merely loading or unloading the VirtualBox kernel modules does not
seem to have any effect.
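
In case it is useful, this is the comparison I intend to make with and
without a VM started (only a sketch; I am assuming the wired-page count and
the mbuf counters are the interesting ones, which may well be wrong):

  # pages wired down (a running VM wires its guest RAM) and the page size
  sysctl vm.stats.vm.v_wire_count hw.pagesize
  # mbuf/cluster usage and the "requests for ... denied" counters
  netstat -m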

Should I describe the case on the -emulation mailing list to get some
ideas from the engineers working on VirtualBox?

I do not want to create too much noise so I would like to know your  
thoughts on it first.

I experimented a little bit with the ssh code and now know which write(2)
call in /usr/src/crypto/openssh/roaming_common.c (in the function
roaming_write) returns the ENOMEM (an error it should never return,
according to the man page ;-)

Unfortunately I am at a loss to track it further down into the kernel; I do
not know enough about it, to be frank.
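
My (possibly naive) idea for digging further would be to watch for failing
allocations in the kernel while reproducing the problem, along these lines
(just a sketch: it assumes a kernel built with DTrace/fbt support, which
GENERIC is not, and it assumes the ENOMEM really comes from a failed
allocation rather than something else):

  # count kernel stacks wherever a UMA zone allocation returns NULL
  dtrace -n 'fbt::uma_zalloc_arg:return /arg1 == 0/ { @fail[stack()] = count(); }'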

Are there any memory stats inside the kernel that could help?
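
(To partly answer my own question, these are the counters I know of; whether
they cover the code path that fails here I cannot say:)

  vmstat -z | egrep 'ITEM|mbuf|cluster'   # per-zone FAILURES column
  sysctl kern.ipc.nmbclusters kern.ipc.maxsockbuf
  sysctl vm.kmem_size vm.kmem_size_max    # kernel memory arena, also used by the ZFS ARC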

Thank you for all ideas
Peter


