i386/134926: FreeBSD-current Xen DomU networking panic - out of tx mbufs?

Adrian Chadd adrian at FreeBSD.org
Mon May 25 05:20:01 UTC 2009


>Number:         134926
>Category:       i386
>Synopsis:       FreeBSD-current Xen DomU networking panic - out of tx mbufs?
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon May 25 05:20:00 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Adrian Chadd
>Release:        FreeBSD-current r192286
>Organization:
>Environment:
FreeBSD-current 8.0, r192286, i386, Xen.
>Description:
I've set up two Xen DomUs - one running 6.3-RELEASE, one running -current - and am running some basic HTTP benchmarking between them.

I'm -specifically- trying to hit points where mbufs and CPU are exhausted.

The -current DomU crashes after a few minutes of load. Watching my debugging output, the crash seems to be triggered most frequently when ab finishes a run.

The crash is in sys/dev/xen/netfront/netfront.c:get_id_from_freelist(). The TX mbuf list is 100% allocated, including slot 0, which is "special", and a call to get another buffer causes a crash. Slot 0 is meant to be the head pointer of the "freelist" of free mbuf slots, using pointer values < KERNBASE to represent indices into the TX mbuf array. Eww.

Breaking out of the TX loop if the only free entry is slot 0 (thus never completely filling the TX mbuf list) doesn't stop the crashing; things crash elsewhere. I'm not sure what is going on, but I'm seeing netback responses pop up with IDs that point to already-freed mbuf slots. This message also shows up:

network_tx_buf_gc: warning -- grant still in use by backend domain.

My guess is that the netfront driver really doesn't handle memory exhaustion/ring exhaustion very well. 

I'm not sure which version of the Linux netfront code this was based off (and whether the Linux folk have fixed it subsequently) but it may be worth a look.
>How-To-Repeat:
The 6.3-REL DomU runs my squid fork (lusca); the -current DomU runs my apachebench fork with libevent support. They're doing around 400-500 small requests a second, so I'd guess around 2500-4500 small packets a second are flying between these DomUs. They are both on the same host (CentOS 5.3 i386, Xen 3.1.0-magic-RHEL-hackery, etc.) Both VMs have 128 MB of RAM and use the default kernel settings.

The apachebench command line is "ab -n 10000 -c 200 [url to squid]" sitting in a loop. That's 10,000 requests across 200 concurrent connections, no keepalive (so one request == new TCP session.)


>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

