svn commit: r231759 - in stable/8: share/man/man4 sys/amd64/conf sys/conf sys/dev/acpica sys/dev/esp sys/dev/twa sys/dev/xen/balloon sys/dev/xen/blkback sys/dev/xen/blkfront sys/dev/xen/console sys...

Kenneth D. Merry ken at FreeBSD.org
Wed Feb 15 14:23:02 UTC 2012


Author: ken
Date: Wed Feb 15 14:23:01 2012
New Revision: 231759
URL: http://svn.freebsd.org/changeset/base/231759

Log:
  MFC r215818, r216405, r216437, r216448, r216956, r221827, r222975, r223059,
  r225343, r225704, r225705, r225706, r225707, r225709, r226029, r220647,
  r230183, r230587, r230916, r228526, r230879:
  
  Bring Xen support in stable/8 up to parity with head.  Almost all
  outstanding Xen support differences between head and stable/8 are included,
  except for the just added r231743.
  
    r215818 | cperciva | 2010-11-25 08:05:21 -0700 (Thu, 25 Nov 2010) | 5 lines
  
    Rename HYPERVISOR_multicall (which performs the multicall hypercall) to
    _HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which
    invokes _HYPERVISOR_multicall and checks that the individual hypercalls all
    succeeded.
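
    As an illustration (not the committed code), a minimal sketch of the
    wrapper described above.  The multicall_entry layout follows the public
    Xen interface headers; the per-entry failure test and the -1 return are
    simplifying assumptions.

	typedef struct multicall_entry {
		unsigned long op;	/* hypercall number */
		unsigned long result;	/* per-hypercall return value */
		unsigned long args[6];
	} multicall_entry_t;

	/* Raw batch submission (the renamed _HYPERVISOR_multicall). */
	int _HYPERVISOR_multicall(multicall_entry_t *call_list, int nr_calls);

	static inline int
	HYPERVISOR_multicall(multicall_entry_t *call_list, int nr_calls)
	{
		int error, i;

		/* Submit the whole batch to the hypervisor. */
		error = _HYPERVISOR_multicall(call_list, nr_calls);
		if (error != 0)
			return (error);

		/* Report failure if any individual hypercall failed. */
		for (i = 0; i < nr_calls; i++)
			if ((long)call_list[i].result < 0)
				return (-1);

		return (0);
	}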
  
    r216405 | rwatson | 2010-12-13 05:15:46 -0700 (Mon, 13 Dec 2010) | 7 lines
  
    Add options NO_ADAPTIVE_SX to the XENHVM kernel configuration, matching
    its similar disabling of adaptive mutexes and rwlocks.  The existing
    comment on why this is the case also applies to sx locks.
  
    MFC after:	3 days
    Discussed with:	attilio
  
    r216437 | gibbs | 2010-12-14 10:23:49 -0700 (Tue, 14 Dec 2010) | 2 lines
  
    Remove spurious printf left over from debugging our XenStore support.
  
    r216448 | gibbs | 2010-12-14 13:57:40 -0700 (Tue, 14 Dec 2010) | 4 lines
  
    Fix a typo in a comment.
  
    Noticed by:	Attila Nagy <bra at fsn.hu>
  
    r216956 | rwatson | 2011-01-04 07:49:54 -0700 (Tue, 04 Jan 2011) | 8 lines
  
    Make "options XENHVM" compile for i386, not just amd64 -- a largely
    mechanical change.  This opens the door for using PV device drivers
    under Xen HVM on i386, as well as more general harmonisation of i386
    and amd64 Xen support in FreeBSD.
  
    Reviewed by:	cperciva
    MFC after:	3 weeks
  
    r221827 | mav | 2011-05-12 21:40:16 -0600 (Thu, 12 May 2011) | 2 lines
  
    Fix msleep() usage in Xen balloon driver to not wake up on every HZ tick.
  
    r222975 | gibbs | 2011-06-10 22:59:01 -0600 (Fri, 10 Jun 2011) | 63 lines
  
    Monitor and emit events for XenStore changes to XenBus trees
    of the devices we manage.  These changes can be due to writes
    we make ourselves or due to changes made by the control domain.
    The goal of these changes is to ensure that all state transitions
    can be detected regardless of their source and to allow common
    device policies (e.g. "onlined" backend devices) to be centralized
    in the XenBus bus code.
  
    sys/xen/xenbus/xenbusvar.h:
    sys/xen/xenbus/xenbus.c:
    sys/xen/xenbus/xenbus_if.m:
    	Add a new method for XenBus drivers "localend_changed".
    	This method is invoked whenever a write is detected to
    	a device's XenBus tree.  The default implementation of
    	this method is a no-op.
  
    sys/xen/xenbus/xenbus_if.m:
    sys/dev/xen/netfront/netfront.c:
    sys/dev/xen/blkfront/blkfront.c:
    sys/dev/xen/blkback/blkback.c:
    	Change the signature of the "otherend_changed" method.
    	This notification cannot fail, so it should return void.
  
    sys/xen/xenbus/xenbusb_back.c:
    	Add "online" device handling to the XenBus Back Bus
    	support code.  An online backend device remains active
    	after a front-end detaches as a reconnect is expected
    	to occur in the near future.
  
    sys/xen/interface/io/xenbus.h:
    	Add comment block further explaining the meaning and
    	driver responsibilities associated with the XenBus
    	Closed state.
  
    sys/xen/xenbus/xenbusb.c:
    sys/xen/xenbus/xenbusb.h:
    sys/xen/xenbus/xenbusb_back.c:
    sys/xen/xenbus/xenbusb_front.c:
    sys/xen/xenbus/xenbusb_if.m:
    	o Register a XenStore watch against the local XenBus tree
    	  for all devices.
    	o Cache the string length of the path to our local tree.
    	o Allow the xenbus front and back drivers to hook/filter both
    	  local and otherend watch processing.
    	o Update the device ivar version of "state" when we detect
    	  a XenStore update of that node.
  
    sys/dev/xen/control/control.c:
    sys/xen/xenbus/xenbus.c:
    sys/xen/xenbus/xenbusb.c:
    sys/xen/xenbus/xenbusb.h:
    sys/xen/xenbus/xenbusvar.h:
    sys/xen/xenstore/xenstorevar.h:
    	Allow clients of the XenStore watch mechanism to attach
    	a single uintptr_t worth of client data to the watch.
    	This removes the need to carefully place client watch
    	data within enclosing objects so that a cast or offsetof
    	calculation can be used to convert from watch to enclosing
    	object.
  
    Sponsored by:	Spectra Logic Corporation
    MFC after:	1 week
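
    A hypothetical consumer of the new watch client data, for illustration.
    The names struct xs_watch, callback_data and xs_register_watch() are
    taken from the files listed above, but the exact types and signatures
    shown here are assumptions, not the committed API.

	struct my_softc {
		struct xs_watch	watch;		/* XenStore watch registration */
		unsigned int	last_change_len;
		/* ... other driver state ... */
	};

	static void
	my_watch_cb(struct xs_watch *watch, const char **vec, unsigned int len)
	{
		/* Recover the enclosing object directly from the watch. */
		struct my_softc *sc = (struct my_softc *)watch->callback_data;

		/* ... react to the XenStore change reported in vec ... */
		sc->last_change_len = len;
	}

	static int
	my_register_watch(struct my_softc *sc, char *path)
	{
		sc->watch.node = path;
		sc->watch.callback = my_watch_cb;
		/* One uintptr_t of client data; no offsetof() gymnastics. */
		sc->watch.callback_data = (uintptr_t)sc;
		return (xs_register_watch(&sc->watch));
	}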
  
    r223059 | gibbs | 2011-06-13 14:36:29 -0600 (Mon, 13 Jun 2011) | 36 lines
  
    Several enhancements to the Xen block back driver.
  
    sys/dev/xen/blkback/blkback.c:
    	o Implement front-end request coalescing.  This greatly improves the
    	  performance of front-end clients that are unaware of the dynamic
    	  request-size/number of requests negotiation available in the
    	  FreeBSD backend driver.  This required a large restructuring
    	  in how this driver records in-flight transactions and how those
    	  transactions are mapped into kernel KVA.  For example, the driver
    	  now includes a mini "KVA manager" that allocates ranges of
    	  contiguous KVA to batches of requests that are physically
    	  contiguous in the backing store so that a single bio or UIO
    	  segment can be used to represent the I/O.
  
    	o Refuse to open any backend files or devices if the system
    	  has yet to mount root.  This avoids a panic.
  
    	o Properly handle "onlined" devices.  An "onlined" backend
    	  device stays attached to its backing store across front-end
    	  disconnections.  This feature is intended to reduce latency
    	  when a front-end does a hand-off to another driver (e.g.
    	  PV aware bootloader to OS kernel) or during a VM reboot.
  
    	o Harden the driver against a pathological/buggy front-end
    	  by carefully vetting front-end XenStore data such as the
    	  front-end state.
  
    	o Add sysctls that report the negotiated number of
    	  segments per-request and the number of requests that
    	  can be concurrently in flight.
  
    Submitted by:	kdm
    Reviewed by:	gibbs
    Sponsored by:	Spectra Logic Corporation
    MFC after:	1 week
  
    r225343 | rwatson | 2011-09-02 11:36:01 -0600 (Fri, 02 Sep 2011) | 7 lines
  
    Add support for alternative break-to-debugger support on the Xen console.
    This should help debug boot-time hangs experienced in 9.0-BETA.
  
    MFC after:	3 weeks
    Tested by:	sbruno
    Approved by:	re (kib)
  
    r225704 | gibbs | 2011-09-20 17:44:34 -0600 (Tue, 20 Sep 2011) | 29 lines
  
    Properly handle suspend/resume events in the Xen device framework.
  
    Sponsored by:	BQ Internet
  
    sys/xen/xenbus/xenbusb.c:
    	o In xenbusb_resume(), publish the state transition of the
    	  resuming device into XenbusStateInitialising so that the
    	  remote peer can see it.  Recording the state locally is
    	  not sufficient to trigger a re-connect sequence.
    	o In xenbusb_resume(), defer new-bus resume processing until
    	  after the remote peer's XenStore address has been updated.
    	  The drivers may need to refer to this information during
    	  resume processing.
  
    sys/xen/xenbus/xenbusb_back.c:
    sys/xen/xenbus/xenbusb_front.c:
    	Register xenbusb_resume() rather than bus_generic_resume()
    	as the handler for device_resume events.
  
    sys/xen/xenstore/xenstore.c:
    	o Fix grammar in a comment.
    	o In xs_suspend(), pass suspend events on to the child
    	  devices (e.g. xenbusb_front/back) that are attached
    	  to the XenStore.
  
    Approved by:	re
    MFC after:	1 week
  
    r225705 | gibbs | 2011-09-20 18:02:44 -0600 (Tue, 20 Sep 2011) | 35 lines
  
    Add suspend/resume support to the Xen blkfront driver.
  
    Sponsored by:	BQ Internet
  
    sys/dev/xen/blkfront/block.h:
    sys/dev/xen/blkfront/blkfront.c:
    	Remove the now-unused blkif_vdev_t from the blkfront softc.
  
    sys/dev/xen/blkfront/blkfront.c:
    	o In blkfront_suspend(), indicate the desire to suspend
    	  by changing the softc connected state to SUSPENDED, and
    	  then wait for any I/O pending on the remote peer to
    	  drain.  Cancel suspend processing if I/O does not
    	  drain within 30 seconds.
    	o Enable and update blkfront_resume().  Since I/O is
    	  drained prior to the suspension of the VM, the complicated
    	  recovery process performed by other Xen blkfront
    	  implementations is avoided.  We simply tear down the
    	  connection to our old peer, and then re-connect.
    	o In blkif_initialize(), fix a resource leak and botched
    	  return if we cannot allocate shadow memory for our
    	  requests.
    	o In blkfront_backend_changed(), correct our response to
    	  the XenbusStateInitialised state.  This state indicates
    	  that our backend peer has published sufficient data for
    	  blkfront to publish ring information and other XenStore
    	  data, not that a connection can occur.  Blkfront now
    	  will only perform connection processing in response to
    	  the XenbusStateConnected state.  This corrects an issue
    	  where blkfront connected before the backend was ready
    	  during resume processing.
  
    Approved by:	re
    MFC after:	1 week
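
    A sketch of the suspend strategy described above, for illustration only.
    The softc layout, state names and wait channel are hypothetical; the
    real driver wakes the sleeping thread as each outstanding command
    completes.

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/kernel.h>
	#include <sys/lock.h>
	#include <sys/mutex.h>
	#include <sys/queue.h>

	struct my_blkfront_softc {
		struct mtx		 io_lock;
		int			 state;		/* MY_STATE_* */
		STAILQ_HEAD(, my_cmd)	 inflight;	/* commands at the backend */
	};

	enum { MY_STATE_CONNECTED, MY_STATE_SUSPENDED };

	static int
	my_blkfront_suspend(struct my_blkfront_softc *sc)
	{
		int error = 0;

		mtx_lock(&sc->io_lock);
		/* Stop issuing new requests to the backend. */
		sc->state = MY_STATE_SUSPENDED;

		/* Each completion wakes us; allow up to 30 seconds to drain. */
		while (!STAILQ_EMPTY(&sc->inflight) && error != EWOULDBLOCK)
			error = msleep(&sc->inflight, &sc->io_lock, PRIBIO,
			    "blkfsusp", 30 * hz);

		if (!STAILQ_EMPTY(&sc->inflight)) {
			/* I/O did not drain in time; cancel the suspend. */
			sc->state = MY_STATE_CONNECTED;
			error = EBUSY;
		} else
			error = 0;
		mtx_unlock(&sc->io_lock);

		return (error);
	}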
  
    r225706 | gibbs | 2011-09-20 18:06:02 -0600 (Tue, 20 Sep 2011) | 11 lines
  
    [ Forced commit.  Actual changes accidentally included in r225704 ]
  
    sys/dev/xen/control/control.c:
    	Fix locking violations in Xen HVM suspend processing
    	and have it perform similar actions to those performed
    	during an ACPI triggered suspend.
  
    Sponsored by:	BQ Internet
    Approved by:	re
    MFC after:	1 week
  
    r225707 | gibbs | 2011-09-20 18:08:25 -0600 (Tue, 20 Sep 2011) | 21 lines
  
    Correct suspend/resume support in the Netfront driver.
  
    Sponsored by:	BQ Internet
  
    sys/dev/xen/netfront/netfront.c:
    	o Implement netfront_suspend(), a specialized suspend
    	  handler for the netfront driver.  This routine simply
    	  disables the carrier so the driver is idle during
    	  system suspend processing.
    	o Fix a leak when re-initializing LRO during a link reset.
    	o In netif_release_tx_bufs(), when cleaning up the grant
    	  references for our TX ring, use gnttab_end_foreign_access_ref
    	  instead of attempting to grant the page again.
    	o In netif_release_tx_bufs(), we do not track mbufs associated
    	  with mbuf chains, but instead just free each mbuf directly.
    	  Use m_free(), not m_freem(), to avoid double frees of mbufs.
    	o Refactor some code to enhance clarity.
  
    Approved by:	re
    MFC after:	1 week
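
    The m_free()/m_freem() distinction the last point relies on, shown with
    a hypothetical tracking array rather than the netfront code itself.

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/mbuf.h>

	static void
	release_tracked_mbufs(struct mbuf **tracked, int count)
	{
		int i;

		for (i = 0; i < count; i++) {
			if (tracked[i] == NULL)
				continue;
			/*
			 * Each slot tracks exactly one mbuf, which may still
			 * be linked into a chain whose other mbufs occupy
			 * other slots.  m_free() releases just this mbuf;
			 * m_freem() would walk and free the whole chain, so a
			 * later slot would then be freed twice.
			 */
			m_free(tracked[i]);
			tracked[i] = NULL;
		}
	}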
  
    r225709 | gibbs | 2011-09-20 18:15:29 -0600 (Tue, 20 Sep 2011) | 19 lines
  
    Update netfront so that it queries and honors published
    back-end features.
  
    sys/dev/xen/netfront/netfront.c:
    	o Add xn_query_features() which reads the XenStore and
    	  records the TSO, LRO, and chained ring-request support
    	  of the backend.
    	o Rename xn_configure_lro() to xn_configure_features() and
    	  use this routine to manage the setup of TSO, LRO, and
    	  checksum offload.
    	o In create_netdev(), initialize if_capabilities and
    	  if_hwassist to the capabilities found on all backends.
    	  Delegate configuration of if_capenable and the TSO flag
    	  in if_hwassist to xn_configure_features().
  
    Reported by:	Hugo Silva (fix inspired by patch provided)
    Approved by:	re
    MFC after:	1 week
  
    r226029 | jkim | 2011-10-04 17:53:47 -0600 (Tue, 04 Oct 2011) | 2 lines
  
    Add strnlen() to libkern.
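
    A minimal sketch of what a libkern strnlen() looks like; not necessarily
    the exact code added here.

	#include <sys/types.h>

	/* Length of s, examining at most maxlen bytes. */
	size_t
	strnlen(const char *s, size_t maxlen)
	{
		size_t len;

		for (len = 0; len < maxlen && s[len] != '\0'; len++)
			continue;
		return (len);
	}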
  
    r220647 | jkim | 2011-04-14 16:17:39 -0600 (Thu, 14 Apr 2011) | 4 lines
  
    Add event handlers for (ACPI) suspend/resume events.  Suspend event handlers
    are invoked right before device drivers go into sleep state and resume event
    handlers are invoked right after all device drivers are woken up.
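
    How a kernel subsystem might consume the new events via the standard
    EVENTHANDLER_REGISTER pattern, for illustration.  The power_suspend and
    power_resume names appear in the acpi.c hunk below; the one-argument
    handler signature and the registration helper are assumptions.

	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/eventhandler.h>

	static void
	mydrv_power_suspend(void *arg)
	{
		/* Quiesce hardware just before device drivers go to sleep. */
	}

	static void
	mydrv_power_resume(void *arg)
	{
		/* Re-arm hardware right after device drivers have woken up. */
	}

	static void
	mydrv_register_power_events(void *softc)
	{
		EVENTHANDLER_REGISTER(power_suspend, mydrv_power_suspend, softc,
		    EVENTHANDLER_PRI_ANY);
		EVENTHANDLER_REGISTER(power_resume, mydrv_power_resume, softc,
		    EVENTHANDLER_PRI_ANY);
	}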
  
    r230183 | cperciva | 2012-01-15 19:38:45 -0700 (Sun, 15 Jan 2012) | 3 lines
  
    Make XENHVM work on i386.  The __ffs() function counts bits starting from
    zero, unlike ffs(3), which starts counting from 1.
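
    A userland snippet showing the off-by-one the fix accounts for;
    my_ffs_zero_based() stands in for the Linux-style __ffs() used by the
    Xen code.

	#include <assert.h>
	#include <strings.h>

	/* Zero-based first-set-bit, like __ffs(); undefined for mask == 0. */
	static int
	my_ffs_zero_based(unsigned int mask)
	{
		return (ffs((int)mask) - 1);
	}

	int
	main(void)
	{
		assert(ffs(0x8) == 4);			/* ffs(3): bit 3 -> 4 */
		assert(my_ffs_zero_based(0x8) == 3);	/* __ffs(): bit 3 -> 3 */
		return (0);
	}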
  
    r230587 | ken | 2012-01-26 09:35:09 -0700 (Thu, 26 Jan 2012) | 38 lines
  
    Xen netback driver rewrite.
  
    share/man/man4/Makefile,
    share/man/man4/xnb.4,
    sys/dev/xen/netback/netback.c,
    sys/dev/xen/netback/netback_unit_tests.c:
  
    	Rewrote the netback driver for xen to attach properly via newbus
    	and work properly in both HVM and PVM mode (only HVM is tested).
    	Works with the in-tree FreeBSD netfront driver or the Windows
    	netfront driver from SuSE.  Has not been extensively tested with
    	a Linux netfront driver.  Does not implement LRO, TSO, or
    	polling.  Includes unit tests that may be run through sysctl
    	after compiling with XNB_DEBUG defined.
  
    sys/dev/xen/blkback/blkback.c,
    sys/xen/interface/io/netif.h:
  
    	Comment elaboration.
  
    sys/kern/uipc_mbuf.c:
  
    	Fix page fault in kernel mode when calling m_print() on a
    	null mbuf.  Since m_print() is only used for debugging, there
    	are no performance concerns for extra error checking code.
  
    sys/kern/subr_scanf.c:
  
    	Add the "hh" and "ll" width specifiers from C99 to scanf().
    	A few callers were already using "ll" even though scanf()
    	was handling it as "l".
  
    Submitted by:	Alan Somers <alans at spectralogic.com>
    Submitted by:	John Suykerbuyk <johns at spectralogic.com>
    Sponsored by:	Spectra Logic
    MFC after:	1 week
    Reviewed by:	ken
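
    To illustrate the subr_scanf.c note above, a userland example of the C99
    width modifiers being added; kernel callers use the libkern sscanf()
    rather than the libc one shown here.

	#include <assert.h>
	#include <stdio.h>

	int
	main(void)
	{
		unsigned char small;	/* "hh": char-sized conversion */
		long long big;		/* "ll": long-long-sized conversion */

		assert(sscanf("200 123456789012", "%hhu %lld",
		    &small, &big) == 2);
		assert(small == 200 && big == 123456789012LL);
		return (0);
	}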
  
    r230916 | ken | 2012-02-02 10:54:35 -0700 (Thu, 02 Feb 2012) | 13 lines
  
    Fix the netback driver build for i386.
  
    netback.c:	Add missing VM includes.
  
    xen/xenvar.h,
    xen/xenpmap.h:	Move some XENHVM macros from <machine/xen/xenpmap.h> to
    		<machine/xen/xenvar.h> on i386 to match the amd64 headers.
  
    conf/files:	Add netback to the build.
  
    Submitted by:	jhb
    MFC after:	3 days
  
    r228526 | kevlo | 2011-12-14 23:29:13 -0700 (Wed, 14 Dec 2011) | 2 lines
  
    s/timout/timeout
  
    r230879 | ken | 2012-02-01 13:19:33 -0700 (Wed, 01 Feb 2012) | 4 lines
  
    Add the GSO prefix descriptor define.
  
    MFC after:	3 days

Added:
  stable/8/share/man/man4/xnb.4
     - copied unchanged from r230587, head/share/man/man4/xnb.4
  stable/8/sys/dev/xen/netback/netback_unit_tests.c
     - copied unchanged from r230587, head/sys/dev/xen/netback/netback_unit_tests.c
  stable/8/sys/libkern/strnlen.c
     - copied unchanged from r226029, head/sys/libkern/strnlen.c
Modified:
  stable/8/share/man/man4/Makefile
  stable/8/sys/amd64/conf/XENHVM
  stable/8/sys/conf/files
  stable/8/sys/dev/acpica/acpi.c
  stable/8/sys/dev/esp/ncr53c9x.c
  stable/8/sys/dev/twa/tw_osl.h
  stable/8/sys/dev/xen/balloon/balloon.c
  stable/8/sys/dev/xen/blkback/blkback.c
  stable/8/sys/dev/xen/blkfront/blkfront.c
  stable/8/sys/dev/xen/blkfront/block.h
  stable/8/sys/dev/xen/console/console.c
  stable/8/sys/dev/xen/control/control.c
  stable/8/sys/dev/xen/netback/netback.c
  stable/8/sys/dev/xen/netfront/netfront.c
  stable/8/sys/dev/xen/xenpci/evtchn.c
  stable/8/sys/i386/include/pcpu.h
  stable/8/sys/i386/include/pmap.h
  stable/8/sys/i386/include/xen/hypercall.h
  stable/8/sys/i386/include/xen/xen-os.h
  stable/8/sys/i386/include/xen/xenpmap.h
  stable/8/sys/i386/include/xen/xenvar.h
  stable/8/sys/i386/xen/xen_machdep.c
  stable/8/sys/kern/subr_scanf.c
  stable/8/sys/kern/uipc_mbuf.c
  stable/8/sys/sys/eventhandler.h
  stable/8/sys/sys/libkern.h
  stable/8/sys/xen/interface/io/netif.h
  stable/8/sys/xen/interface/io/xenbus.h
  stable/8/sys/xen/xenbus/xenbus.c
  stable/8/sys/xen/xenbus/xenbus_if.m
  stable/8/sys/xen/xenbus/xenbusb.c
  stable/8/sys/xen/xenbus/xenbusb.h
  stable/8/sys/xen/xenbus/xenbusb_back.c
  stable/8/sys/xen/xenbus/xenbusb_front.c
  stable/8/sys/xen/xenbus/xenbusb_if.m
  stable/8/sys/xen/xenbus/xenbusvar.h
  stable/8/sys/xen/xenstore/xenstore.c
  stable/8/sys/xen/xenstore/xenstorevar.h
Directory Properties:
  stable/8/   (props changed)
  stable/8/share/   (props changed)
  stable/8/share/man/   (props changed)
  stable/8/share/man/man4/   (props changed)
  stable/8/sys/   (props changed)

Modified: stable/8/share/man/man4/Makefile
==============================================================================
--- stable/8/share/man/man4/Makefile	Wed Feb 15 13:40:10 2012	(r231758)
+++ stable/8/share/man/man4/Makefile	Wed Feb 15 14:23:01 2012	(r231759)
@@ -508,6 +508,7 @@ MAN=	aac.4 \
 	${_xen.4} \
 	xhci.4 \
 	xl.4 \
+	${_xnb.4} \
 	xpt.4 \
 	zero.4 \
 	zyd.4
@@ -696,6 +697,7 @@ _urtw.4=	urtw.4
 _viawd.4=	viawd.4
 _wpi.4=		wpi.4
 _xen.4=		xen.4
+_xnb.4=		xnb.4
 
 MLINKS+=lindev.4 full.4
 .endif

Copied: stable/8/share/man/man4/xnb.4 (from r230587, head/share/man/man4/xnb.4)
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ stable/8/share/man/man4/xnb.4	Wed Feb 15 14:23:01 2012	(r231759, copy of r230587, head/share/man/man4/xnb.4)
@@ -0,0 +1,134 @@
+.\" Copyright (c) 2012 Spectra Logic Corporation
+.\"	All rights reserved.
+.\"
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions, and the following disclaimer,
+.\"    without modification.
+.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer
+.\"    substantially similar to the "NO WARRANTY" disclaimer below
+.\"    ("Disclaimer") and any redistribution must be conditioned upon
+.\"    including a substantially similar Disclaimer requirement for further
+.\"    binary redistribution.
+.\" 
+.\" NO WARRANTY
+.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+.\" A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+.\" POSSIBILITY OF SUCH DAMAGES.
+.\" 
+.\" Authors: Alan Somers         (Spectra Logic Corporation)
+.\" 
+.\" $FreeBSD$
+.\"
+
+.Dd January 6, 2012
+.Dt XNB 4
+.Os 
+.Sh NAME
+.Nm xnb
+.Nd "Xen Paravirtualized Backend Ethernet Driver"
+.Sh SYNOPSIS
+To compile this driver into the kernel, place the following lines in your
+kernel configuration file:
+.Bd -ragged -offset indent
+.Cd "options XENHVM"
+.Cd "device xenpci"
+.Ed
+.Sh DESCRIPTION
+The
+.Nm
+driver provides the back half of a paravirtualized
+.Xr xen 4
+network connection.  The netback and netfront drivers appear to their
+respective operating systems as Ethernet devices linked by a crossover cable.
+Typically,
+.Nm
+will run on Domain 0 and the netfront driver will run on a guest domain.
+However, it is also possible to run
+.Nm
+on a guest domain.  It may be bridged or routed to provide the netfront's
+domain access to other guest domains or to a physical network.
+.Pp
+In most respects, the
+.Nm
+device appears to the OS like any other Ethernet device.  It can be configured at
+runtime entirely with
+.Xr ifconfig 8 .
+In particular, it supports MAC changing, arbitrary MTU sizes, checksum
+offload for IP, UDP, and TCP for both receive and transmit, and TSO.  However,
+see
+.Sx CAVEATS
+before enabling txcsum, rxcsum, or tso.
+.Sh SYSCTL VARIABLES
+The following read-only variables are available via
+.Xr sysctl 8 :
+.Bl -tag -width indent
+.It Va dev.xnb.%d.dump_rings
+Displays information about the ring buffers used to pass requests between the
+netfront and netback.  Mostly useful for debugging, but can also be used to
+get traffic statistics.
+.It Va dev.xnb.%d.unit_test_results
+Runs a builtin suite of unit tests and displays the results.  Does not affect
+the operation of the driver in any way.  Note that the test suite simulates
+error conditions; this will result in error messages being printed to the
+system log.
+.Sh CAVEATS
+Packets sent through Xennet pass over shared memory, so the protocol includes
+no form of link-layer checksum or CRC.  Furthermore, Xennet drivers always
+report to their hosts that they support receive and transmit checksum
+offloading.  They "offload" the checksum calculation by simply skipping it.
+That works fine for packets that are exchanged between two domains on the same
+machine.  However, when a Xennet interface is bridged to a physical interface,
+a correct checksum must be attached to any packets bound for that physical
+interface.  Currently, FreeBSD lacks any mechanism for an Ethernet device to
+inform the OS that newly received packets are valid even though their checksums
+are not.  So if the netfront driver is configured to offload checksum
+calculations, it will pass non-checksummed packets to
+.Nm ,
+which must then calculate the checksum in software before passing the packet
+to the OS.
+.Pp
+For this reason, it is recommended that if
+.Nm
+is bridged to a physical interface, then transmit checksum offloading should be
+disabled on the netfront.  The Xennet protocol does not have any mechanism for
+the netback to request the netfront to do this; the operator must do it
+manually.
+.Sh SEE ALSO
+.Xr arp 4 ,
+.Xr netintro 4 ,
+.Xr ng_ether 4 ,
+.Xr ifconfig 8 ,
+.Xr xen 4
+.Sh HISTORY
+The
+.Nm
+device driver first appeared in
+.Fx 10.0
+.
+.Sh AUTHORS
+The
+.Nm
+driver was written by
+.An Alan Somers
+.Aq alans at spectralogic.com
+and
+.An John Suykerbuyk
+.Aq johns at spectralogic.com
+.Sh BUGS
+The
+.Nm
+driver does not properly checksum UDP datagrams that span more than one
+Ethernet frame.  Nor does it correctly checksum IPv6 packets.  To work around
+that bug, disable transmit checksum offloading on the netfront driver.

Modified: stable/8/sys/amd64/conf/XENHVM
==============================================================================
--- stable/8/sys/amd64/conf/XENHVM	Wed Feb 15 13:40:10 2012	(r231758)
+++ stable/8/sys/amd64/conf/XENHVM	Wed Feb 15 14:23:01 2012	(r231759)
@@ -17,6 +17,7 @@ makeoptions	MODULES_OVERRIDE=""
 #
 options 	NO_ADAPTIVE_MUTEXES
 options 	NO_ADAPTIVE_RWLOCKS
+options 	NO_ADAPTIVE_SX
 
 # Xen HVM support
 options 	XENHVM

Modified: stable/8/sys/conf/files
==============================================================================
--- stable/8/sys/conf/files	Wed Feb 15 13:40:10 2012	(r231758)
+++ stable/8/sys/conf/files	Wed Feb 15 14:23:01 2012	(r231759)
@@ -2377,6 +2377,7 @@ libkern/strlcpy.c		standard
 libkern/strlen.c		standard
 libkern/strncmp.c		standard
 libkern/strncpy.c		standard
+libkern/strnlen.c		standard
 libkern/strsep.c		standard
 libkern/strspn.c		standard
 libkern/strstr.c		standard
@@ -3040,6 +3041,7 @@ dev/xen/blkback/blkback.c	optional xen |
 dev/xen/console/console.c	optional xen
 dev/xen/console/xencons_ring.c	optional xen
 dev/xen/control/control.c	optional xen | xenhvm
+dev/xen/netback/netback.c	optional xen | xenhvm
 dev/xen/netfront/netfront.c	optional xen | xenhvm
 dev/xen/xenpci/xenpci.c		optional xenpci
 dev/xen/xenpci/evtchn.c         optional xenpci

Modified: stable/8/sys/dev/acpica/acpi.c
==============================================================================
--- stable/8/sys/dev/acpica/acpi.c	Wed Feb 15 13:40:10 2012	(r231758)
+++ stable/8/sys/dev/acpica/acpi.c	Wed Feb 15 14:23:01 2012	(r231759)
@@ -2538,6 +2538,8 @@ acpi_EnterSleepState(struct acpi_softc *
 	return_ACPI_STATUS (AE_OK);
     }
 
+    EVENTHANDLER_INVOKE(power_suspend);
+
     if (smp_started) {
 	thread_lock(curthread);
 	sched_bind(curthread, 0);
@@ -2629,6 +2631,8 @@ backout:
 	thread_unlock(curthread);
     }
 
+    EVENTHANDLER_INVOKE(power_resume);
+
     /* Allow another sleep request after a while. */
     timeout(acpi_sleep_enable, sc, hz * ACPI_MINIMUM_AWAKETIME);
 

Modified: stable/8/sys/dev/esp/ncr53c9x.c
==============================================================================
--- stable/8/sys/dev/esp/ncr53c9x.c	Wed Feb 15 13:40:10 2012	(r231758)
+++ stable/8/sys/dev/esp/ncr53c9x.c	Wed Feb 15 14:23:01 2012	(r231759)
@@ -316,7 +316,7 @@ ncr53c9x_attach(struct ncr53c9x_softc *s
 	 * The recommended timeout is 250ms.  This register is loaded
 	 * with a value calculated as follows, from the docs:
 	 *
-	 *		(timout period) x (CLK frequency)
+	 *		(timeout period) x (CLK frequency)
 	 *	reg = -------------------------------------
 	 *		 8192 x (Clock Conversion Factor)
 	 *

Modified: stable/8/sys/dev/twa/tw_osl.h
==============================================================================
--- stable/8/sys/dev/twa/tw_osl.h	Wed Feb 15 13:40:10 2012	(r231758)
+++ stable/8/sys/dev/twa/tw_osl.h	Wed Feb 15 14:23:01 2012	(r231759)
@@ -153,7 +153,7 @@ struct twa_softc {
 	struct mtx		sim_lock_handle;/* sim lock shared with cam */
 	struct mtx		*sim_lock;/* ptr to sim lock */
 
-	struct callout		watchdog_callout[2]; /* For command timout */
+	struct callout		watchdog_callout[2]; /* For command timeout */
 	TW_UINT32		watchdog_index;
 
 #ifdef TW_OSL_DEBUG

Modified: stable/8/sys/dev/xen/balloon/balloon.c
==============================================================================
--- stable/8/sys/dev/xen/balloon/balloon.c	Wed Feb 15 13:40:10 2012	(r231758)
+++ stable/8/sys/dev/xen/balloon/balloon.c	Wed Feb 15 14:23:01 2012	(r231759)
@@ -41,8 +41,8 @@ __FBSDID("$FreeBSD$");
 #include <sys/sysctl.h>
 
 #include <machine/xen/xen-os.h>
-#include <machine/xen/xenfunc.h>
 #include <machine/xen/xenvar.h>
+#include <machine/xen/xenfunc.h>
 #include <xen/hypervisor.h>
 #include <xen/xenstore/xenstorevar.h>
 
@@ -147,12 +147,6 @@ balloon_retrieve(void)
 	return page;
 }
 
-static void 
-balloon_alarm(void *unused)
-{
-	wakeup(balloon_process);
-}
-
 static unsigned long 
 current_target(void)
 {
@@ -378,6 +372,8 @@ balloon_process(void *unused)
 	
 	mtx_lock(&balloon_mutex);
 	for (;;) {
+		int sleep_time;
+
 		do {
 			credit = current_target() - bs.current_pages;
 			if (credit > 0)
@@ -389,9 +385,12 @@ balloon_process(void *unused)
 		
 		/* Schedule more work if there is some still to be done. */
 		if (current_target() != bs.current_pages)
-			timeout(balloon_alarm, NULL, ticks + hz);
+			sleep_time = hz;
+		else
+			sleep_time = 0;
 
-		msleep(balloon_process, &balloon_mutex, 0, "balloon", -1);
+		msleep(balloon_process, &balloon_mutex, 0, "balloon",
+		       sleep_time);
 	}
 	mtx_unlock(&balloon_mutex);
 }
@@ -474,9 +473,6 @@ balloon_init(void *arg)
 	bs.hard_limit    = ~0UL;
 
 	kproc_create(balloon_process, NULL, NULL, 0, 0, "balloon");
-//	init_timer(&balloon_timer);
-//	balloon_timer.data = 0;
-//	balloon_timer.function = balloon_alarm;
     
 #ifndef XENHVM
 	/* Initialise the balloon with excess memory space. */

Modified: stable/8/sys/dev/xen/blkback/blkback.c
==============================================================================
--- stable/8/sys/dev/xen/blkback/blkback.c	Wed Feb 15 13:40:10 2012	(r231758)
+++ stable/8/sys/dev/xen/blkback/blkback.c	Wed Feb 15 14:23:01 2012	(r231759)
@@ -1,5 +1,5 @@
 /*-
- * Copyright (c) 2009-2010 Spectra Logic Corporation
+ * Copyright (c) 2009-2011 Spectra Logic Corporation
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -61,6 +61,8 @@ __FBSDID("$FreeBSD$");
 #include <sys/types.h>
 #include <sys/vnode.h>
 #include <sys/mount.h>
+#include <sys/sysctl.h>
+#include <sys/bitstring.h>
 
 #include <geom/geom.h>
 
@@ -153,9 +155,19 @@ MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "X
 #define	XBB_MAX_RING_PAGES						    \
 	BLKIF_RING_PAGES(BLKIF_SEGS_TO_BLOCKS(XBB_MAX_SEGMENTS_PER_REQUEST) \
 		       * XBB_MAX_REQUESTS)
+/**
+ * The maximum number of ring pages that we can allow per request list.
+ * We limit this to the maximum number of segments per request, because
+ * that is already a reasonable number of segments to aggregate.  This
+ * number should never be smaller than XBB_MAX_SEGMENTS_PER_REQUEST,
+ * because that would leave situations where we can't dispatch even one
+ * large request.
+ */
+#define	XBB_MAX_SEGMENTS_PER_REQLIST XBB_MAX_SEGMENTS_PER_REQUEST
 
 /*--------------------------- Forward Declarations ---------------------------*/
 struct xbb_softc;
+struct xbb_xen_req;
 
 static void xbb_attach_failed(struct xbb_softc *xbb, int err, const char *fmt,
 			      ...) __attribute__((format(printf, 3, 4)));
@@ -163,16 +175,15 @@ static int  xbb_shutdown(struct xbb_soft
 static int  xbb_detach(device_t dev);
 
 /*------------------------------ Data Structures -----------------------------*/
-/**
- * \brief Object tracking an in-flight I/O from a Xen VBD consumer.
- */
-struct xbb_xen_req {
-	/**
-	 * Linked list links used to aggregate idle request in the
-	 * request free pool (xbb->request_free_slist).
-	 */
-	SLIST_ENTRY(xbb_xen_req) links;
 
+STAILQ_HEAD(xbb_xen_req_list, xbb_xen_req);
+
+typedef enum {
+	XBB_REQLIST_NONE	= 0x00,
+	XBB_REQLIST_MAPPED	= 0x01
+} xbb_reqlist_flags;
+
+struct xbb_xen_reqlist {
 	/**
 	 * Back reference to the parent block back instance for this
 	 * request.  Used during bio_done handling.
@@ -180,17 +191,71 @@ struct xbb_xen_req {
 	struct xbb_softc        *xbb;
 
 	/**
-	 * The remote domain's identifier for this I/O request.
+	 * BLKIF_OP code for this request.
 	 */
-	uint64_t		 id;
+	int			 operation;
+
+	/**
+	 * Set to BLKIF_RSP_* to indicate request status.
+	 *
+	 * This field allows an error status to be recorded even if the
+	 * delivery of this status must be deferred.  Deferred reporting
+	 * is necessary, for example, when an error is detected during
+	 * completion processing of one bio when other bios for this
+	 * request are still outstanding.
+	 */
+	int			 status;
+
+	/**
+	 * Number of 512 byte sectors not transferred.
+	 */
+	int			 residual_512b_sectors;
+
+	/**
+	 * Starting sector number of the first request in the list.
+	 */
+	off_t			 starting_sector_number;
+
+	/**
+	 * If we're going to coalesce, the next contiguous sector would be
+	 * this one.
+	 */
+	off_t			 next_contig_sector;
+
+	/**
+	 * Number of child requests in the list.
+	 */
+	int			 num_children;
+
+	/**
+	 * Number of I/O requests dispatched to the backend.
+	 */
+	int			 pendcnt;
+
+	/**
+	 * Total number of segments for requests in the list.
+	 */
+	int			 nr_segments;
+
+	/**
+	 * Flags for this particular request list.
+	 */
+	xbb_reqlist_flags	 flags;
 
 	/**
 	 * Kernel virtual address space reserved for this request
-	 * structure and used to map the remote domain's pages for
+	 * list structure and used to map the remote domain's pages for
 	 * this I/O, into our domain's address space.
 	 */
 	uint8_t			*kva;
 
+	/**
+	 * Base, psuedo-physical address, corresponding to the start
+	 * of this request's kva region.
+	 */
+	uint64_t	 	 gnt_base;
+
+
 #ifdef XBB_USE_BOUNCE_BUFFERS
 	/**
 	 * Pre-allocated domain local memory used to proxy remote
@@ -200,53 +265,91 @@ struct xbb_xen_req {
 #endif
 
 	/**
-	 * Base, psuedo-physical address, corresponding to the start
-	 * of this request's kva region.
+	 * Array of grant handles (one per page) used to map this request.
 	 */
-	uint64_t	 	 gnt_base;
+	grant_handle_t		*gnt_handles;
+
+	/**
+	 * Device statistics request ordering type (ordered or simple).
+	 */
+	devstat_tag_type	 ds_tag_type;
+
+	/**
+	 * Device statistics request type (read, write, no_data).
+	 */
+	devstat_trans_flags	 ds_trans_type;
+
+	/**
+	 * The start time for this request.
+	 */
+	struct bintime		 ds_t0;
+
+	/**
+	 * Linked list of contiguous requests with the same operation type.
+	 */
+	struct xbb_xen_req_list	 contig_req_list;
+
+	/**
+	 * Linked list links used to aggregate idle requests in the
+	 * request list free pool (xbb->reqlist_free_stailq) and pending
+	 * requests waiting for execution (xbb->reqlist_pending_stailq).
+	 */
+	STAILQ_ENTRY(xbb_xen_reqlist) links;
+};
+
+STAILQ_HEAD(xbb_xen_reqlist_list, xbb_xen_reqlist);
+
+/**
+ * \brief Object tracking an in-flight I/O from a Xen VBD consumer.
+ */
+struct xbb_xen_req {
+	/**
+	 * Linked list links used to aggregate requests into a reqlist
+	 * and to store them in the request free pool.
+	 */
+	STAILQ_ENTRY(xbb_xen_req) links;
+
+	/**
+	 * The remote domain's identifier for this I/O request.
+	 */
+	uint64_t		  id;
 
 	/**
 	 * The number of pages currently mapped for this request.
 	 */
-	int			 nr_pages;
+	int			  nr_pages;
 
 	/**
 	 * The number of 512 byte sectors comprising this requests.
 	 */
-	int			 nr_512b_sectors;
+	int			  nr_512b_sectors;
 
 	/**
 	 * The number of struct bio requests still outstanding for this
 	 * request on the backend device.  This field is only used for	
 	 * device (rather than file) backed I/O.
 	 */
-	int			 pendcnt;
+	int			  pendcnt;
 
 	/**
 	 * BLKIF_OP code for this request.
 	 */
-	int			 operation;
+	int			  operation;
 
 	/**
-	 * BLKIF_RSP status code for this request.
-	 *
-	 * This field allows an error status to be recorded even if the
-	 * delivery of this status must be deferred.  Deferred reporting
-	 * is necessary, for example, when an error is detected during
-	 * completion processing of one bio when other bios for this
-	 * request are still outstanding.
+	 * Storage used for non-native ring requests.
 	 */
-	int			 status;
+	blkif_request_t		 ring_req_storage;
 
 	/**
-	 * Device statistics request ordering type (ordered or simple).
+	 * Pointer to the Xen request in the ring.
 	 */
-	devstat_tag_type	 ds_tag_type;
+	blkif_request_t		*ring_req;
 
 	/**
-	 * Device statistics request type (read, write, no_data).
+	 * Consumer index for this request.
 	 */
-	devstat_trans_flags	 ds_trans_type;
+	RING_IDX		 req_ring_idx;
 
 	/**
 	 * The start time for this request.
@@ -254,9 +357,9 @@ struct xbb_xen_req {
 	struct bintime		 ds_t0;
 
 	/**
-	 * Array of grant handles (one per page) used to map this request.
+	 * Pointer back to our parent request list.
 	 */
-	grant_handle_t		*gnt_handles;
+	struct xbb_xen_reqlist  *reqlist;
 };
 SLIST_HEAD(xbb_xen_req_slist, xbb_xen_req);
 
@@ -321,7 +424,10 @@ typedef enum
 	XBBF_RESOURCE_SHORTAGE = 0x04,
 
 	/** Connection teardown in progress. */
-	XBBF_SHUTDOWN          = 0x08
+	XBBF_SHUTDOWN          = 0x08,
+
+	/** A thread is already performing shutdown processing. */
+	XBBF_IN_SHUTDOWN       = 0x10
 } xbb_flag_t;
 
 /** Backend device type.  */
@@ -399,7 +505,7 @@ struct xbb_file_data {
 	 * Only a single file based request is outstanding per-xbb instance,
 	 * so we only need one of these.
 	 */
-	struct iovec	xiovecs[XBB_MAX_SEGMENTS_PER_REQUEST];
+	struct iovec	xiovecs[XBB_MAX_SEGMENTS_PER_REQLIST];
 #ifdef XBB_USE_BOUNCE_BUFFERS
 
 	/**
@@ -411,7 +517,7 @@ struct xbb_file_data {
 	 * bounce-out the read data.  This array serves as the temporary
 	 * storage for this saved data.
 	 */
-	struct iovec	saved_xiovecs[XBB_MAX_SEGMENTS_PER_REQUEST];
+	struct iovec	saved_xiovecs[XBB_MAX_SEGMENTS_PER_REQLIST];
 
 	/**
 	 * \brief Array of memoized bounce buffer kva offsets used
@@ -422,7 +528,7 @@ struct xbb_file_data {
 	 * the request sg elements is unavoidable. We memoize the computed
 	 * bounce address here to reduce the cost of the second walk.
 	 */
-	void		*xiovecs_vaddr[XBB_MAX_SEGMENTS_PER_REQUEST];
+	void		*xiovecs_vaddr[XBB_MAX_SEGMENTS_PER_REQLIST];
 #endif /* XBB_USE_BOUNCE_BUFFERS */
 };
 
@@ -437,9 +543,9 @@ union xbb_backend_data {
 /**
  * Function signature of backend specific I/O handlers.
  */
-typedef int (*xbb_dispatch_t)(struct xbb_softc *xbb, blkif_request_t *ring_req,
-			      struct xbb_xen_req *req, int nseg,
-			      int operation, int flags);
+typedef int (*xbb_dispatch_t)(struct xbb_softc *xbb,
+			      struct xbb_xen_reqlist *reqlist, int operation,
+			      int flags);
 
 /**
  * Per-instance configuration data.
@@ -467,14 +573,23 @@ struct xbb_softc {
 	xbb_dispatch_t		  dispatch_io;
 
 	/** The number of requests outstanding on the backend device/file. */
-	u_int			  active_request_count;
+	int			  active_request_count;
 
 	/** Free pool of request tracking structures. */
-	struct xbb_xen_req_slist  request_free_slist;
+	struct xbb_xen_req_list   request_free_stailq;
 
 	/** Array, sized at connection time, of request tracking structures. */
 	struct xbb_xen_req	 *requests;
 
+	/** Free pool of request list structures. */
+	struct xbb_xen_reqlist_list reqlist_free_stailq;
+
+	/** List of pending request lists awaiting execution. */
+	struct xbb_xen_reqlist_list reqlist_pending_stailq;
+
+	/** Array, sized at connection time, of request list structures. */
+	struct xbb_xen_reqlist	 *request_lists;
+
 	/**
 	 * Global pool of kva used for mapping remote domain ring
 	 * and I/O transaction data.
@@ -487,6 +602,15 @@ struct xbb_softc {
 	/** The size of the global kva pool. */
 	int			  kva_size;
 
+	/** The size of the KVA area used for request lists. */
+	int			  reqlist_kva_size;
+
+	/** The number of pages of KVA used for request lists */
+	int			  reqlist_kva_pages;
+
+	/** Bitmap of free KVA pages */
+	bitstr_t		 *kva_free;
+
 	/**
 	 * \brief Cached value of the front-end's domain id.
 	 * 
@@ -508,12 +632,12 @@ struct xbb_softc {
 	int			  abi;
 
 	/**
-	 * \brief The maximum number of requests allowed to be in
-	 *        flight at a time.
+	 * \brief The maximum number of requests and request lists allowed
+	 *        to be in flight at a time.
 	 *
 	 * This value is negotiated via the XenStore.
 	 */
-	uint32_t		  max_requests;
+	u_int			  max_requests;
 
 	/**
 	 * \brief The maximum number of segments (1 page per segment)
@@ -521,7 +645,15 @@ struct xbb_softc {
 	 *
 	 * This value is negotiated via the XenStore.
 	 */
-	uint32_t		  max_request_segments;
+	u_int			  max_request_segments;
+
+	/**
+	 * \brief Maximum number of segments per request list.
+	 *
+	 * This value is derived from and will generally be larger than
+	 * max_request_segments.
+	 */
+	u_int			  max_reqlist_segments;
 
 	/**
 	 * The maximum size of any request to this back-end
@@ -529,7 +661,13 @@ struct xbb_softc {
 	 *
 	 * This value is negotiated via the XenStore.
 	 */
-	uint32_t		  max_request_size;
+	u_int			  max_request_size;
+
+	/**
+	 * The maximum size of any request list.  This is derived directly
+	 * from max_reqlist_segments.
+	 */
+	u_int			  max_reqlist_size;
 
 	/** Various configuration and state bit flags. */
 	xbb_flag_t		  flags;
@@ -574,6 +712,7 @@ struct xbb_softc {
 	struct vnode		 *vn;
 
 	union xbb_backend_data	  backend;
+
 	/** The native sector size of the backend. */
 	u_int			  sector_size;
 
@@ -598,7 +737,14 @@ struct xbb_softc {
 	 *
 	 * Ring processing is serialized so we only need one of these.
 	 */
-	struct xbb_sg		  xbb_sgs[XBB_MAX_SEGMENTS_PER_REQUEST];
+	struct xbb_sg		  xbb_sgs[XBB_MAX_SEGMENTS_PER_REQLIST];
+
+	/**
+	 * Temporary grant table map used in xbb_dispatch_io().  When
+	 * XBB_MAX_SEGMENTS_PER_REQLIST gets large, keeping this on the
+	 * stack could cause a stack overflow.
+	 */
+	struct gnttab_map_grant_ref   maps[XBB_MAX_SEGMENTS_PER_REQLIST];
 
 	/** Mutex protecting per-instance data. */
 	struct mtx		  lock;
@@ -614,8 +760,51 @@ struct xbb_softc {
 	int			  pseudo_phys_res_id;
 #endif
 
-	/** I/O statistics. */
+	/**
+	 * I/O statistics from BlockBack dispatch down.  These are
+	 * coalesced requests, and we start them right before execution.
+	 */
 	struct devstat		 *xbb_stats;
+
+	/**
+	 * I/O statistics coming into BlockBack.  These are the requests as
+	 * we get them from BlockFront.  They are started as soon as we
+	 * receive a request, and completed when the I/O is complete.
+	 */
+	struct devstat		 *xbb_stats_in;
+
+	/** Disable sending flush to the backend */
+	int			  disable_flush;
+
+	/** Send a real flush for every N flush requests */
+	int			  flush_interval;
+
+	/** Count of flush requests in the interval */
+	int			  flush_count;
+
+	/** Don't coalesce requests if this is set */
+	int			  no_coalesce_reqs;
+
+	/** Number of requests we have received */
+	uint64_t		  reqs_received;
+
+	/** Number of requests we have completed */
+	uint64_t		  reqs_completed;
+
+	/** How many forced dispatches (i.e. without coalescing) have happened */
+	uint64_t		  forced_dispatch;
+
+	/** How many normal dispatches have happened */
+	uint64_t		  normal_dispatch;
+
+	/** How many total dispatches have happened */
+	uint64_t		  total_dispatch;
+
+	/** How many times we have run out of KVA */
+	uint64_t		  kva_shortages;
+
+	/** How many times we have run out of request structures */
+	uint64_t		  request_shortages;
 };
 
 /*---------------------------- Request Processing ----------------------------*/
@@ -633,21 +822,14 @@ xbb_get_req(struct xbb_softc *xbb)
 	struct xbb_xen_req *req;
 
 	req = NULL;
-	mtx_lock(&xbb->lock);
 
-	/*
-	 * Do not allow new requests to be allocated while we
-	 * are shutting down.
-	 */
-	if ((xbb->flags & XBBF_SHUTDOWN) == 0) {
-		if ((req = SLIST_FIRST(&xbb->request_free_slist)) != NULL) {
-			SLIST_REMOVE_HEAD(&xbb->request_free_slist, links);
-			xbb->active_request_count++;
-		} else {
-			xbb->flags |= XBBF_RESOURCE_SHORTAGE;
-		}
+	mtx_assert(&xbb->lock, MA_OWNED);
+
+	if ((req = STAILQ_FIRST(&xbb->request_free_stailq)) != NULL) {
+		STAILQ_REMOVE_HEAD(&xbb->request_free_stailq, links);
+		xbb->active_request_count++;
 	}
-	mtx_unlock(&xbb->lock);
+
 	return (req);
 }
 
@@ -660,34 +842,40 @@ xbb_get_req(struct xbb_softc *xbb)
 static inline void
 xbb_release_req(struct xbb_softc *xbb, struct xbb_xen_req *req)
 {
-	int wake_thread;
+	mtx_assert(&xbb->lock, MA_OWNED);
 
-	mtx_lock(&xbb->lock);
-	wake_thread = xbb->flags & XBBF_RESOURCE_SHORTAGE;
-	xbb->flags &= ~XBBF_RESOURCE_SHORTAGE;
-	SLIST_INSERT_HEAD(&xbb->request_free_slist, req, links);
+	STAILQ_INSERT_HEAD(&xbb->request_free_stailq, req, links);
 	xbb->active_request_count--;
 
-	if ((xbb->flags & XBBF_SHUTDOWN) != 0) {
-		/*
-		 * Shutdown is in progress.  See if we can
-		 * progress further now that one more request
-		 * has completed and been returned to the
-		 * free pool.
-		 */
-		xbb_shutdown(xbb);
-	}
-	mtx_unlock(&xbb->lock);
+	KASSERT(xbb->active_request_count >= 0,
+		("xbb_release_req: negative active count"));
+}
 
-	if (wake_thread != 0)
-		taskqueue_enqueue(xbb->io_taskqueue, &xbb->io_task); 
+/**
+ * Return an xbb_xen_req_list of allocated xbb_xen_reqs to the free pool.
+ *
+ * \param xbb	    Per-instance xbb configuration structure.
+ * \param req_list  The list of requests to free.
+ * \param nreqs	    The number of items in the list.
+ */
+static inline void
+xbb_release_reqs(struct xbb_softc *xbb, struct xbb_xen_req_list *req_list,
+		 int nreqs)
+{
+	mtx_assert(&xbb->lock, MA_OWNED);
+
+	STAILQ_CONCAT(&xbb->request_free_stailq, req_list);
+	xbb->active_request_count -= nreqs;
+
+	KASSERT(xbb->active_request_count >= 0,
+		("xbb_release_reqs: negative active count"));
 }
 
 /**
  * Given a page index and 512b sector offset within that page,
  * calculate an offset into a request's kva region.
  *
- * \param req     The request structure whose kva region will be accessed.
+ * \param reqlist The request structure whose kva region will be accessed.
  * \param pagenr  The page index used to compute the kva offset.
  * \param sector  The 512b sector index used to compute the page relative
  *                kva offset.
@@ -695,9 +883,9 @@ xbb_release_req(struct xbb_softc *xbb, s
  * \return  The computed global KVA offset.
  */
 static inline uint8_t *
-xbb_req_vaddr(struct xbb_xen_req *req, int pagenr, int sector)
+xbb_reqlist_vaddr(struct xbb_xen_reqlist *reqlist, int pagenr, int sector)
 {
-	return (req->kva + (PAGE_SIZE * pagenr) + (sector << 9));
+	return (reqlist->kva + (PAGE_SIZE * pagenr) + (sector << 9));
 }
 
 #ifdef XBB_USE_BOUNCE_BUFFERS
@@ -705,7 +893,7 @@ xbb_req_vaddr(struct xbb_xen_req *req, i
  * Given a page index and 512b sector offset within that page,
  * calculate an offset into a request's local bounce memory region.
  *
- * \param req     The request structure whose bounce region will be accessed.
+ * \param reqlist The request structure whose bounce region will be accessed.
  * \param pagenr  The page index used to compute the bounce offset.
  * \param sector  The 512b sector index used to compute the page relative
  *                bounce offset.
@@ -713,9 +901,9 @@ xbb_req_vaddr(struct xbb_xen_req *req, i
  * \return  The computed global bounce buffer address.
  */
 static inline uint8_t *
-xbb_req_bounce_addr(struct xbb_xen_req *req, int pagenr, int sector)
+xbb_reqlist_bounce_addr(struct xbb_xen_reqlist *reqlist, int pagenr, int sector)
 {
-	return (req->bounce + (PAGE_SIZE * pagenr) + (sector << 9));
+	return (reqlist->bounce + (PAGE_SIZE * pagenr) + (sector << 9));
 }
 #endif
 
@@ -724,7 +912,7 @@ xbb_req_bounce_addr(struct xbb_xen_req *
  * calculate an offset into the request's memory region that the
  * underlying backend device/file should use for I/O.
  *
- * \param req     The request structure whose I/O region will be accessed.
+ * \param reqlist The request structure whose I/O region will be accessed.
  * \param pagenr  The page index used to compute the I/O offset.
  * \param sector  The 512b sector index used to compute the page relative
  *                I/O offset.
@@ -736,12 +924,12 @@ xbb_req_bounce_addr(struct xbb_xen_req *
  * this request.
  */
 static inline uint8_t *
-xbb_req_ioaddr(struct xbb_xen_req *req, int pagenr, int sector)
+xbb_reqlist_ioaddr(struct xbb_xen_reqlist *reqlist, int pagenr, int sector)
 {
 #ifdef XBB_USE_BOUNCE_BUFFERS
-	return (xbb_req_bounce_addr(req, pagenr, sector));
+	return (xbb_reqlist_bounce_addr(reqlist, pagenr, sector));
 #else
-	return (xbb_req_vaddr(req, pagenr, sector));
+	return (xbb_reqlist_vaddr(reqlist, pagenr, sector));
 #endif
 }
 
@@ -750,7 +938,7 @@ xbb_req_ioaddr(struct xbb_xen_req *req, 
  * an offset into the local psuedo-physical address space used to map a
  * front-end's request data into a request.
  *
- * \param req     The request structure whose pseudo-physical region
+ * \param reqlist The request list structure whose pseudo-physical region
  *                will be accessed.
  * \param pagenr  The page index used to compute the pseudo-physical offset.
  * \param sector  The 512b sector index used to compute the page relative
@@ -763,10 +951,126 @@ xbb_req_ioaddr(struct xbb_xen_req *req, 
  * this request.
  */
 static inline uintptr_t
-xbb_req_gntaddr(struct xbb_xen_req *req, int pagenr, int sector)
+xbb_get_gntaddr(struct xbb_xen_reqlist *reqlist, int pagenr, int sector)
 {
-	return ((uintptr_t)(req->gnt_base
-			  + (PAGE_SIZE * pagenr) + (sector << 9)));
+	struct xbb_softc *xbb;
+
+	xbb = reqlist->xbb;
+
+	return ((uintptr_t)(xbb->gnt_base_addr +
+		(uintptr_t)(reqlist->kva - xbb->kva) +
+		(PAGE_SIZE * pagenr) + (sector << 9)));
+}
+
+/**
+ * Get Kernel Virtual Address space for mapping requests.
+ *
+ * \param xbb         Per-instance xbb configuration structure.
+ * \param nr_pages    Number of pages needed.
+ * \param check_only  If set, check for free KVA but don't allocate it.
+ * \param have_lock   If set, xbb lock is already held.
+ *
+ * \return  On success, a pointer to the allocated KVA region.  Otherwise NULL.
+ *
+ * Note:  This should be unnecessary once we have either chaining or
+ * scatter/gather support for struct bio.  At that point we'll be able to
+ * put multiple addresses and lengths in one bio/bio chain and won't need
+ * to map everything into one virtual segment.
+ */
+static uint8_t *
+xbb_get_kva(struct xbb_softc *xbb, int nr_pages)
+{
+	intptr_t first_clear, num_clear;
+	uint8_t *free_kva;
+	int i;
+
+	KASSERT(nr_pages != 0, ("xbb_get_kva of zero length"));
+
+	first_clear = 0;
+	free_kva = NULL;
+
+	mtx_lock(&xbb->lock);
+
+	/*
+	 * Look for the first available page.  If there are none, we're done.
+	 */
+	bit_ffc(xbb->kva_free, xbb->reqlist_kva_pages, &first_clear);
+
+	if (first_clear == -1)
+		goto bailout;
+
+	/*
+	 * Starting at the first available page, look for consecutive free
+	 * pages that will satisfy the user's request.
+	 */
+	for (i = first_clear, num_clear = 0; i < xbb->reqlist_kva_pages; i++) {
+		/*
+		 * If this is true, the page is used, so we have to reset
+		 * the number of clear pages and the first clear page
+		 * (since it pointed to a region with an insufficient number
+		 * of clear pages).
+		 */
+		if (bit_test(xbb->kva_free, i)) {
+			num_clear = 0;
+			first_clear = -1;
+			continue;
+		}
+
+		if (first_clear == -1)
+			first_clear = i;
+
+		/*
+		 * If this is true, we've found a large enough free region
+		 * to satisfy the request.
+		 */
+		if (++num_clear == nr_pages) {
+
+			bit_nset(xbb->kva_free, first_clear,

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***

