svn commit: r215788 - in stable/8/sys: conf dev/xen/balloon dev/xen/blkback dev/xen/blkfront dev/xen/control dev/xen/netfront dev/xen/xenpci i386/xen xen xen/evtchn xen/interface xen/interface/hvm ...

Justin T. Gibbs gibbs at FreeBSD.org
Wed Nov 24 01:03:04 UTC 2010


Author: gibbs
Date: Wed Nov 24 01:03:03 2010
New Revision: 215788
URL: http://svn.freebsd.org/changeset/base/215788

Log:
  Synchronize Xen support with current, excluding console API changes in
  the PV Xen console driver.
  
  Merged revisions
  ================
  r199734 | kmacy | 2009-11-24 00:17:51 -0700 (Tue, 24 Nov 2009) | 2 lines
  
      fixup kernel core dumps on paravirtual guests
  
  r199959 | kmacy | 2009-11-29 21:20:43 -0700 (Sun, 29 Nov 2009) | 3 lines
  
      Update license to reflect terms in xen 2.0 as of the time when the
      driver was ported to FreeBSD
  
  r199960 | kmacy | 2009-11-29 21:32:34 -0700 (Sun, 29 Nov 2009) | 2 lines
  
      Merge Scott Long's latest blkfront now that the licensing issues are
      resolved
  
  r201234 | gibbs | 2009-12-29 16:28:13 -0700 (Tue, 29 Dec 2009) | 5 lines
  
      Correct a bug introduced while purging the -ERRNO Linuxism from the
      grant table API.  Valid grant refs are in the range of positive 32bit
      integers.  ENOSPC, being 28, is also a positive integer.  Return
      GNTTAB_LIST_END (-1) instead when gnttab_claim_grant_reference() fails.
  
  r201138 | gibbs | 2009-12-28 11:59:13 -0700 (Mon, 28 Dec 2009) | 8 lines
  
      Correct alignment and boundary constraints in blkfront's bus dma tag.  The
      blkif interface in Xen requires all I/O to be 512-byte aligned, with each
      segment bounded by a 4K page.
  
      Note: This submission only documents the proper constraints for blkif I/O.
            The alignment code in busdma does not yet handle alignment constraints
            correctly in all cases.
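
      For reference, a bus dma tag honoring these constraints could be
      created roughly as follows (a hedged sketch using illustrative
      softc field names, not the literal blkfront code):

          error = bus_dma_tag_create(
              bus_get_dma_tag(sc->dev),  /* parent tag from newbus */
              512,                       /* alignment: blkif sector size */
              PAGE_SIZE,                 /* boundary: one 4K page */
              BUS_SPACE_MAXADDR,         /* lowaddr */
              BUS_SPACE_MAXADDR,         /* highaddr */
              NULL, NULL,                /* filter, filterarg */
              MAXPHYS,                   /* maxsize */
              MAXPHYS / PAGE_SIZE + 1,   /* nsegments */
              PAGE_SIZE,                 /* maxsegsize */
              BUS_DMA_ALLOCNOW,          /* flags */
              busdma_lock_mutex,         /* lockfunc */
              &sc->io_lock,              /* lockfuncarg */
              &sc->io_dmat);             /* tag to create */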
  
  r201236 | gibbs | 2009-12-29 16:31:21 -0700 (Tue, 29 Dec 2009) | 3 lines
  
      In blkif_queue_cb(), test the return value from
      gnttab_claim_grant_reference() for >= 0 instead of != ENOSPC.
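
      With that convention, a call site reduces to a check for a negative
      return (a hedged sketch, not the driver's code; a failed claim now
      yields GNTTAB_LIST_END rather than an errno value):

          static int
          claim_one_ref(grant_ref_t *gref_head, grant_ref_t *refp)
          {
                  int ref;

                  ref = gnttab_claim_grant_reference(gref_head);
                  if (ref < 0) {
                          /* No references available; caller defers. */
                          return (ENOMEM);
                  }
                  *refp = ref;
                  return (0);
          }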
  
  r204159 | kmacy | 2010-02-20 18:12:18 -0700 (Sat, 20 Feb 2010) | 2 lines
  
      don't hold spin lock across free
  
  r214077 | gibbs | 2010-10-19 14:53:30 -0600 (Tue, 19 Oct 2010) | 342 lines
  
      Improve the Xen para-virtualized device infrastructure of FreeBSD:
  
       o Add support for backend devices (e.g. blkback)
       o Implement extensions to the Xen para-virtualized block API to allow
         for larger and more outstanding I/Os.
       o Import a completely rewritten block back driver with support for
         fronting I/O to both raw devices and files.
       o General cleanup and documentation of the XenBus and XenStore support
         code.
       o Robustness and performance updates for the block front driver.
       o Fixes to the netfront driver.
  
      Sponsored by: Spectra Logic Corporation
  
      sys/xen/xenbus/init.txt:
              Deleted: This file explains the Linux method for XenBus device
              enumeration and thus does not apply to FreeBSD's NewBus approach.
  
      sys/xen/xenbus/xenbus_probe_backend.c:
             Deleted: Linux version of backend XenBus service routines.  It
             was never ported to FreeBSD.  See xenbusb.c, xenbusb_if.m,
             xenbusb_front.c xenbusb_back.c for details of FreeBSD's XenBus
             support.
  
      sys/xen/xenbus/xenbusvar.h:
      sys/xen/xenbus/xenbus_xs.c:
      sys/xen/xenbus/xenbus_comms.c:
      sys/xen/xenbus/xenbus_comms.h:
      sys/xen/xenstore/xenstorevar.h:
      sys/xen/xenstore/xenstore.c:
              Split XenStore into its own tree.  XenBus is a software layer
              built on top of XenStore.  The old arrangement and the naming of
              some structures and functions blurred these lines making it
              difficult to discern what services are provided by which layer
              and at what times these services are available (e.g. during
              system startup and shutdown).
  
      sys/xen/xenbus/xenbus_client.c:
      sys/xen/xenbus/xenbus.c:
      sys/xen/xenbus/xenbus_probe.c:
      sys/xen/xenbus/xenbusb.c:
      sys/xen/xenbus/xenbusb.h:
              Split up XenBus code into methods available for use by client
              drivers (xenbus.c) and code used by the XenBus "bus code" to
              enumerate, attach, detach, and service bus drivers.
  
      sys/xen/reboot.c:
      sys/dev/xen/control/control.c:
  	    Add a XenBus front driver for handling shutdown, reboot,
  	    suspend, and resume events published in the XenStore.
  	    Move all PV suspend/reboot support from reboot.c into
  	    this driver.
  
      sys/xen/blkif.h:
              New file from Xen vendor with macros and structures used by
              a block back driver to service requests from a VM running a
              different ABI (e.g. amd64 back with i386 front).
  
      sys/conf/files:
              Adjust kernel build spec for new XenBus/XenStore layout and added
              Xen functionality.
  
      sys/dev/xen/balloon/balloon.c:
      sys/dev/xen/netfront/netfront.c:
      sys/dev/xen/blkfront/blkfront.c:
      sys/xen/xenbus/...
      sys/xen/xenstore/...
              o Rename XenStore APIs and structures from xenbus_* to xs_*.
  	    o Adjust to use of M_XENBUS and M_XENSTORE malloc types
  	      for allocation of objects returned by these APIs.
  	    o Adjust for changes in the bus interface for Xen
  	    drivers.
  
      sys/xen/xenbus/...
      sys/xen/xenstore/...
              Add Doxygen comments for these interfaces and the code that
              implements them.
  
      sys/dev/xen/blkback/blkback.c:
              o Rewrite the Block Back driver to attach properly via newbus,
                to operate correctly in both PV and HVM mode regardless of
                domain (e.g. it can run in a domain other than Dom0), and to
                deal with the latest metadata available in XenStore for block
                devices.
  
              o Allow users to specify a file as a backend to blkback, in addition
                to character devices.  Use the namei lookup of the backend path
                to automatically configure, based on file type, the appropriate
                backend method.
  
              The current implementation is limited to a single outstanding I/O
              at a time to file backed storage.
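
              The backend-type selection can be pictured with a sketch along
              these lines (the function and dispatch names are illustrative
              assumptions, not necessarily those used in the driver):

                static int
                xbb_pick_backend(struct vnode *vp, xbb_dispatch_t *dispatch)
                {
                        switch (vp->v_type) {
                        case VCHR:      /* device node: cdev strategy I/O */
                                *dispatch = xbb_dispatch_dev;
                                return (0);
                        case VREG:      /* regular file: vnode I/O */
                                *dispatch = xbb_dispatch_file;
                                return (0);
                        default:
                                return (EINVAL);
                        }
                }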
  
      sys/dev/xen/blkback/blkback.c:
      sys/xen/interface/io/blkif.h:
      sys/xen/blkif.h:
      sys/dev/xen/blkfront/blkfront.c:
      sys/dev/xen/blkfront/block.h:
              Extend the Xen blkif API: Negotiable request size and number of
              requests.
  
              This change extends the information recorded in the XenStore
              allowing block front/back devices to negotiate for optimal I/O
              parameters.  This has been achieved without sacrificing backward
              compatibility with drivers that are unaware of these protocol
              enhancements.  The extensions center around the connection protocol
              which now includes these additions:
  
              o The back-end device publishes its maximum supported values for
                request I/O size, the number of page segments that can be
                associated with a request, the maximum number of requests that
                can be concurrently active, and the maximum number of pages that
                can be in the shared request ring.  These values are published
                before the back-end enters the XenbusStateInitWait state.
  
              o The front-end waits for the back-end to enter either the InitWait
                or Initialize state.  At this point, the front-end limits its
                own capabilities to the lesser of the values it finds published
                by the backend, its own maximums, or, should any back-end data
                be missing in the store, the values supported by the original
                protocol.  It then initializes its internal data structures,
                including allocation of the shared ring, publishes its maximum
                capabilities to the XenStore, and transitions to the Initialized
                state.
  
              o The back-end waits for the front-end to enter the Initialized
                state.  At this point, the back-end limits its own capabilities
                to the lesser of the values it finds published by the frontend,
                its own maximums, or, should any front-end data be missing in
                the store, the values supported by the original protocol.  It
                then initializes its internal data structures, attaches to the
                shared ring, and transitions to the Connected state.
  
              o The front-end waits for the back-end to enter the Connected
                state, transitions itself to the Connected state, and can
                commence I/O.
  
              Although an updated front-end driver must be aware of the back-end's
              InitWait state, the back-end has been coded such that it can
              tolerate a front-end that skips this step and transitions directly
              to the Initialized state without waiting for the back-end.
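
              The clamping performed by each side reduces to a computation
              along these lines (a minimal sketch; the helper is illustrative,
              not the actual front-end or back-end code):

                static uint32_t
                negotiate_limit(uint32_t peer, uint32_t local_max,
                                uint32_t legacy_default)
                {
                        uint32_t other;

                        /* A value missing from the XenStore implies the
                         * original protocol's limit. */
                        other = (peer != 0) ? peer : legacy_default;
                        return (other < local_max ? other : local_max);
                }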
  
      sys/xen/interface/io/blkif.h:
              o Increase BLKIF_MAX_SEGMENTS_PER_REQUEST to 255.  This is
                the maximum number possible without changing the blkif
                request header structure (nr_segs is a uint8_t).
  
              o Add two new constants:
                BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, and
                BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK.  These respectively
                indicate the number of segments that can fit in the first
                ring-buffer entry of a request, and for each subsequent
                (sg element only) ring-buffer entry associated with the
                "header" ring-buffer entry of the request.
  
              o Add the blkif_request_segment_t typedef for segment
                elements.
  
              o Add the BLKRING_GET_SG_REQUEST() macro which wraps the
                RING_GET_REQUEST() macro and returns a properly cast
                pointer to an array of blkif_request_segment_ts.
  
              o Add the BLKIF_SEGS_TO_BLOCKS() macro which calculates the
                number of ring entries that will be consumed by a blkif
                request with the given number of segments.
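
              The ring-entry accounting these macros imply can be sketched as
              follows (illustrative only; the authoritative definitions are
              the macros themselves):

                static int
                segs_to_blocks(int nsegs)
                {
                        int extra;

                        /* The header entry carries the first batch of
                         * segments; the remainder spill into follow-on,
                         * segment-only ring entries. */
                        if (nsegs <= BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK)
                                return (1);
                        extra = nsegs - BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK;
                        return (1 + howmany(extra,
                            BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK));
                }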
  
      sys/xen/blkif.h:
              o Update for changes in interface/io/blkif.h macros.
  
              o Update the BLKIF_MAX_RING_REQUESTS() macro to take the
                ring size as an argument to allow this calculation on
                multi-page rings.
  
              o Add a companion macro to BLKIF_MAX_RING_REQUESTS(),
                BLKIF_RING_PAGES().  This macro determines the number of
                ring pages required in order to support a ring with the
                supplied number of request blocks.
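
              Taken together, these macros let a driver size its shared ring
              directly from its negotiated limits, along these lines (compare
              the XBB_MAX_RING_PAGES definition in the diff below):

                ring_pages = BLKIF_RING_PAGES(
                    BLKIF_SEGS_TO_BLOCKS(max_segments) * max_requests);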
  
      sys/dev/xen/blkback/blkback.c:
      sys/dev/xen/blkfront/blkfront.c:
      sys/dev/xen/blkfront/block.h:
              o Negotiate the following limits with the other end:
                    Request Size:   MAXPHYS
                    Max Segments:   (MAXPHYS/PAGE_SIZE) + 1
                    Max Requests:   256
                    Max Ring Pages: Sufficient to support Max Requests with
                                    Max Segments.
  
              o Dynamically allocate request pools and segments-per-request.
  
              o Update ring allocation/attachment code to support a
                multi-page shared ring.
  
              o Update routines that access the shared ring to handle
                multi-block requests.
  
      sys/dev/xen/blkfront/blkfront.c:
              o Track blkfront allocations in a blkfront driver specific
                malloc pool.
  
              o Strip out XenStore transaction retry logic in the
                connection code.  Transactions only need to be used when
                the update to multiple XenStore nodes must be atomic.
                That is not the case here.
  
              o Fully disable blkif_resume() until it can be fixed
                properly (it didn't work before this change).
  
              o Destroy bus-dma objects during device instance tear-down.
  
              o Properly handle backend devices with power-of-2 sector
                sizes larger than 512b.
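
              The fix in the last item amounts to keeping the two unit systems
              distinct (a hedged sketch with illustrative names; blkif always
              speaks fixed 512-byte sectors, while the backend may use a larger
              power-of-2 sector size):

                /* The same byte offset expressed in each unit system;
                 * sector_size_shift is log2 of the backend sector size. */
                blkif_sector  = byte_offset >> 9;
                native_sector = byte_offset >> sector_size_shift;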
  
      sys/dev/xen/blkback/blkback.c:
              Advertise support for and implement the BLKIF_OP_WRITE_BARRIER
              and BLKIF_OP_FLUSH_DISKCACHE blkif opcodes using BIO_FLUSH and
              the BIO_ORDERED attribute of bios.
  
      sys/dev/xen/blkfront/blkfront.c:
      sys/dev/xen/blkfront/block.h:
              Fix various bugs in blkfront.
  
              o gnttab_alloc_grant_references() returns 0 for success and
                non-zero for failure.  The check for < 0 is a leftover
                Linuxism.

              o When we negotiate with blkback and have to reduce some of our
                capabilities, print out the original and reduced capability
                before changing the local capability, so that the user sees
                the correct information.
  
              o Fix blkif_restart_queue_callback() formatting.  Make sure we hold
                the mutex in that function before calling xb_startio().
  
              o Fix a couple of KASSERT()s.
  
              o Fix a check in the xb_remove_* macro to be a little more specific.
  
      sys/xen/gnttab.h:
      sys/xen/gnttab.c:
              Define GNTTAB_LIST_END publicly as GRANT_REF_INVALID.
  
      sys/dev/xen/netfront/netfront.c:
              Use GRANT_REF_INVALID instead of driver private definitions of the
              same constant.
  
      sys/xen/gnttab.h:
      sys/xen/gnttab.c:
              Add the gnttab_end_foreign_access_references() API.
  
  	    This API allows a client to batch the release of an
  	    array of grant references, instead of coding a private
  	    for loop.  The implementation takes advantage of this
  	    batching to reduce lock overhead to one acquisition and
  	    release per-batch instead of per-freed grant reference.
  
  	    While here, reduce the duration the gnttab_list_lock
  	    is held during gnttab_free_grant_references() operations.
  	    The search to find the tail of the incoming free list
  	    does not rely on global state and so can be performed
  	    without holding the lock.
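
              A minimal usage sketch, assuming the new API takes a reference
              count followed by the array to release (the wrapper name is
              illustrative, not driver code):

                static void
                release_io_refs(grant_ref_t *refs, u_int nrefs)
                {
                        /* One batched call, and thus one lock acquire and
                         * release, for the whole array. */
                        gnttab_end_foreign_access_references(nrefs, refs);
                }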
  
      sys/dev/xen/xenpci/evtchn.c:
      sys/dev/xen/evtchn/evtchn.c:
      sys/xen/xen_intr.h:
  	    o Implement the bind_interdomain_evtchn_to_irqhandler
  	      API for HVM mode.  This allows an HVM domain to serve
  	      back end devices to other domains.  This API is already
  	      implemented for PV mode.
  
              o Synchronize the API between HVM and PV.
  
      sys/dev/xen/xenpci/xenpci.c:
  	    o Scan the full region of CPUID space in which the Xen
  	      VMM interface may be implemented.  On systems using
  	      SuSE as a Dom0 where the Viridian API is also exported,
  	      the VMM interface is above the region we used to
  	      search.
  
              o Pass through bus_alloc_resource() calls so that XenBus drivers
                attaching on an HVM system can allocate unused physical address
                space from the nexus.  The block back driver makes use of this
                facility.
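
              The widened CPUID scan from the first item above amounts to
              something like this hedged sketch (the hypervisor leaf range and
              the "XenVMMXenVMM" signature are standard Xen conventions; the
              function name is illustrative):

                static u_int
                xen_cpuid_base(void)
                {
                        u_int base, regs[4];

                        /* Hypervisor leaves sit at 0x100-aligned bases
                         * within 0x40000000-0x4000ffff. */
                        for (base = 0x40000000; base < 0x40010000;
                             base += 0x100) {
                                do_cpuid(base, regs);
                                if (memcmp("XenVMMXenVMM", &regs[1], 12) == 0)
                                        return (base);
                        }
                        return (0);
                }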
  
      sys/i386/xen/xen_machdep.c:
              Use the correct type for accessing the statically mapped xenstore
              metadata.
  
      sys/xen/interface/hvm/params.h:
      sys/xen/xenstore/xenstore.c:
              Move hvm_get_parameter() to the correct global header file instead
              of leaving it as a private method of the XenStore code.
  
      sys/xen/interface/io/protocols.h:
              Sync with vendor.
  
      sys/xen/interface/io/ring.h:
              Add a macro for calculating the number of ring pages needed for
              an N deep ring.
  
              To avoid duplication within the macros, create and use the new
              __RING_HEADER_SIZE() macro.  This macro calculates the size of the
              ring bookkeeping struct (producer/consumer indexes, etc.) that
              resides at the head of the ring.
  
              Add the __RING_PAGES() macro which calculates the number of shared
              ring pages required to support a ring with the given number of
              requests.
  
              These APIs are used to support the multi-page ring version of the
              Xen block API.
  
      sys/xen/interface/io/xenbus.h:
              Add comments.
  
      sys/xen/xenbus/...
  	    o Refactor the FreeBSD XenBus support code to allow for
  	      both front and backend device attachments.
  
              o Make use of new config_intr_hook capabilities to allow
                front and back devices to be probed/attached in parallel.
  
  	    o Fix bugs in probe/attach state machine that could
  	      cause the system to hang when confronted with a failure
  	      either in the local domain or in a remote domain to
  	      which one of our driver instances is attaching.
  
  	    o Publish all required state to the XenStore on device
  	      detach and failure.  The majority of the missing
  	      functionality was for serving as a back end since the
  	      typical "hot-plug" scripts in Dom0 don't handle the
  	      case of cleaning up for a "service domain" that is
  	      not itself.
  
              o Add dynamic sysctl nodes exposing the generic ivars of
                XenBus devices.
  
              o Add doxygen style comments to the majority of the code.
  
              o Cleanup types, formatting, etc.
  
      sys/xen/xenbus/xenbusb.c:
              Common code used by both front and back XenBus busses.
  
      sys/xen/xenbus/xenbusb_if.m:
              Method definitions for a XenBus bus.
  
      sys/xen/xenbus/xenbusb_front.c:
      sys/xen/xenbus/xenbusb_back.c:
              XenBus bus specialization for front and back devices.
  
  r214444 | gibbs | 2010-10-27 22:14:28 -0600 (Wed, 27 Oct 2010) | 9 lines
  
      sys/dev/xen/blkback/blkback.c:
              In xbb_detach() only perform cleanup of our taskqueue and
              device statistics structures if they have been initialized.
              This avoids a panic when xbb_detach() is called on a partially
              initialized device instance, due to an early failure in
              attach.
  
      Sponsored by:   Spectra Logic Corporation
  
  r215681 | jhb | 2010-11-22 08:15:11 -0700 (Mon, 22 Nov 2010) | 2 lines
  
      Remove some bogus, self-referential mergeinfo.
  
  r215682 | jhb | 2010-11-22 08:26:47 -0700 (Mon, 22 Nov 2010) | 5 lines
  
      Purge mergeinfo on sys/dev/xen/xenpci.  The only unique mergeinfo compared
      to head was not useful (it came in with the merge from /user/dfr/xenhvm/7
      and that mergeinfo is still present at sys/) and not worth keeping an extra
      set of mergeinfo around in the kernel.

Added:
  stable/8/sys/dev/xen/control/
     - copied from r214077, head/sys/dev/xen/control/
  stable/8/sys/xen/blkif.h
     - copied unchanged from r214077, head/sys/xen/blkif.h
  stable/8/sys/xen/xenbus/xenbus.c
     - copied unchanged from r214077, head/sys/xen/xenbus/xenbus.c
  stable/8/sys/xen/xenbus/xenbusb.c
     - copied unchanged from r214077, head/sys/xen/xenbus/xenbusb.c
  stable/8/sys/xen/xenbus/xenbusb.h
     - copied unchanged from r214077, head/sys/xen/xenbus/xenbusb.h
  stable/8/sys/xen/xenbus/xenbusb_back.c
     - copied unchanged from r214077, head/sys/xen/xenbus/xenbusb_back.c
  stable/8/sys/xen/xenbus/xenbusb_front.c
     - copied unchanged from r214077, head/sys/xen/xenbus/xenbusb_front.c
  stable/8/sys/xen/xenbus/xenbusb_if.m
     - copied unchanged from r214077, head/sys/xen/xenbus/xenbusb_if.m
  stable/8/sys/xen/xenstore/
     - copied from r214077, head/sys/xen/xenstore/
Deleted:
  stable/8/sys/xen/reboot.c
  stable/8/sys/xen/xenbus/init.txt
  stable/8/sys/xen/xenbus/xenbus_client.c
  stable/8/sys/xen/xenbus/xenbus_comms.c
  stable/8/sys/xen/xenbus/xenbus_comms.h
  stable/8/sys/xen/xenbus/xenbus_dev.c
  stable/8/sys/xen/xenbus/xenbus_probe.c
  stable/8/sys/xen/xenbus/xenbus_probe_backend.c
  stable/8/sys/xen/xenbus/xenbus_xs.c
Modified:
  stable/8/sys/conf/files
  stable/8/sys/dev/xen/balloon/balloon.c
  stable/8/sys/dev/xen/blkback/blkback.c
  stable/8/sys/dev/xen/blkfront/blkfront.c
  stable/8/sys/dev/xen/blkfront/block.h
  stable/8/sys/dev/xen/netfront/netfront.c
  stable/8/sys/dev/xen/xenpci/evtchn.c
  stable/8/sys/dev/xen/xenpci/xenpci.c
  stable/8/sys/i386/xen/xen_machdep.c
  stable/8/sys/xen/evtchn/evtchn.c
  stable/8/sys/xen/evtchn/evtchn_dev.c
  stable/8/sys/xen/gnttab.c
  stable/8/sys/xen/gnttab.h
  stable/8/sys/xen/interface/grant_table.h
  stable/8/sys/xen/interface/hvm/params.h
  stable/8/sys/xen/interface/io/blkif.h
  stable/8/sys/xen/interface/io/protocols.h
  stable/8/sys/xen/interface/io/ring.h
  stable/8/sys/xen/interface/io/xenbus.h
  stable/8/sys/xen/xen_intr.h
  stable/8/sys/xen/xenbus/xenbus_if.m
  stable/8/sys/xen/xenbus/xenbusvar.h
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/dev/xen/xenpci/   (props changed)

Modified: stable/8/sys/conf/files
==============================================================================
--- stable/8/sys/conf/files	Wed Nov 24 00:43:05 2010	(r215787)
+++ stable/8/sys/conf/files	Wed Nov 24 01:03:03 2010	(r215788)
@@ -2955,19 +2955,20 @@ xen/gnttab.c			optional xen | xenhvm
 xen/features.c			optional xen | xenhvm
 xen/evtchn/evtchn.c		optional xen
 xen/evtchn/evtchn_dev.c		optional xen | xenhvm
-xen/reboot.c			optional xen
-xen/xenbus/xenbus_client.c	optional xen | xenhvm
-xen/xenbus/xenbus_comms.c	optional xen | xenhvm
-xen/xenbus/xenbus_dev.c		optional xen | xenhvm
 xen/xenbus/xenbus_if.m		optional xen | xenhvm
-xen/xenbus/xenbus_probe.c	optional xen | xenhvm
-#xen/xenbus/xenbus_probe_backend.c	optional xen
-xen/xenbus/xenbus_xs.c		optional xen | xenhvm
+xen/xenbus/xenbus.c		optional xen | xenhvm
+xen/xenbus/xenbusb_if.m		optional xen | xenhvm
+xen/xenbus/xenbusb.c		optional xen | xenhvm
+xen/xenbus/xenbusb_front.c	optional xen | xenhvm
+xen/xenbus/xenbusb_back.c	optional xen | xenhvm
+xen/xenstore/xenstore.c		optional xen | xenhvm
+xen/xenstore/xenstore_dev.c	optional xen | xenhvm
 dev/xen/balloon/balloon.c	optional xen | xenhvm
+dev/xen/blkfront/blkfront.c	optional xen | xenhvm
+dev/xen/blkback/blkback.c	optional xen | xenhvm
 dev/xen/console/console.c	optional xen
 dev/xen/console/xencons_ring.c	optional xen
-dev/xen/blkfront/blkfront.c	optional xen | xenhvm
+dev/xen/control/control.c	optional xen | xenhvm
 dev/xen/netfront/netfront.c	optional xen | xenhvm
 dev/xen/xenpci/xenpci.c		optional xenpci
 dev/xen/xenpci/evtchn.c         optional xenpci
-dev/xen/xenpci/machine_reboot.c optional xenpci

Modified: stable/8/sys/dev/xen/balloon/balloon.c
==============================================================================
--- stable/8/sys/dev/xen/balloon/balloon.c	Wed Nov 24 00:43:05 2010	(r215787)
+++ stable/8/sys/dev/xen/balloon/balloon.c	Wed Nov 24 01:03:03 2010	(r215788)
@@ -44,7 +44,7 @@ __FBSDID("$FreeBSD$");
 #include <machine/xen/xenfunc.h>
 #include <machine/xen/xenvar.h>
 #include <xen/hypervisor.h>
-#include <xen/xenbus/xenbusvar.h>
+#include <xen/xenstore/xenstorevar.h>
 
 #include <vm/vm.h>
 #include <vm/vm_page.h>
@@ -406,20 +406,20 @@ set_new_target(unsigned long target)
 	wakeup(balloon_process);
 }
 
-static struct xenbus_watch target_watch =
+static struct xs_watch target_watch =
 {
 	.node = "memory/target"
 };
 
 /* React to a change in the target key */
 static void 
-watch_target(struct xenbus_watch *watch,
+watch_target(struct xs_watch *watch,
 	     const char **vec, unsigned int len)
 {
 	unsigned long long new_target;
 	int err;
 
-	err = xenbus_scanf(XBT_NIL, "memory", "target", NULL,
+	err = xs_scanf(XST_NIL, "memory", "target", NULL,
 	    "%llu", &new_target);
 	if (err) {
 		/* This is ok (for domain0 at least) - so just return */
@@ -438,7 +438,7 @@ balloon_init_watcher(void *arg)
 {
 	int err;
 
-	err = register_xenbus_watch(&target_watch);
+	err = xs_register_watch(&target_watch);
 	if (err)
 		printf("Failed to set balloon watcher\n");
 

Modified: stable/8/sys/dev/xen/blkback/blkback.c
==============================================================================
--- stable/8/sys/dev/xen/blkback/blkback.c	Wed Nov 24 00:43:05 2010	(r215787)
+++ stable/8/sys/dev/xen/blkback/blkback.c	Wed Nov 24 01:03:03 2010	(r215788)
@@ -1,1055 +1,1919 @@
-/*
- * Copyright (c) 2006, Cisco Systems, Inc.
+/*-
+ * Copyright (c) 2009-2010 Spectra Logic Corporation
  * All rights reserved.
  *
- * Redistribution and use in source and binary forms, with or without 
- * modification, are permitted provided that the following conditions 
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
  * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
  *
- * 1. Redistributions of source code must retain the above copyright 
- *    notice, this list of conditions and the following disclaimer. 
- * 2. Redistributions in binary form must reproduce the above copyright 
- *    notice, this list of conditions and the following disclaimer in the 
- *    documentation and/or other materials provided with the distribution. 
- * 3. Neither the name of Cisco Systems, Inc. nor the names of its contributors 
- *    may be used to endorse or promote products derived from this software 
- *    without specific prior written permission. 
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 
- * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE 
- * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
- * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 
- * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
- * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 
- * POSSIBILITY OF SUCH DAMAGE.
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ * Authors: Justin T. Gibbs     (Spectra Logic Corporation)
+ *          Ken Merry           (Spectra Logic Corporation)
  */
-
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
+/**
+ * \file blkback.c
+ *
+ * \brief Device driver supporting the vending of block storage from
+ *        a FreeBSD domain to other domains.
+ */
+
 #include <sys/param.h>
 #include <sys/systm.h>
-#include <sys/mbuf.h>
-#include <sys/malloc.h>
 #include <sys/kernel.h>
-#include <sys/socket.h>
-#include <sys/queue.h>
-#include <sys/taskqueue.h>
+#include <sys/malloc.h>
+
+#include <sys/bio.h>
+#include <sys/bus.h>
+#include <sys/conf.h>
+#include <sys/devicestat.h>
+#include <sys/disk.h>
+#include <sys/fcntl.h>
+#include <sys/filedesc.h>
+#include <sys/kdb.h>
+#include <sys/module.h>
 #include <sys/namei.h>
 #include <sys/proc.h>
-#include <sys/filedesc.h>
+#include <sys/rman.h>
+#include <sys/taskqueue.h>
+#include <sys/types.h>
 #include <sys/vnode.h>
-#include <sys/fcntl.h>
-#include <sys/disk.h>
-#include <sys/bio.h>
-
-#include <sys/module.h>
-#include <sys/bus.h>
-#include <sys/sysctl.h>
+#include <sys/mount.h>
 
 #include <geom/geom.h>
 
+#include <machine/_inttypes.h>
+#include <machine/xen/xen-os.h>
+
+#include <vm/vm.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_kern.h>
 
-#include <machine/xen-os.h>
-#include <machine/hypervisor.h>
-#include <machine/hypervisor-ifs.h>
-#include <machine/xen_intr.h>
-#include <machine/evtchn.h>
-#include <machine/xenbus.h>
-#include <machine/gnttab.h>
-#include <machine/xen-public/memory.h>
-#include <dev/xen/xenbus/xenbus_comms.h>
+#include <xen/blkif.h>
+#include <xen/evtchn.h>
+#include <xen/gnttab.h>
+#include <xen/xen_intr.h>
+
+#include <xen/interface/event_channel.h>
+#include <xen/interface/grant_table.h>
+
+#include <xen/xenbus/xenbusvar.h>
+
+/*--------------------------- Compile-time Tunables --------------------------*/
+/**
+ * The maximum number of outstanding request blocks (request headers plus
+ * additional segment blocks) we will allow in a negotiated block-front/back
+ * communication channel.
+ */
+#define	XBB_MAX_REQUESTS	256
+
+/**
+ * \brief Define to force all I/O to be performed on memory owned by the
+ *        backend device, with a copy-in/out to the remote domain's memory.
+ *
+ * \note  This option is currently required when this driver's domain is
+ *        operating in HVM mode on a system using an IOMMU.
+ *
+ * This driver uses Xen's grant table API to gain access to the memory of
+ * the remote domains it serves.  When our domain is operating in PV mode,
+ * the grant table mechanism directly updates our domain's page table entries
+ * to point to the physical pages of the remote domain.  This scheme guarantees
+ * that blkback and the backing devices it uses can safely perform DMA
+ * operations to satisfy requests.  In HVM mode, Xen may use a HW IOMMU to
+ * ensure that our domain cannot DMA to pages owned by another domain.  As
+ * of Xen 4.0, IOMMU mappings for HVM guests are not updated via the grant
+ * table API.  For this reason, in HVM mode, we must bounce all requests into
+ * memory that is mapped into our domain at domain startup and thus has
+ * valid IOMMU mappings.
+ */
+#define XBB_USE_BOUNCE_BUFFERS
+
+/**
+ * \brief Define to enable rudimentary request logging to the console.
+ */
+#undef XBB_DEBUG
 
+/*---------------------------------- Macros ----------------------------------*/
+/**
+ * Custom malloc type for all driver allocations.
+ */
+MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "Xen Block Back Driver Data");
 
-#if XEN_BLKBACK_DEBUG
+#ifdef XBB_DEBUG
 #define DPRINTF(fmt, args...) \
-    printf("blkback (%s:%d): " fmt, __FUNCTION__, __LINE__, ##args)
+    printf("xbb(%s:%d): " fmt, __FUNCTION__, __LINE__, ##args)
 #else
-#define DPRINTF(fmt, args...) ((void)0)
+#define DPRINTF(fmt, args...) do {} while(0)
 #endif
 
-#define WPRINTF(fmt, args...) \
-    printf("blkback (%s:%d): " fmt, __FUNCTION__, __LINE__, ##args)
+/**
+ * The maximum mapped region size per request we will allow in a negotiated
+ * block-front/back communication channel.
+ */
+#define	XBB_MAX_REQUEST_SIZE		\
+	MIN(MAXPHYS, BLKIF_MAX_SEGMENTS_PER_REQUEST * PAGE_SIZE)
 
-#define BLKBACK_INVALID_HANDLE (~0)
+/**
+ * The maximum number of segments (within a request header and accompanying
+ * segment blocks) per request we will allow in a negotiated block-front/back
+ * communication channel.
+ */
+#define	XBB_MAX_SEGMENTS_PER_REQUEST			\
+	(MIN(UIO_MAXIOV,				\
+	     MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,	\
+		 (XBB_MAX_REQUEST_SIZE / PAGE_SIZE) + 1)))
+
+/**
+ * The maximum number of shared memory ring pages we will allow in a
+ * negotiated block-front/back communication channel.  Allow enough
+ * ring space for all requests to be XBB_MAX_REQUEST_SIZE'd.
+ */
+#define	XBB_MAX_RING_PAGES						    \
+	BLKIF_RING_PAGES(BLKIF_SEGS_TO_BLOCKS(XBB_MAX_SEGMENTS_PER_REQUEST) \
+		       * XBB_MAX_REQUESTS)
+
+/*--------------------------- Forward Declarations ---------------------------*/
+struct xbb_softc;
+
+static void xbb_attach_failed(struct xbb_softc *xbb, int err, const char *fmt,
+			      ...) __attribute__((format(printf, 3, 4)));
+static int  xbb_shutdown(struct xbb_softc *xbb);
+static int  xbb_detach(device_t dev);
+
+/*------------------------------ Data Structures -----------------------------*/
+/**
+ * \brief Object tracking an in-flight I/O from a Xen VBD consumer.
+ */
+struct xbb_xen_req {
+	/**
+	 * Linked list links used to aggregate idle request in the
+	 * request free pool (xbb->request_free_slist).
+	 */
+	SLIST_ENTRY(xbb_xen_req) links;
+
+	/**
+	 * Back reference to the parent block back instance for this
+	 * request.  Used during bio_done handling.
+	 */
+	struct xbb_softc        *xbb;
+
+	/**
+	 * The remote domain's identifier for this I/O request.
+	 */
+	uint64_t		 id;
+
+	/**
+	 * Kernel virtual address space reserved for this request
+	 * structure and used to map the remote domain's pages for
+	 * this I/O, into our domain's address space.
+	 */
+	uint8_t			*kva;
+
+#ifdef XBB_USE_BOUNCE_BUFFERS
+	/**
+	 * Pre-allocated domain local memory used to proxy remote
+	 * domain memory during I/O operations.
+	 */
+	uint8_t			*bounce;
+#endif
 
-struct ring_ref {
-	vm_offset_t va;
-	grant_handle_t handle;
-	uint64_t bus_addr;
+	/**
+	 * Base pseudo-physical address corresponding to the start
+	 * of this request's kva region.
+	 */
+	uint64_t	 	 gnt_base;
+
+	/**
+	 * The number of pages currently mapped for this request.
+	 */
+	int			 nr_pages;
+
+	/**
+	 * The number of 512 byte sectors comprising this request.
+	 */
+	int			 nr_512b_sectors;
+
+	/**
+	 * The number of struct bio requests still outstanding for this
+	 * request on the backend device.  This field is only used for	
+	 * device (rather than file) backed I/O.
+	 */
+	int			 pendcnt;
+
+	/**
+	 * BLKIF_OP code for this request.
+	 */
+	int			 operation;
+
+	/**
+	 * BLKIF_RSP status code for this request.
+	 *
+	 * This field allows an error status to be recorded even if the
+	 * delivery of this status must be deferred.  Deferred reporting
+	 * is necessary, for example, when an error is detected during
+	 * completion processing of one bio when other bios for this
+	 * request are still outstanding.
+	 */
+	int			 status;
+
+	/**
+	 * Device statistics request ordering type (ordered or simple).
+	 */
+	devstat_tag_type	 ds_tag_type;
+
+	/**
+	 * Device statistics request type (read, write, no_data).
+	 */
+	devstat_trans_flags	 ds_trans_type;
+
+	/**
+	 * The start time for this request.
+	 */
+	struct bintime		 ds_t0;
+
+	/**
+	 * Array of grant handles (one per page) used to map this request.
+	 */
+	grant_handle_t		*gnt_handles;
 };
+SLIST_HEAD(xbb_xen_req_slist, xbb_xen_req);
 
-typedef struct blkback_info {
-
-	/* Schedule lists */
-	STAILQ_ENTRY(blkback_info) next_req;
-	int on_req_sched_list;
-
-	struct xenbus_device *xdev;
-	XenbusState frontend_state;
-
-	domid_t domid;
-
-	int state;
-	int ring_connected;
-	struct ring_ref rr;
-	blkif_back_ring_t ring;
-	evtchn_port_t evtchn;
-	int irq;
-	void *irq_cookie;
-
-	int ref_cnt;
-
-	int handle;
-	char *mode;
-	char *type;
-	char *dev_name;
-
-	struct vnode *vn;
-	struct cdev *cdev;
-	struct cdevsw *csw;
-	u_int sector_size;
-	int sector_size_shift;
-	off_t media_size;
-	u_int media_num_sectors;
-	int major;
-	int minor;
-	int read_only;
-
-	struct mtx blk_ring_lock;
-
-	device_t ndev;
-
-	/* Stats */
-	int st_rd_req;
-	int st_wr_req;
-	int st_oo_req;
-	int st_err_req;
-} blkif_t;
-
-/*
- * These are rather arbitrary. They are fairly large because adjacent requests
- * pulled from a communication ring are quite likely to end up being part of
- * the same scatter/gather request at the disc.
- * 
- * ** TRY INCREASING 'blkif_reqs' IF WRITE SPEEDS SEEM TOO LOW **
- * 
- * This will increase the chances of being able to write whole tracks.
- * 64 should be enough to keep us competitive with Linux.
+/**
+ * \brief Configuration data for the shared memory request ring
+ *        used to communicate with the front-end client of this
+ *        driver.
  */
-static int blkif_reqs = 64;
-TUNABLE_INT("xen.vbd.blkif_reqs", &blkif_reqs);
+struct xbb_ring_config {
+	/** KVA address where ring memory is mapped. */
+	vm_offset_t	va;
+
+	/** The pseudo-physical address where ring memory is mapped.*/
+	uint64_t	gnt_addr;
+
+	/**
+	 * Grant table handles, one per-ring page, returned by the
+	 * hypervisor upon mapping of the ring and required to
+	 * unmap it when a connection is torn down.
+	 */
+	grant_handle_t	handle[XBB_MAX_RING_PAGES];
+
+	/**
+	 * The device bus address returned by the hypervisor when
+	 * mapping the ring and required to unmap it when a connection
+	 * is torn down.
+	 */
+	uint64_t	bus_addr[XBB_MAX_RING_PAGES];
+
+	/** The number of ring pages mapped for the current connection. */
+	u_int		ring_pages;
+
+	/**
+	 * The grant references, one per-ring page, supplied by the
+	 * front-end, allowing us to reference the ring pages in the
+	 * front-end's domain and to map these pages into our own domain.
+	 */
+	grant_ref_t	ring_ref[XBB_MAX_RING_PAGES];
 
-static int mmap_pages;
+	/** The interrupt driven event channel used to signal ring events. */
+	evtchn_port_t   evtchn;
+};
 
-/*
- * Each outstanding request that we've passed to the lower device layers has a 
- * 'pending_req' allocated to it. Each buffer_head that completes decrements 
- * the pendcnt towards zero. When it hits zero, the specified domain has a 
- * response queued for it, with the saved 'id' passed back.
+/**
+ * Per-instance connection state flags.
  */
-typedef struct pending_req {
-	blkif_t       *blkif;
-	uint64_t       id;
-	int            nr_pages;
-	int            pendcnt;
-	unsigned short operation;
-	int            status;
-	STAILQ_ENTRY(pending_req) free_list;
-} pending_req_t;
-
-static pending_req_t *pending_reqs;
-static STAILQ_HEAD(pending_reqs_list, pending_req) pending_free =
-	STAILQ_HEAD_INITIALIZER(pending_free);
-static struct mtx pending_free_lock;
-
-static STAILQ_HEAD(blkback_req_sched_list, blkback_info) req_sched_list =
-	STAILQ_HEAD_INITIALIZER(req_sched_list);
-static struct mtx req_sched_list_lock;
-
-static unsigned long mmap_vstart;
-static unsigned long *pending_vaddrs;
-static grant_handle_t *pending_grant_handles;
-
-static struct task blk_req_task;
-
-/* Protos */
-static void disconnect_ring(blkif_t *blkif);
-static int vbd_add_dev(struct xenbus_device *xdev);
-
-static inline int vaddr_pagenr(pending_req_t *req, int seg)
+typedef enum
 {
-	return (req - pending_reqs) * BLKIF_MAX_SEGMENTS_PER_REQUEST + seg;
-}
+	/**
+	 * The front-end requested a read-only mount of the
+	 * back-end device/file.
+	 */
+	XBBF_READ_ONLY         = 0x01,
+
+	/** Communication with the front-end has been established. */
+	XBBF_RING_CONNECTED    = 0x02,
+
+	/**
+	 * Front-end requests exist in the ring and are waiting for
+	 * xbb_xen_req objects to free up.
+	 */
+	XBBF_RESOURCE_SHORTAGE = 0x04,
+
+	/** Connection teardown in progress. */
+	XBBF_SHUTDOWN          = 0x08
+} xbb_flag_t;
+
+/** Backend device type.  */
+typedef enum {
+	/** Backend type unknown. */
+	XBB_TYPE_NONE		= 0x00,
+
+	/**
+	 * Backend type disk (access via cdev switch
+	 * strategy routine).
+	 */
+	XBB_TYPE_DISK		= 0x01,
+
+	/** Backend type file (access vnode operations.). */
+	XBB_TYPE_FILE		= 0x02
+} xbb_type;
+
+/**
+ * \brief Structure used to memoize information about a per-request
+ *        scatter-gather list.
+ *
+ * The chief benefit of using this data structure is it avoids having
+ * to reparse the possibly discontiguous S/G list in the original
+ * request.  Due to the way that the mapping of the memory backing an
+ * I/O transaction is handled by Xen, a second pass is unavoidable.
+ * At least this way the second walk is a simple array traversal.
+ *
+ * \note A single Scatter/Gather element in the block interface covers
+ *       at most 1 machine page.  In this context a sector (blkif
+ *       nomenclature, not what I'd choose) is a 512b aligned unit
+ *       of mapping within the machine page referenced by an S/G
+ *       element.
+ */
+struct xbb_sg {
+	/** The number of 512b data chunks mapped in this S/G element. */
+	int16_t nsect;
+
+	/**
+	 * The index (0 based) of the first 512b data chunk mapped
+	 * in this S/G element.
+	 */
+	uint8_t first_sect;
+
+	/**
+	 * The index (0 based) of the last 512b data chunk mapped
+	 * in this S/G element.
+	 */
+	uint8_t last_sect;
+};
 
-static inline unsigned long vaddr(pending_req_t *req, int seg)
-{
-	return pending_vaddrs[vaddr_pagenr(req, seg)];
-}
+/**
+ * Character device backend specific configuration data.
+ */
+struct xbb_dev_data {
+	/** Cdev used for device backend access.  */
+	struct cdev   *cdev;
 
-#define pending_handle(_req, _seg) \
-	(pending_grant_handles[vaddr_pagenr(_req, _seg)])
+	/** Cdev switch used for device backend access.  */
+	struct cdevsw *csw;
 
-static unsigned long
-alloc_empty_page_range(unsigned long nr_pages)
-{
-	void *pages;
-	int i = 0, j = 0;
-	multicall_entry_t mcl[17];
-	unsigned long mfn_list[16];
-	struct xen_memory_reservation reservation = {
-		.extent_start = mfn_list,
-		.nr_extents   = 0,
-		.address_bits = 0,
-		.extent_order = 0,
-		.domid        = DOMID_SELF
-	};
+	/** Used to hold a reference on opened cdev backend devices. */
+	int	       dev_ref;
+};
 
-	pages = malloc(nr_pages*PAGE_SIZE, M_DEVBUF, M_NOWAIT);
-	if (pages == NULL)
-		return 0;
+/**
+ * File backend specific configuration data.
+ */
+struct xbb_file_data {
+	/** Credentials to use for vnode backed (file based) I/O. */
+	struct ucred   *cred;
+
+	/**
+	 * \brief Array of io vectors used to process file based I/O.
+	 *
+	 * Only a single file based request is outstanding per-xbb instance,
+	 * so we only need one of these.
+	 */
+	struct iovec	xiovecs[XBB_MAX_SEGMENTS_PER_REQUEST];
+#ifdef XBB_USE_BOUNCE_BUFFERS
+
+	/**
+	 * \brief Array of io vectors used to handle bouncing of file reads.
+	 *
+	 * Vnode operations are free to modify uio data during their
+	 * execution.  In the case of a read with bounce buffering active,
+	 * we need some of the data from the original uio in order to
+	 * bounce-out the read data.  This array serves as the temporary
+	 * storage for this saved data.
+	 */
+	struct iovec	saved_xiovecs[XBB_MAX_SEGMENTS_PER_REQUEST];
+
+	/**
+	 * \brief Array of memoized bounce buffer kva offsets used
+	 *        in the file based backend.
+	 *
+	 * Due to the way that the mapping of the memory backing an
+	 * I/O transaction is handled by Xen, a second pass through
+	 * the request sg elements is unavoidable. We memoize the computed
+	 * bounce address here to reduce the cost of the second walk.
+	 */
+	void		*xiovecs_vaddr[XBB_MAX_SEGMENTS_PER_REQUEST];
+#endif /* XBB_USE_BOUNCE_BUFFERS */
+};
 
-	memset(mcl, 0, sizeof(mcl));
+/**
+ * Collection of backend type specific data.
+ */
+union xbb_backend_data {
+	struct xbb_dev_data  dev;
+	struct xbb_file_data file;
+};
 
-	while (i < nr_pages) {
-		unsigned long va = (unsigned long)pages + (i++ * PAGE_SIZE);
+/**
+ * Function signature of backend specific I/O handlers.
+ */
+typedef int (*xbb_dispatch_t)(struct xbb_softc *xbb, blkif_request_t *ring_req,
+			      struct xbb_xen_req *req, int nseg,
+			      int operation, int flags);
 
-		mcl[j].op = __HYPERVISOR_update_va_mapping;
-		mcl[j].args[0] = va;
+/**
+ * Per-instance configuration data.
+ */
+struct xbb_softc {
 
-		mfn_list[j++] = vtomach(va) >> PAGE_SHIFT;
+	/**
+	 * Task-queue used to process I/O requests.
+	 */
+	struct taskqueue	 *io_taskqueue;
+
+	/**
+	 * Single "run the request queue" task enqueued
+	 * on io_taskqueue.
+	 */
+	struct task		  io_task;
+
+	/** Device type for this instance. */
+	xbb_type		  device_type;
+
+	/** NewBus device corresponding to this instance. */
+	device_t		  dev;
+
+	/** Backend specific dispatch routine for this instance. */
+	xbb_dispatch_t		  dispatch_io;
+
+	/** The number of requests outstanding on the backend device/file. */
+	u_int			  active_request_count;
+
+	/** Free pool of request tracking structures. */
+	struct xbb_xen_req_slist  request_free_slist;
+
+	/** Array, sized at connection time, of request tracking structures. */
+	struct xbb_xen_req	 *requests;
+
+	/**
+	 * Global pool of kva used for mapping remote domain ring
+	 * and I/O transaction data.
+	 */
+	vm_offset_t		  kva;
+
+	/** Pseudo-physical address corresponding to kva. */
+	uint64_t		  gnt_base_addr;
+
+	/** The size of the global kva pool. */
+	int			  kva_size;
+
+	/**
+	 * \brief Cached value of the front-end's domain id.
+	 * 
+	 * This value is used at once for each mapped page in
+	 * a transaction.  We cache it to avoid incurring the
+	 * cost of an ivar access every time this is needed.
+	 */
+	domid_t			  otherend_id;
+
+	/**
+	 * \brief The blkif protocol abi in effect.
+	 *
+	 * There are situations where the back and front ends can
+	 * have a different, native abi (e.g. intel x86_64 and
+	 * 32bit x86 domains on the same machine).  The back-end
+	 * always accommodates the front-end's native abi.  That
+	 * value is pulled from the XenStore and recorded here.
+	 */
+	int			  abi;
+
+	/**
+	 * \brief The maximum number of requests allowed to be in
+	 *        flight at a time.
+	 *
+	 * This value is negotiated via the XenStore.
+	 */
+	uint32_t		  max_requests;
+
+	/**
+	 * \brief The maximum number of segments (1 page per segment)
+	 *	  that can be mapped by a request.
+	 *
+	 * This value is negotiated via the XenStore.
+	 */
+	uint32_t		  max_request_segments;
+
+	/**
+	 * The maximum size of any request to this back-end
+	 * device.
+	 *
+	 * This value is negotiated via the XenStore.
+	 */
+	uint32_t		  max_request_size;
+
+	/** Various configuration and state bit flags. */
+	xbb_flag_t		  flags;
+
+	/** Ring mapping and interrupt configuration data. */
+	struct xbb_ring_config	  ring_config;
+
+	/** Runtime, cross-abi safe, structures for ring access. */
+	blkif_back_rings_t	  rings;
+
+	/** IRQ mapping for the communication ring event channel. */
+	int			  irq;
+
+	/**
+	 * \brief Backend access mode flags (e.g. write, or read-only).
+	 *
+	 * This value is passed to us by the front-end via the XenStore.
+	 */
+	char			 *dev_mode;
+
+	/**
+	 * \brief Backend device type (e.g. "disk", "cdrom", "floppy").
+	 *
+	 * This value is passed to us by the front-end via the XenStore.
+	 * Currently unused.
+	 */
+	char			 *dev_type;
+
+	/**
+	 * \brief Backend device/file identifier.
+	 *
+	 * This value is passed to us by the front-end via the XenStore.
+	 * We expect this to be a POSIX path indicating the file or
+	 * device to open.
+	 */
+	char			 *dev_name;
+
+	/**
+	 * Vnode corresponding to the backend device node or file
+	 * we are accessing.
+	 */
+	struct vnode		 *vn;
+
+	union xbb_backend_data	  backend;
+	/** The native sector size of the backend. */
+	u_int			  sector_size;
+
+	/** log2 of sector_size.  */
+	u_int			  sector_size_shift;
+
+	/** Size in bytes of the backend device or file.  */
+	off_t			  media_size;
+
+	/**
+	 * \brief media_size expressed in terms of the backend native
+	 *	  sector size.
+	 *
+	 * (e.g. xbb->media_size >> xbb->sector_size_shift).
+	 */
+	uint64_t		  media_num_sectors;
+
+	/**
+	 * \brief Array of memoized scatter gather data computed during the
+	 *	  conversion of blkif ring requests to internal xbb_xen_req
+	 *	  structures.
+	 *
+	 * Ring processing is serialized so we only need one of these.
+	 */
+	struct xbb_sg		  xbb_sgs[XBB_MAX_SEGMENTS_PER_REQUEST];
+
+	/** Mutex protecting per-instance data. */
+	struct mtx		  lock;
+
+#ifdef XENHVM
+	/**
+	 * Resource representing allocated physical address space
+	 * associated with our per-instance kva region.
+	 */
+	struct resource		 *pseudo_phys_res;
 
-		xen_phys_machine[(vtophys(va) >> PAGE_SHIFT)] = INVALID_P2M_ENTRY;
+	/** Resource id for allocated physical address space. */
+	int			  pseudo_phys_res_id;
+#endif
 
-		if (j == 16 || i == nr_pages) {
-			mcl[j-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_LOCAL;
+	/** I/O statistics. */
+	struct devstat		 *xbb_stats;
+};
 
-			reservation.nr_extents = j;
+/*---------------------------- Request Processing ----------------------------*/
+/**
+ * Allocate an internal transaction tracking structure from the free pool.
+ *
+ * \param xbb  Per-instance xbb configuration structure.
+ *
+ * \return  On success, a pointer to the allocated xbb_xen_req structure.
+ *          Otherwise NULL.
+ */
+static inline struct xbb_xen_req *
+xbb_get_req(struct xbb_softc *xbb)
+{
+	struct xbb_xen_req *req;
 
-			mcl[j].op = __HYPERVISOR_memory_op;
-			mcl[j].args[0] = XENMEM_decrease_reservation;
-			mcl[j].args[1] =  (unsigned long)&reservation;
-			
-			(void)HYPERVISOR_multicall(mcl, j+1);
+	req = NULL;
+	mtx_lock(&xbb->lock);
 
-			mcl[j-1].args[MULTI_UVMFLAGS_INDEX] = 0;
-			j = 0;
+	/*
+	 * Do not allow new requests to be allocated while we
+	 * are shutting down.
+	 */
+	if ((xbb->flags & XBBF_SHUTDOWN) == 0) {
+		if ((req = SLIST_FIRST(&xbb->request_free_slist)) != NULL) {
+			SLIST_REMOVE_HEAD(&xbb->request_free_slist, links);
+			xbb->active_request_count++;
+		} else {
+			xbb->flags |= XBBF_RESOURCE_SHORTAGE;
 		}
 	}
-
-	return (unsigned long)pages;
-}
-
-static pending_req_t *
-alloc_req(void)
-{
-	pending_req_t *req;
-	mtx_lock(&pending_free_lock);
-	if ((req = STAILQ_FIRST(&pending_free))) {
-		STAILQ_REMOVE(&pending_free, req, pending_req, free_list);
-		STAILQ_NEXT(req, free_list) = NULL;
-	}
-	mtx_unlock(&pending_free_lock);
-	return req;
+	mtx_unlock(&xbb->lock);
+	return (req);
 }
 
-static void
-free_req(pending_req_t *req)
+/**
+ * Return an allocated transaction tracking structure to the free pool.
+ *
+ * \param xbb  Per-instance xbb configuration structure.
+ * \param req  The request structure to free.
+ */
+static inline void
+xbb_release_req(struct xbb_softc *xbb, struct xbb_xen_req *req)
 {
-	int was_empty;
-
-	mtx_lock(&pending_free_lock);
-	was_empty = STAILQ_EMPTY(&pending_free);
-	STAILQ_INSERT_TAIL(&pending_free, req, free_list);
-	mtx_unlock(&pending_free_lock);
-	if (was_empty)
-		taskqueue_enqueue(taskqueue_swi, &blk_req_task); 
-}
+	int wake_thread;
 
-static void
-fast_flush_area(pending_req_t *req)
-{
-	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
-	unsigned int i, invcount = 0;
-	grant_handle_t handle;
-	int ret;
+	mtx_lock(&xbb->lock);
+	wake_thread = xbb->flags & XBBF_RESOURCE_SHORTAGE;
+	xbb->flags &= ~XBBF_RESOURCE_SHORTAGE;
+	SLIST_INSERT_HEAD(&xbb->request_free_slist, req, links);
+	xbb->active_request_count--;
 
-	for (i = 0; i < req->nr_pages; i++) {
-		handle = pending_handle(req, i);
-		if (handle == BLKBACK_INVALID_HANDLE)
-			continue;
-		unmap[invcount].host_addr    = vaddr(req, i);
-		unmap[invcount].dev_bus_addr = 0;
-		unmap[invcount].handle       = handle;
-		pending_handle(req, i) = BLKBACK_INVALID_HANDLE;
-		invcount++;
+	if ((xbb->flags & XBBF_SHUTDOWN) != 0) {
+		/*
+		 * Shutdown is in progress.  See if we can
+		 * progress further now that one more request
+		 * has completed and been returned to the
+		 * free pool.
+		 */
+		xbb_shutdown(xbb);
 	}
+	mtx_unlock(&xbb->lock);
 
-	ret = HYPERVISOR_grant_table_op(
-		GNTTABOP_unmap_grant_ref, unmap, invcount);
-	PANIC_IF(ret);
+	if (wake_thread != 0)

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***

