svn commit: r232352 - in stable/8/sys: dev/xen/blkback
	dev/xen/blkfront xen/interface/io xen/xenbus
    Justin T. Gibbs 
    gibbs at FreeBSD.org
       
    Thu Mar  1 19:09:29 UTC 2012
    
    
  
Author: gibbs
Date: Thu Mar  1 19:09:28 2012
New Revision: 232352
URL: http://svn.freebsd.org/changeset/base/232352
Log:
  MFC r231743,231836-231837,231839,231883,232308
      Xen PV block interface enhancements
  
  Approved by:  re (kib)
  Reviewed by:  cperciva
  Tested by:    cperciva
  Sponsored by: Spectra Logic Corporation
  
  r231743
  =======
  Enhance documentation, improve interoperability, and fix defects in
  FreeBSD's front and back Xen blkif interface drivers.
  
  sys/dev/xen/blkfront/block.h:
  sys/dev/xen/blkfront/blkfront.c:
  sys/dev/xen/blkback/blkback.c:
  	Replace FreeBSD specific multi-page ring impelementation with
  	support for both the Citrix and Amazon/RedHat versions of this
  	extension.
  
  sys/dev/xen/blkfront/blkfront.c:
  	o Add a per-instance sysctl tree that exposes all negotiated
  	  transport parameters (ring pages, max number of requests,
  	  max request size, max number of segments).
  	o In blkfront_vdevice_to_unit() add a missing return statement
  	  so that we properly identify the unit number for high numbered
  	  xvd devices.
  
  sys/dev/xen/blkback/blkback.c:
  	o Add static dtrace probes for several events in this driver.
  	o Defer connection shutdown processing until the front-end
  	  enters the closed state.  This avoids prematurely tearing
  	  down the connection when buggy front-ends transition to the
  	  closing state, even though the device is open and they
  	  veto the close request from the tool stack.
  	o Add nodes for maximum request size and the number of active
  	  ring pages to the exising, per-instance, sysctl tree.
  	o Miscelaneous style cleanup.
  
  sys/xen/interface/io/blkif.h:
  	o Add extensive documentation of the XenStore nodes used to
  	  implement the blkif interface.
  	o Document the startup sequence between a front and back driver.
  	o Add structures and documenatation for the "discard" feature
  	  (AKA Trim).
  	o Cleanup some definitions related to FreeBSD's request
  	  number/size/segment-limit extension.
  
  sys/dev/xen/blkfront/blkfront.c:
  sys/dev/xen/blkback/blkback.c:
  sys/xen/xenbus/xenbusvar.h:
  	Add the convenience function xenbus_get_otherend_state() and
  	use it to simplify some logic in both block-front and block-back.
  
  r231836
  =======
  Fix "_" vs. "-" typo in a comment.  No functional changes.
  
  r231837
  =======
  Fix typo in a printf string: "specificed" -> "specified".
  
  r231839
  =======
  Fix a bug in the calculation of the maximum I/O request size.
  The previous code did not limit the I/O request size based on
  the maximum number of segments supported by the back-end.  In
  current practice, since the only back-end supporting chained
  requests is the FreeBSD implementation, this limit was never
  exceeded.
  
  sys/dev/xen/blkfront/block.h:
  	Add two macros, XBF_SEGS_TO_SIZE() and XBF_SIZE_TO_SEGS(),
  	to centralize the logic of reserving a segment to deal with
  	non-page-aligned I/Os.
  
  sys/dev/xen/blkfront/blkfront.c:
  	o When negotiating transfer parameters, limit the
  	  max_request_size we use and publish, if it is greater
  	  than the maximum, unaligned, I/O we can support with
  	  the number of segments advertised by the backend.
  	o Don't unilaterally reduce the I/O size published to
  	  the disk layer by a single page.  max_request_size
  	  is already properly limited in the transfer parameter
  	  negotiation code.
  	o Fix typos in printf strings:
  		"max_requests_segments" -> "max_request_segments"
  		"specificed" -> "specified"
  
  r231883
  =======
  Fix regression in the handling of blkback close events for
  devices that are unplugged via QEMU.
  
  sys/dev/xen/blkback/blkback.c:
  	Toolstack initiated closures change the frontend's state
  	to Closing.  The backend must change to Closing as well,
  	even if we can't actually close yet, in order for the
  	frontend to notice and start the closing process.
  
  r232308
  =======
  blkif interface comment cleanups.  No functional changes
  
  sys/xen/interface/io/blkif.h:
  	o Insert space in "Red Hat".
  	o Fix typo "discard-aligment" -> "discard-alignment"
  	o Fix typo "unamp" -> "unmap"
  	o Fix typo "formated" -> "formatted"
  	o Clarify the text for "params".
  	o Clarify the text for "sector-size".
  	o Clarify the text for "max-requests" in the backend section.
Modified:
  stable/8/sys/dev/xen/blkback/blkback.c
  stable/8/sys/dev/xen/blkfront/blkfront.c
  stable/8/sys/dev/xen/blkfront/block.h
  stable/8/sys/xen/interface/io/blkif.h
  stable/8/sys/xen/xenbus/xenbusvar.h
Directory Properties:
  stable/8/sys/   (props changed)
Modified: stable/8/sys/dev/xen/blkback/blkback.c
==============================================================================
--- stable/8/sys/dev/xen/blkback/blkback.c	Thu Mar  1 18:45:25 2012	(r232351)
+++ stable/8/sys/dev/xen/blkback/blkback.c	Thu Mar  1 19:09:28 2012	(r232352)
@@ -40,6 +40,8 @@ __FBSDID("$FreeBSD$");
  *        a FreeBSD domain to other domains.
  */
 
+#include "opt_kdtrace.h"
+
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
@@ -63,6 +65,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/mount.h>
 #include <sys/sysctl.h>
 #include <sys/bitstring.h>
+#include <sys/sdt.h>
 
 #include <geom/geom.h>
 
@@ -124,7 +127,7 @@ __FBSDID("$FreeBSD$");
 MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "Xen Block Back Driver Data");
 
 #ifdef XBB_DEBUG
-#define DPRINTF(fmt, args...) \
+#define DPRINTF(fmt, args...)					\
     printf("xbb(%s:%d): " fmt, __FUNCTION__, __LINE__, ##args)
 #else
 #define DPRINTF(fmt, args...) do {} while(0)
@@ -134,7 +137,7 @@ MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "X
  * The maximum mapped region size per request we will allow in a negotiated
  * block-front/back communication channel.
  */
-#define	XBB_MAX_REQUEST_SIZE		\
+#define	XBB_MAX_REQUEST_SIZE					\
 	MIN(MAXPHYS, BLKIF_MAX_SEGMENTS_PER_REQUEST * PAGE_SIZE)
 
 /**
@@ -142,9 +145,9 @@ MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "X
  * segment blocks) per request we will allow in a negotiated block-front/back
  * communication channel.
  */
-#define	XBB_MAX_SEGMENTS_PER_REQUEST			\
-	(MIN(UIO_MAXIOV,				\
-	     MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,	\
+#define	XBB_MAX_SEGMENTS_PER_REQUEST				\
+	(MIN(UIO_MAXIOV,					\
+	     MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,		\
 		 (XBB_MAX_REQUEST_SIZE / PAGE_SIZE) + 1)))
 
 /**
@@ -980,9 +983,10 @@ xbb_get_gntaddr(struct xbb_xen_reqlist *
 static uint8_t *
 xbb_get_kva(struct xbb_softc *xbb, int nr_pages)
 {
-	intptr_t first_clear, num_clear;
+	intptr_t first_clear;
+	intptr_t num_clear;
 	uint8_t *free_kva;
-	int i;
+	int      i;
 
 	KASSERT(nr_pages != 0, ("xbb_get_kva of zero length"));
 
@@ -1681,19 +1685,19 @@ xbb_dispatch_io(struct xbb_softc *xbb, s
 			req_ring_idx++;
 			switch (xbb->abi) {
 			case BLKIF_PROTOCOL_NATIVE:
-				sg = BLKRING_GET_SG_REQUEST(&xbb->rings.native,
-							    req_ring_idx);
+				sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.native,
+							   req_ring_idx);
 				break;
 			case BLKIF_PROTOCOL_X86_32:
 			{
-				sg = BLKRING_GET_SG_REQUEST(&xbb->rings.x86_32,
-							    req_ring_idx);
+				sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.x86_32,
+							   req_ring_idx);
 				break;
 			}
 			case BLKIF_PROTOCOL_X86_64:
 			{
-				sg = BLKRING_GET_SG_REQUEST(&xbb->rings.x86_64,
-							    req_ring_idx);
+				sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.x86_64,
+							   req_ring_idx);
 				break;
 			}
 			default:
@@ -1817,8 +1821,8 @@ xbb_run_queue(void *context, int pending
 	struct xbb_xen_reqlist *reqlist;
 
 
-	xbb	      = (struct xbb_softc *)context;
-	rings	      = &xbb->rings;
+	xbb   = (struct xbb_softc *)context;
+	rings = &xbb->rings;
 
 	/*
 	 * Work gather and dispatch loop.  Note that we have a bias here
@@ -2032,6 +2036,13 @@ xbb_intr(void *arg)
 	taskqueue_enqueue(xbb->io_taskqueue, &xbb->io_task); 
 }
 
+SDT_PROVIDER_DEFINE(xbb);
+SDT_PROBE_DEFINE1(xbb, kernel, xbb_dispatch_dev, flush, flush, "int");
+SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_dev, read, read, "int", "uint64_t",
+		  "uint64_t");
+SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_dev, write, write, "int",
+		  "uint64_t", "uint64_t");
+
 /*----------------------------- Backend Handlers -----------------------------*/
 /**
  * Backend handler for character device access.
@@ -2087,6 +2098,9 @@ xbb_dispatch_dev(struct xbb_softc *xbb, 
 
 		nreq->pendcnt	 = 1;
 
+		SDT_PROBE1(xbb, kernel, xbb_dispatch_dev, flush,
+			   device_get_unit(xbb->dev));
+
 		(*dev_data->csw->d_strategy)(bio);
 
 		return (0);
@@ -2181,6 +2195,17 @@ xbb_dispatch_dev(struct xbb_softc *xbb, 
 			       bios[bio_idx]->bio_bcount);
 		}
 #endif
+		if (operation == BIO_READ) {
+			SDT_PROBE3(xbb, kernel, xbb_dispatch_dev, read,
+				   device_get_unit(xbb->dev),
+				   bios[bio_idx]->bio_offset,
+				   bios[bio_idx]->bio_length);
+		} else if (operation == BIO_WRITE) {
+			SDT_PROBE3(xbb, kernel, xbb_dispatch_dev, write,
+				   device_get_unit(xbb->dev),
+				   bios[bio_idx]->bio_offset,
+				   bios[bio_idx]->bio_length);
+		}
 		(*dev_data->csw->d_strategy)(bios[bio_idx]);
 	}
 
@@ -2193,6 +2218,12 @@ fail_free_bios:
 	return (error);
 }
 
+SDT_PROBE_DEFINE1(xbb, kernel, xbb_dispatch_file, flush, flush, "int");
+SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_file, read, read, "int", "uint64_t",
+		  "uint64_t");
+SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_file, write, write, "int",
+		  "uint64_t", "uint64_t");
+
 /**
  * Backend handler for file access.
  *
@@ -2237,6 +2268,9 @@ xbb_dispatch_file(struct xbb_softc *xbb,
 	case BIO_FLUSH: {
 		struct mount *mountpoint;
 
+		SDT_PROBE1(xbb, kernel, xbb_dispatch_file, flush,
+			   device_get_unit(xbb->dev));
+
 		vfs_is_locked = VFS_LOCK_GIANT(xbb->vn->v_mount);
 
 		(void) vn_start_write(xbb->vn, &mountpoint, V_WAIT);
@@ -2336,6 +2370,10 @@ xbb_dispatch_file(struct xbb_softc *xbb,
 	switch (operation) {
 	case BIO_READ:
 
+		SDT_PROBE3(xbb, kernel, xbb_dispatch_file, read,
+			   device_get_unit(xbb->dev), xuio.uio_offset,
+			   xuio.uio_resid);
+
 		vn_lock(xbb->vn, LK_EXCLUSIVE | LK_RETRY);
 
 		/*
@@ -2366,6 +2404,10 @@ xbb_dispatch_file(struct xbb_softc *xbb,
 	case BIO_WRITE: {
 		struct mount *mountpoint;
 
+		SDT_PROBE3(xbb, kernel, xbb_dispatch_file, write,
+			   device_get_unit(xbb->dev), xuio.uio_offset,
+			   xuio.uio_resid);
+
 		(void)vn_start_write(xbb->vn, &mountpoint, V_WAIT);
 
 		vn_lock(xbb->vn, LK_EXCLUSIVE | LK_RETRY);
@@ -3028,6 +3070,8 @@ xbb_collect_frontend_info(struct xbb_sof
 	const char *otherend_path;
 	int	    error;
 	u_int	    ring_idx;
+	u_int	    ring_page_order;
+	size_t	    ring_size;
 
 	otherend_path = xenbus_get_otherend_path(xbb->dev);
 
@@ -3035,23 +3079,19 @@ xbb_collect_frontend_info(struct xbb_sof
 	 * Protocol defaults valid even if all negotiation fails.
 	 */
 	xbb->ring_config.ring_pages = 1;
-	xbb->max_requests	    = BLKIF_MAX_RING_REQUESTS(PAGE_SIZE);
 	xbb->max_request_segments   = BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK;
 	xbb->max_request_size	    = xbb->max_request_segments * PAGE_SIZE;
 
 	/*
 	 * Mandatory data (used in all versions of the protocol) first.
 	 */
-	error = xs_gather(XST_NIL, otherend_path,
-			  "ring-ref", "%" PRIu32,
-			  &xbb->ring_config.ring_ref[0],
-			  "event-channel", "%" PRIu32,
-			  &xbb->ring_config.evtchn,
-			  NULL);
+	error = xs_scanf(XST_NIL, otherend_path,
+			 "event-channel", NULL, "%" PRIu32,
+			 &xbb->ring_config.evtchn);
 	if (error != 0) {
 		xenbus_dev_fatal(xbb->dev, error,
-				 "Unable to retrieve ring information from "
-				 "frontend %s.  Unable to connect.",
+				 "Unable to retrieve event-channel information "
+				 "from frontend %s.  Unable to connect.",
 				 xenbus_get_otherend_path(xbb->dev));
 		return (error);
 	}
@@ -3065,10 +3105,20 @@ xbb_collect_frontend_info(struct xbb_sof
 	 *       we must use independant calls in order to guarantee
 	 *       we don't miss information in a sparsly populated front-end
 	 *       tree.
+	 *
+	 * \note xs_scanf() does not update variables for unmatched
+	 *       fields.
 	 */
+	ring_page_order = 0;
+	(void)xs_scanf(XST_NIL, otherend_path,
+		       "ring-page-order", NULL, "%u",
+		       &ring_page_order);
+	xbb->ring_config.ring_pages = 1 << ring_page_order;
 	(void)xs_scanf(XST_NIL, otherend_path,
-		       "ring-pages", NULL, "%u",
+		       "num-ring-pages", NULL, "%u",
 		       &xbb->ring_config.ring_pages);
+	ring_size = PAGE_SIZE * xbb->ring_config.ring_pages;
+	xbb->max_requests = BLKIF_MAX_RING_REQUESTS(ring_size);
 
 	(void)xs_scanf(XST_NIL, otherend_path,
 		       "max-requests", NULL, "%u",
@@ -3084,7 +3134,7 @@ xbb_collect_frontend_info(struct xbb_sof
 
 	if (xbb->ring_config.ring_pages	> XBB_MAX_RING_PAGES) {
 		xenbus_dev_fatal(xbb->dev, EINVAL,
-				 "Front-end specificed ring-pages of %u "
+				 "Front-end specified ring-pages of %u "
 				 "exceeds backend limit of %zu.  "
 				 "Unable to connect.",
 				 xbb->ring_config.ring_pages,
@@ -3092,7 +3142,7 @@ xbb_collect_frontend_info(struct xbb_sof
 		return (EINVAL);
 	} else if (xbb->max_requests > XBB_MAX_REQUESTS) {
 		xenbus_dev_fatal(xbb->dev, EINVAL,
-				 "Front-end specificed max_requests of %u "
+				 "Front-end specified max_requests of %u "
 				 "exceeds backend limit of %u.  "
 				 "Unable to connect.",
 				 xbb->max_requests,
@@ -3100,7 +3150,7 @@ xbb_collect_frontend_info(struct xbb_sof
 		return (EINVAL);
 	} else if (xbb->max_request_segments > XBB_MAX_SEGMENTS_PER_REQUEST) {
 		xenbus_dev_fatal(xbb->dev, EINVAL,
-				 "Front-end specificed max_requests_segments "
+				 "Front-end specified max_requests_segments "
 				 "of %u exceeds backend limit of %u.  "
 				 "Unable to connect.",
 				 xbb->max_request_segments,
@@ -3108,7 +3158,7 @@ xbb_collect_frontend_info(struct xbb_sof
 		return (EINVAL);
 	} else if (xbb->max_request_size > XBB_MAX_REQUEST_SIZE) {
 		xenbus_dev_fatal(xbb->dev, EINVAL,
-				 "Front-end specificed max_request_size "
+				 "Front-end specified max_request_size "
 				 "of %u exceeds backend limit of %u.  "
 				 "Unable to connect.",
 				 xbb->max_request_size,
@@ -3116,22 +3166,39 @@ xbb_collect_frontend_info(struct xbb_sof
 		return (EINVAL);
 	}
 
-	/* If using a multi-page ring, pull in the remaining references. */
-	for (ring_idx = 1; ring_idx < xbb->ring_config.ring_pages; ring_idx++) {
-		char ring_ref_name[]= "ring_refXX";
-
-		snprintf(ring_ref_name, sizeof(ring_ref_name),
-			 "ring-ref%u", ring_idx);
-		error = xs_scanf(XST_NIL, otherend_path,
-				 ring_ref_name, NULL, "%" PRIu32,
-			         &xbb->ring_config.ring_ref[ring_idx]);
+	if (xbb->ring_config.ring_pages	== 1) {
+		error = xs_gather(XST_NIL, otherend_path,
+				  "ring-ref", "%" PRIu32,
+				  &xbb->ring_config.ring_ref[0],
+				  NULL);
 		if (error != 0) {
 			xenbus_dev_fatal(xbb->dev, error,
-					 "Failed to retriev grant reference "
-					 "for page %u of shared ring.  Unable "
-					 "to connect.", ring_idx);
+					 "Unable to retrieve ring information "
+					 "from frontend %s.  Unable to "
+					 "connect.",
+					 xenbus_get_otherend_path(xbb->dev));
 			return (error);
 		}
+	} else {
+		/* Multi-page ring format. */
+		for (ring_idx = 0; ring_idx < xbb->ring_config.ring_pages;
+		     ring_idx++) {
+			char ring_ref_name[]= "ring_refXX";
+
+			snprintf(ring_ref_name, sizeof(ring_ref_name),
+				 "ring-ref%u", ring_idx);
+			error = xs_scanf(XST_NIL, otherend_path,
+					 ring_ref_name, NULL, "%" PRIu32,
+					 &xbb->ring_config.ring_ref[ring_idx]);
+			if (error != 0) {
+				xenbus_dev_fatal(xbb->dev, error,
+						 "Failed to retriev grant "
+						 "reference for page %u of "
+						 "shared ring.  Unable "
+						 "to connect.", ring_idx);
+				return (error);
+			}
+		}
 	}
 
 	error = xs_gather(XST_NIL, otherend_path,
@@ -3197,8 +3264,8 @@ xbb_alloc_requests(struct xbb_softc *xbb
 static int
 xbb_alloc_request_lists(struct xbb_softc *xbb)
 {
-	int i;
 	struct xbb_xen_reqlist *reqlist;
+	int			i;
 
 	/*
 	 * If no requests can be merged, we need 1 request list per
@@ -3318,7 +3385,7 @@ xbb_publish_backend_info(struct xbb_soft
 static void
 xbb_connect(struct xbb_softc *xbb)
 {
-	int		      error;
+	int error;
 
 	if (xenbus_get_state(xbb->dev) == XenbusStateConnected)
 		return;
@@ -3399,7 +3466,8 @@ xbb_connect(struct xbb_softc *xbb)
 static int
 xbb_shutdown(struct xbb_softc *xbb)
 {
-	int error;
+	XenbusState frontState;
+	int	    error;
 
 	DPRINTF("\n");
 
@@ -3413,6 +3481,20 @@ xbb_shutdown(struct xbb_softc *xbb)
 	if ((xbb->flags & XBBF_IN_SHUTDOWN) != 0)
 		return (EAGAIN);
 
+	xbb->flags |= XBBF_IN_SHUTDOWN;
+	mtx_unlock(&xbb->lock);
+
+	if (xenbus_get_state(xbb->dev) < XenbusStateClosing)
+		xenbus_set_state(xbb->dev, XenbusStateClosing);
+
+	frontState = xenbus_get_otherend_state(xbb->dev);
+	mtx_lock(&xbb->lock);
+	xbb->flags &= ~XBBF_IN_SHUTDOWN;
+
+	/* The front can submit I/O until entering the closed state. */
+	if (frontState < XenbusStateClosed)
+		return (EAGAIN);
+
 	DPRINTF("\n");
 
 	/* Indicate shutdown is in progress. */
@@ -3434,19 +3516,6 @@ xbb_shutdown(struct xbb_softc *xbb)
 
 	DPRINTF("\n");
 
-	/*
-	 * Before unlocking mutex, set this flag to prevent other threads from
-	 * getting into this function
-	 */
-	xbb->flags |= XBBF_IN_SHUTDOWN;
-	mtx_unlock(&xbb->lock);
-
-	if (xenbus_get_state(xbb->dev) < XenbusStateClosing)
-		xenbus_set_state(xbb->dev, XenbusStateClosing);
-
-	mtx_lock(&xbb->lock);
-	xbb->flags &= ~XBBF_IN_SHUTDOWN;
-
 	/* Indicate to xbb_detach() that is it safe to proceed. */
 	wakeup(xbb);
 
@@ -3573,6 +3642,16 @@ xbb_setup_sysctl(struct xbb_softc *xbb)
 		        "max_request_segments", CTLFLAG_RD,
 		        &xbb->max_request_segments, 0,
 		        "maximum number of pages per requests (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "max_request_size", CTLFLAG_RD,
+		        &xbb->max_request_size, 0,
+		        "maximum size in bytes of a request (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "ring_pages", CTLFLAG_RD,
+		        &xbb->ring_config.ring_pages, 0,
+		        "communication channel pages (negotiated)");
 }
 
 /**
@@ -3587,6 +3666,7 @@ xbb_attach(device_t dev)
 {
 	struct xbb_softc	*xbb;
 	int			 error;
+	u_int			 max_ring_page_order;
 
 	DPRINTF("Attaching to %s\n", xenbus_get_node(dev));
 
@@ -3621,6 +3701,10 @@ xbb_attach(device_t dev)
 		return (error);
 	}
 
+	/*
+	 * Amazon EC2 client compatility.  They refer to max-ring-pages
+	 * instead of to max-ring-page-order.
+	 */
 	error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev),
 			  "max-ring-pages", "%zu", XBB_MAX_RING_PAGES);
 	if (error) {
@@ -3629,6 +3713,15 @@ xbb_attach(device_t dev)
 		return (error);
 	}
 
+	max_ring_page_order = flsl(XBB_MAX_RING_PAGES) - 1;
+	error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev),
+			  "max-ring-page-order", "%u", max_ring_page_order);
+	if (error) {
+		xbb_attach_failed(xbb, error, "writing %s/max-ring-page-order",
+				  xenbus_get_node(xbb->dev));
+		return (error);
+	}
+
 	error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev),
 			  "max-requests", "%u", XBB_MAX_REQUESTS);
 	if (error) {
Modified: stable/8/sys/dev/xen/blkfront/blkfront.c
==============================================================================
--- stable/8/sys/dev/xen/blkfront/blkfront.c	Thu Mar  1 18:45:25 2012	(r232351)
+++ stable/8/sys/dev/xen/blkfront/blkfront.c	Thu Mar  1 19:09:28 2012	(r232352)
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/bus.h>
 #include <sys/conf.h>
 #include <sys/module.h>
+#include <sys/sysctl.h>
 
 #include <machine/bus.h>
 #include <sys/rman.h>
@@ -139,7 +140,7 @@ static	int	xb_dump(void *, void *, vm_of
  * with blkfront as the emulated drives, easing transition slightly.
  */
 static void
-blkfront_vdevice_to_unit(int vdevice, int *unit, const char **name)
+blkfront_vdevice_to_unit(uint32_t vdevice, int *unit, const char **name)
 {
 	static struct vdev_info {
 		int major;
@@ -186,6 +187,7 @@ blkfront_vdevice_to_unit(int vdevice, in
 	if (vdevice & (1 << 28)) {
 		*unit = (vdevice & ((1 << 28) - 1)) >> 8;
 		*name = "xbd";
+		return;
 	}
 
 	for (i = 0; info[i].major; i++) {
@@ -407,6 +409,40 @@ blkfront_probe(device_t dev)
 	return (ENXIO);
 }
 
+static void
+xb_setup_sysctl(struct xb_softc *xb)
+{
+	struct sysctl_ctx_list *sysctl_ctx = NULL;
+	struct sysctl_oid      *sysctl_tree = NULL;
+	
+	sysctl_ctx = device_get_sysctl_ctx(xb->xb_dev);
+	if (sysctl_ctx == NULL)
+		return;
+
+	sysctl_tree = device_get_sysctl_tree(xb->xb_dev);
+	if (sysctl_tree == NULL)
+		return;
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "max_requests", CTLFLAG_RD, &xb->max_requests, -1,
+		        "maximum outstanding requests (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "max_request_segments", CTLFLAG_RD,
+		        &xb->max_request_segments, 0,
+		        "maximum number of pages per requests (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "max_request_size", CTLFLAG_RD,
+		        &xb->max_request_size, 0,
+		        "maximum size in bytes of a request (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "ring_pages", CTLFLAG_RD,
+		        &xb->ring_pages, 0,
+		        "communication channel pages (negotiated)");
+}
+
 /*
  * Setup supplies the backend dir, virtual device.  We place an event
  * channel and shared frame entries.  We watch backend to wait if it's
@@ -417,14 +453,14 @@ blkfront_attach(device_t dev)
 {
 	struct xb_softc *sc;
 	const char *name;
+	uint32_t vdevice;
 	int error;
-	int vdevice;
 	int i;
 	int unit;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	error = xs_scanf(XST_NIL, xenbus_get_node(dev),
-	    "virtual-device", NULL, "%i", &vdevice);
+	    "virtual-device", NULL, "%" PRIu32, &vdevice);
 	if (error) {
 		xenbus_dev_fatal(dev, error, "reading virtual-device");
 		device_printf(dev, "Couldn't determine virtual device.\n");
@@ -449,6 +485,8 @@ blkfront_attach(device_t dev)
 	sc->vdevice = vdevice;
 	sc->connected = BLKIF_STATE_DISCONNECTED;
 
+	xb_setup_sysctl(sc);
+
 	/* Wait for backend device to publish its protocol capabilities. */
 	xenbus_set_state(dev, XenbusStateInitialising);
 
@@ -501,6 +539,7 @@ blkfront_initialize(struct xb_softc *sc)
 {
 	const char *otherend_path;
 	const char *node_path;
+	uint32_t max_ring_page_order;
 	int error;
 	int i;
 
@@ -513,10 +552,10 @@ blkfront_initialize(struct xb_softc *sc)
 	 * Protocol defaults valid even if negotiation for a
 	 * setting fails.
 	 */
+	max_ring_page_order = 0;
 	sc->ring_pages = 1;
-	sc->max_requests = BLKIF_MAX_RING_REQUESTS(PAGE_SIZE);
 	sc->max_request_segments = BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK;
-	sc->max_request_size = (sc->max_request_segments - 1) * PAGE_SIZE;
+	sc->max_request_size = XBF_SEGS_TO_SIZE(sc->max_request_segments);
 	sc->max_request_blocks = BLKIF_SEGS_TO_BLOCKS(sc->max_request_segments);
 
 	/*
@@ -526,13 +565,25 @@ blkfront_initialize(struct xb_softc *sc)
 	 *       we must use independant calls in order to guarantee
 	 *       we don't miss information in a sparsly populated back-end
 	 *       tree.
+	 *
+	 * \note xs_scanf() does not update variables for unmatched
+	 *	 fields.
 	 */
 	otherend_path = xenbus_get_otherend_path(sc->xb_dev);
 	node_path = xenbus_get_node(sc->xb_dev);
+
+	/* Support both backend schemes for relaying ring page limits. */
+	(void)xs_scanf(XST_NIL, otherend_path,
+		       "max-ring-page-order", NULL, "%" PRIu32,
+		       &max_ring_page_order);
+	sc->ring_pages = 1 << max_ring_page_order;
 	(void)xs_scanf(XST_NIL, otherend_path,
 		       "max-ring-pages", NULL, "%" PRIu32,
 		       &sc->ring_pages);
+	if (sc->ring_pages < 1)
+		sc->ring_pages = 1;
 
+	sc->max_requests = BLKIF_MAX_RING_REQUESTS(sc->ring_pages * PAGE_SIZE);
 	(void)xs_scanf(XST_NIL, otherend_path,
 		       "max-requests", NULL, "%" PRIu32,
 		       &sc->max_requests);
@@ -552,6 +603,16 @@ blkfront_initialize(struct xb_softc *sc)
 		sc->ring_pages = XBF_MAX_RING_PAGES;
 	}
 
+	if (powerof2(sc->ring_pages) == 0) {
+		uint32_t new_page_limit;
+
+		new_page_limit = 0x01 << (fls(sc->ring_pages) - 1);
+		device_printf(sc->xb_dev, "Back-end specified ring-pages of "
+			      "%u is not a power of 2. Limited to %u.\n",
+			      sc->ring_pages, new_page_limit);
+		sc->ring_pages = new_page_limit;
+	}
+
 	if (sc->max_requests > XBF_MAX_REQUESTS) {
 		device_printf(sc->xb_dev, "Back-end specified max_requests of "
 			      "%u limited to front-end limit of %u.\n",
@@ -560,8 +621,8 @@ blkfront_initialize(struct xb_softc *sc)
 	}
 
 	if (sc->max_request_segments > XBF_MAX_SEGMENTS_PER_REQUEST) {
-		device_printf(sc->xb_dev, "Back-end specificed "
-			      "max_requests_segments of %u limited to "
+		device_printf(sc->xb_dev, "Back-end specified "
+			      "max_request_segments of %u limited to "
 			      "front-end limit of %u.\n",
 			      sc->max_request_segments,
 			      XBF_MAX_SEGMENTS_PER_REQUEST);
@@ -569,12 +630,23 @@ blkfront_initialize(struct xb_softc *sc)
 	}
 
 	if (sc->max_request_size > XBF_MAX_REQUEST_SIZE) {
-		device_printf(sc->xb_dev, "Back-end specificed "
+		device_printf(sc->xb_dev, "Back-end specified "
 			      "max_request_size of %u limited to front-end "
 			      "limit of %u.\n", sc->max_request_size,
 			      XBF_MAX_REQUEST_SIZE);
 		sc->max_request_size = XBF_MAX_REQUEST_SIZE;
 	}
+ 
+ 	if (sc->max_request_size > XBF_SEGS_TO_SIZE(sc->max_request_segments)) {
+ 		device_printf(sc->xb_dev, "Back-end specified "
+ 			      "max_request_size of %u limited to front-end "
+ 			      "limit of %u.  (Too few segments.)\n",
+ 			      sc->max_request_size,
+ 			      XBF_SEGS_TO_SIZE(sc->max_request_segments));
+ 		sc->max_request_size =
+ 		    XBF_SEGS_TO_SIZE(sc->max_request_segments);
+ 	}
+
 	sc->max_request_blocks = BLKIF_SEGS_TO_BLOCKS(sc->max_request_segments);
 
 	/* Allocate datastructures based on negotiated values. */
@@ -625,11 +697,20 @@ blkfront_initialize(struct xb_softc *sc)
 	if (setup_blkring(sc) != 0)
 		return;
 
+	/* Support both backend schemes for relaying ring page limits. */
 	error = xs_printf(XST_NIL, node_path,
-			 "ring-pages","%u", sc->ring_pages);
+			 "num-ring-pages","%u", sc->ring_pages);
 	if (error) {
 		xenbus_dev_fatal(sc->xb_dev, error,
-				 "writing %s/ring-pages",
+				 "writing %s/num-ring-pages",
+				 node_path);
+		return;
+	}
+	error = xs_printf(XST_NIL, node_path,
+			 "ring-page-order","%u", fls(sc->ring_pages) - 1);
+	if (error) {
+		xenbus_dev_fatal(sc->xb_dev, error,
+				 "writing %s/ring-page-order",
 				 node_path);
 		return;
 	}
@@ -711,25 +792,31 @@ setup_blkring(struct xb_softc *sc)
 			return (error);
 		}
 	}
-	error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev),
-			  "ring-ref","%u", sc->ring_ref[0]);
-	if (error) {
-		xenbus_dev_fatal(sc->xb_dev, error, "writing %s/ring-ref",
-				 xenbus_get_node(sc->xb_dev));
-		return (error);
-	}
-	for (i = 1; i < sc->ring_pages; i++) {
-		char ring_ref_name[]= "ring_refXX";
-
-		snprintf(ring_ref_name, sizeof(ring_ref_name), "ring-ref%u", i);
+	if (sc->ring_pages == 1) {
 		error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev),
-				 ring_ref_name, "%u", sc->ring_ref[i]);
+				  "ring-ref", "%u", sc->ring_ref[0]);
 		if (error) {
-			xenbus_dev_fatal(sc->xb_dev, error, "writing %s/%s",
-					 xenbus_get_node(sc->xb_dev),
-					 ring_ref_name);
+			xenbus_dev_fatal(sc->xb_dev, error,
+					 "writing %s/ring-ref",
+					 xenbus_get_node(sc->xb_dev));
 			return (error);
 		}
+	} else {
+		for (i = 0; i < sc->ring_pages; i++) {
+			char ring_ref_name[]= "ring_refXX";
+
+			snprintf(ring_ref_name, sizeof(ring_ref_name),
+				 "ring-ref%u", i);
+			error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev),
+					 ring_ref_name, "%u", sc->ring_ref[i]);
+			if (error) {
+				xenbus_dev_fatal(sc->xb_dev, error,
+						 "writing %s/%s",
+						 xenbus_get_node(sc->xb_dev),
+						 ring_ref_name);
+				return (error);
+			}
+		}
 	}
 
 	error = bind_listening_port_to_irqhandler(
@@ -795,7 +882,7 @@ blkfront_connect(struct xb_softc *sc)
 	unsigned int binfo;
 	int err, feature_barrier;
 
-        if( (sc->connected == BLKIF_STATE_CONNECTED) || 
+	if( (sc->connected == BLKIF_STATE_CONNECTED) || 
 	    (sc->connected == BLKIF_STATE_SUSPENDED) )
 		return;
 
@@ -923,15 +1010,13 @@ blkif_close(struct disk *dp)
 		return (ENXIO);
 	sc->xb_flags &= ~XB_OPEN;
 	if (--(sc->users) == 0) {
-		/* Check whether we have been instructed to close.  We will
-		   have ignored this request initially, as the device was
-		   still mounted. */
-		device_t dev = sc->xb_dev;
-		XenbusState state =
-			xenbus_read_driver_state(xenbus_get_otherend_path(dev));
-
-		if (state == XenbusStateClosing)
-			blkfront_closing(dev);
+		/*
+		 * Check whether we have been instructed to close.  We will
+		 * have ignored this request initially, as the device was
+		 * still mounted.
+		 */
+		if (xenbus_get_otherend_state(sc->xb_dev) == XenbusStateClosing)
+			blkfront_closing(sc->xb_dev);
 	}
 	return (0);
 }
@@ -1033,7 +1118,7 @@ blkif_queue_cb(void *arg, bus_dma_segmen
 	struct xb_command *cm;
 	blkif_request_t	*ring_req;
 	struct blkif_request_segment *sg;
-        struct blkif_request_segment *last_block_sg;
+	struct blkif_request_segment *last_block_sg;
 	grant_ref_t *sg_ref;
 	vm_paddr_t buffer_ma;
 	uint64_t fsect, lsect;
@@ -1104,12 +1189,12 @@ blkif_queue_cb(void *arg, bus_dma_segmen
 			nsegs--;
 		}
 		block_segs = MIN(nsegs, BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK);
-                if (block_segs == 0)
-                        break;
+		if (block_segs == 0)
+			break;
 
-                sg = BLKRING_GET_SG_REQUEST(&sc->ring, sc->ring.req_prod_pvt);
+		sg = BLKRING_GET_SEG_BLOCK(&sc->ring, sc->ring.req_prod_pvt);
 		sc->ring.req_prod_pvt++;
-                last_block_sg = sg + block_segs;
+		last_block_sg = sg + block_segs;
 	}
 
 	if (cm->operation == BLKIF_OP_READ)
Modified: stable/8/sys/dev/xen/blkfront/block.h
==============================================================================
--- stable/8/sys/dev/xen/blkfront/block.h	Thu Mar  1 18:45:25 2012	(r232351)
+++ stable/8/sys/dev/xen/blkfront/block.h	Thu Mar  1 19:09:28 2012	(r232352)
@@ -35,6 +35,32 @@
 #include <xen/blkif.h>
 
 /**
+ * Given a number of blkif segments, compute the maximum I/O size supported.
+ *
+ * \note This calculation assumes that all but the first and last segments 
+ *       of the I/O are fully utilized.
+ *
+ * \note We reserve a segement from the maximum supported by the transport to
+ *       guarantee we can handle an unaligned transfer without the need to
+ *       use a bounce buffer.
+ */
+#define	XBF_SEGS_TO_SIZE(segs)						\
+	(((segs) - 1) * PAGE_SIZE)
+
+/**
+ * Compute the maximum number of blkif segments requried to represent
+ * an I/O of the given size.
+ *
+ * \note This calculation assumes that all but the first and last segments
+ *       of the I/O are fully utilized.
+ *
+ * \note We reserve a segement to guarantee we can handle an unaligned
+ *       transfer without the need to use a bounce buffer.
+ */
+#define	XBF_SIZE_TO_SEGS(size)						\
+	((size / PAGE_SIZE) + 1)
+
+/**
  * The maximum number of outstanding requests blocks (request headers plus
  * additional segment blocks) we will allow in a negotiated block-front/back
  * communication channel.
@@ -44,22 +70,18 @@
 /**
  * The maximum mapped region size per request we will allow in a negotiated
  * block-front/back communication channel.
- *
- * \note We reserve a segement from the maximum supported by the transport to
- *       guarantee we can handle an unaligned transfer without the need to
- *       use a bounce buffer..
  */
-#define	XBF_MAX_REQUEST_SIZE		\
-	MIN(MAXPHYS, (BLKIF_MAX_SEGMENTS_PER_REQUEST - 1) * PAGE_SIZE)
+#define	XBF_MAX_REQUEST_SIZE						\
+	MIN(MAXPHYS, XBF_SEGS_TO_SIZE(BLKIF_MAX_SEGMENTS_PER_REQUEST))
 
 /**
  * The maximum number of segments (within a request header and accompanying
  * segment blocks) per request we will allow in a negotiated block-front/back
  * communication channel.
  */
-#define	XBF_MAX_SEGMENTS_PER_REQUEST		\
-	(MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,	\
-	     (XBF_MAX_REQUEST_SIZE / PAGE_SIZE) + 1))
+#define	XBF_MAX_SEGMENTS_PER_REQUEST					\
+	(MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,				\
+	     XBF_SIZE_TO_SEGS(XBF_MAX_REQUEST_SIZE)))
 
 /**
  * The maximum number of shared memory ring pages we will allow in a
Modified: stable/8/sys/xen/interface/io/blkif.h
==============================================================================
--- stable/8/sys/xen/interface/io/blkif.h	Thu Mar  1 18:45:25 2012	(r232351)
+++ stable/8/sys/xen/interface/io/blkif.h	Thu Mar  1 19:09:28 2012	(r232352)
@@ -1,8 +1,8 @@
 /******************************************************************************
  * blkif.h
- * 
+ *
  * Unified block-device I/O interface for Xen guest OSes.
- * 
+ *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to
  * deal in the Software without restriction, including without limitation the
@@ -22,6 +22,7 @@
  * DEALINGS IN THE SOFTWARE.
  *
  * Copyright (c) 2003-2004, Keir Fraser
+ * Copyright (c) 2012, Spectra Logic Corporation
  */
 
 #ifndef __XEN_PUBLIC_IO_BLKIF_H__
@@ -35,7 +36,7 @@
  * notification can be made conditional on req_event (i.e., the generic
  * hold-off mechanism provided by the ring macros). Backends must set
  * req_event appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()).
- * 
+ *
  * Back->front notifications: When enqueuing a new response, sending a
  * notification can be made conditional on rsp_event (i.e., the generic
  * hold-off mechanism provided by the ring macros). Frontends must set
@@ -48,37 +49,417 @@
 #define blkif_sector_t uint64_t
 
 /*
+ * Feature and Parameter Negotiation
+ * =================================
+ * The two halves of a Xen block driver utilize nodes within the XenStore to
+ * communicate capabilities and to negotiate operating parameters.  This
+ * section enumerates these nodes which reside in the respective front and
+ * backend portions of the XenStore, following the XenBus convention.
+ *
+ * All data in the XenStore is stored as strings.  Nodes specifying numeric
+ * values are encoded in decimal.  Integer value ranges listed below are
+ * expressed as fixed sized integer types capable of storing the conversion
+ * of a properly formatted node string, without loss of information.
+ *
+ * Any specified default value is in effect if the corresponding XenBus node
+ * is not present in the XenStore.
+ *
+ * XenStore nodes in sections marked "PRIVATE" are solely for use by the
+ * driver side whose XenBus tree contains them.
+ *
+ * XenStore nodes marked "DEPRECATED" in their notes section should only be
+ * used to provide interoperability with legacy implementations.
+ *
+ * See the XenBus state transition diagram below for details on when XenBus
+ * nodes must be published and when they can be queried.
+ *
+ *****************************************************************************
+ *                            Backend XenBus Nodes
+ *****************************************************************************
+ *
+ *------------------ Backend Device Identification (PRIVATE) ------------------
+ *
+ * mode
+ *      Values:         "r" (read only), "w" (writable)
+ *
+ *      The read or write access permissions to the backing store to be
+ *      granted to the frontend.
+ *
+ * params
+ *      Values:         string
+ *
+ *      Data used by the backend driver to locate and configure the backing
+ *      device.  The format and semantics of this data vary according to the
+ *      backing device in use and are outside the scope of this specification.
+ *
+ * type
+ *      Values:         "file", "phy", "tap"
+ *
+ *      The type of the backing device/object.
+ *
+ *--------------------------------- Features ---------------------------------
+ *
+ * feature-barrier
+ *      Values:         0/1 (boolean)
+ *      Default Value:  0
+ *
+ *      A value of "1" indicates that the backend can process requests
+ *      containing the BLKIF_OP_WRITE_BARRIER request opcode.  Requests
+ *      of this type may still be returned at any time with the
+ *      BLKIF_RSP_EOPNOTSUPP result code.
+ *
+ * feature-flush-cache
+ *      Values:         0/1 (boolean)
+ *      Default Value:  0
+ *
+ *      A value of "1" indicates that the backend can process requests
+ *      containing the BLKIF_OP_FLUSH_DISKCACHE request opcode.  Requests
+ *      of this type may still be returned at any time with the
+ *      BLKIF_RSP_EOPNOTSUPP result code.
+ *
+ * feature-discard
+ *      Values:         0/1 (boolean)
+ *      Default Value:  0
+ *
+ *      A value of "1" indicates that the backend can process requests
+ *      containing the BLKIF_OP_DISCARD request opcode.  Requests
+ *      of this type may still be returned at any time with the
+ *      BLKIF_RSP_EOPNOTSUPP result code.
+ *
+ *----------------------- Request Transport Parameters ------------------------
+ *
+ * max-ring-page-order
+ *      Values:         <uint32_t>
+ *      Default Value:  0
+ *      Notes:          1, 3
+ *
+ *      The maximum supported size of the request ring buffer in units of
+ *      lb(machine pages). (e.g. 0 == 1 page,  1 = 2 pages, 2 == 4 pages,
+ *      etc.).
+ *
+ * max-ring-pages
+ *      Values:         <uint32_t>
+ *      Default Value:  1
+ *      Notes:          DEPRECATED, 2, 3
+ *
+ *      The maximum supported size of the request ring buffer in units of
+ *      machine pages.  The value must be a power of 2.
+ *
+ * max-requests         <uint32_t>
+ *      Default Value:  BLKIF_MAX_RING_REQUESTS(PAGE_SIZE)
+ *      Maximum Value:  BLKIF_MAX_RING_REQUESTS(PAGE_SIZE * max-ring-pages)
+ *
+ *      The maximum number of concurrent, logical requests supported by
+ *      the backend.
+ *
+ *      Note: A logical request may span multiple ring entries.
+ *
+ * max-request-segments
+ *      Values:         <uint8_t>
+ *      Default Value:  BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK
+ *      Maximum Value:  BLKIF_MAX_SEGMENTS_PER_REQUEST
+ *
+ *      The maximum value of blkif_request.nr_segments supported by
+ *      the backend.
+ *
+ * max-request-size
+ *      Values:         <uint32_t>
+ *      Default Value:  BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK * PAGE_SIZE
+ *      Maximum Value:  BLKIF_MAX_SEGMENTS_PER_REQUEST * PAGE_SIZE
+ *
+ *      The maximum amount of data, in bytes, that can be referenced by a
+ *      request type that accesses frontend memory (currently BLKIF_OP_READ,
+ *      BLKIF_OP_WRITE, or BLKIF_OP_WRITE_BARRIER).
+ *
+ *------------------------- Backend Device Properties -------------------------
+ *
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
    
    
More information about the svn-src-all
mailing list