svn commit: r290198 - head/sys/dev/nvme

Jim Harris jimharris at FreeBSD.org
Fri Oct 30 16:06:35 UTC 2015


Author: jimharris
Date: Fri Oct 30 16:06:34 2015
New Revision: 290198
URL: https://svnweb.freebsd.org/changeset/base/290198

Log:
  nvme: fix race condition in split bio completion path
  
  Fixes race condition observed under following circumstances:
  
  1) I/O split on 128KB boundary with Intel NVMe controller.
     Current Intel controllers produce better latency when
     I/Os do not span a 128KB boundary - even if the I/O size
     itself is less than 128KB.
  2) Per-CPU I/O queues are enabled.
  3) Child I/Os are submitted on different submission queues.
  4) Interrupts for child I/O completions occur almost
     simultaneously.
  5) ithread for child I/O A increments bio_inbed, then
     immediately is preempted (rendezvous IPI, higher priority
     interrupt).
  6) ithread for child I/O B increments bio_inbed, then completes
     parent bio since all children are now completed.
  7) parent bio is freed, and immediately reallocated for a VFS
     or gpart bio (including setting bio_children to 1 and
     clearing bio_driver1).
  8) ithread for child I/O A resumes processing.  bio_children
     for what it thinks is the parent bio is set to 1, so it
     thinks it needs to complete the parent bio.
  
  Result is either calling a NULL callback function, or double freeing
  the bio to its uma zone.
  
  PR:		203746
  Reported by:	Drew Gallatin <gallatin at netflix.com>,
  		Marc Goroff <mgoroff at quorum.net>
  Tested by:	Drew Gallatin <gallatin at netflix.com>
  MFC after:	3 days
  Sponsored by:	Intel

Modified:
  head/sys/dev/nvme/nvme_ns.c

Modified: head/sys/dev/nvme/nvme_ns.c
==============================================================================
--- head/sys/dev/nvme/nvme_ns.c	Fri Oct 30 15:52:10 2015	(r290197)
+++ head/sys/dev/nvme/nvme_ns.c	Fri Oct 30 16:06:34 2015	(r290198)
@@ -239,7 +239,7 @@ static void
 nvme_bio_child_inbed(struct bio *parent, int bio_error)
 {
 	struct nvme_completion	parent_cpl;
-	int			inbed;
+	int			children, inbed;
 
 	if (bio_error != 0) {
 		parent->bio_flags |= BIO_ERROR;
@@ -248,10 +248,13 @@ nvme_bio_child_inbed(struct bio *parent,
 
 	/*
 	 * atomic_fetchadd will return value before adding 1, so we still
-	 *  must add 1 to get the updated inbed number.
+	 *  must add 1 to get the updated inbed number.  Save bio_children
+	 *  before incrementing to guard against race conditions when
+	 *  two children bios complete on different queues.
 	 */
+	children = atomic_load_acq_int(&parent->bio_children);
 	inbed = atomic_fetchadd_int(&parent->bio_inbed, 1) + 1;
-	if (inbed == parent->bio_children) {
+	if (inbed == children) {
 		bzero(&parent_cpl, sizeof(parent_cpl));
 		if (parent->bio_flags & BIO_ERROR)
 			parent_cpl.status.sc = NVME_SC_DATA_TRANSFER_ERROR;


More information about the svn-src-head mailing list