svn commit: r290198 - head/sys/dev/nvme
Jim Harris
jimharris at FreeBSD.org
Fri Oct 30 16:06:35 UTC 2015
Author: jimharris
Date: Fri Oct 30 16:06:34 2015
New Revision: 290198
URL: https://svnweb.freebsd.org/changeset/base/290198
Log:
nvme: fix race condition in split bio completion path
Fixes race condition observed under following circumstances:
1) I/O split on 128KB boundary with Intel NVMe controller.
Current Intel controllers produce better latency when
I/Os do not span a 128KB boundary - even if the I/O size
itself is less than 128KB.
2) Per-CPU I/O queues are enabled.
3) Child I/Os are submitted on different submission queues.
4) Interrupts for child I/O completions occur almost
simultaneously.
5) ithread for child I/O A increments bio_inbed, then
immediately is preempted (rendezvous IPI, higher priority
interrupt).
6) ithread for child I/O B increments bio_inbed, then completes
parent bio since all children are now completed.
7) parent bio is freed, and immediately reallocated for a VFS
or gpart bio (including setting bio_children to 1 and
clearing bio_driver1).
8) ithread for child I/O A resumes processing. bio_children
for what it thinks is the parent bio is set to 1, so it
thinks it needs to complete the parent bio.
Result is either calling a NULL callback function, or double freeing
the bio to its uma zone.
PR: 203746
Reported by: Drew Gallatin <gallatin at netflix.com>,
Marc Goroff <mgoroff at quorum.net>
Tested by: Drew Gallatin <gallatin at netflix.com>
MFC after: 3 days
Sponsored by: Intel
Modified:
head/sys/dev/nvme/nvme_ns.c
Modified: head/sys/dev/nvme/nvme_ns.c
==============================================================================
--- head/sys/dev/nvme/nvme_ns.c Fri Oct 30 15:52:10 2015 (r290197)
+++ head/sys/dev/nvme/nvme_ns.c Fri Oct 30 16:06:34 2015 (r290198)
@@ -239,7 +239,7 @@ static void
nvme_bio_child_inbed(struct bio *parent, int bio_error)
{
struct nvme_completion parent_cpl;
- int inbed;
+ int children, inbed;
if (bio_error != 0) {
parent->bio_flags |= BIO_ERROR;
@@ -248,10 +248,13 @@ nvme_bio_child_inbed(struct bio *parent,
/*
* atomic_fetchadd will return value before adding 1, so we still
- * must add 1 to get the updated inbed number.
+ * must add 1 to get the updated inbed number. Save bio_children
+ * before incrementing to guard against race conditions when
+ * two children bios complete on different queues.
*/
+ children = atomic_load_acq_int(&parent->bio_children);
inbed = atomic_fetchadd_int(&parent->bio_inbed, 1) + 1;
- if (inbed == parent->bio_children) {
+ if (inbed == children) {
bzero(&parent_cpl, sizeof(parent_cpl));
if (parent->bio_flags & BIO_ERROR)
parent_cpl.status.sc = NVME_SC_DATA_TRANSFER_ERROR;
More information about the svn-src-head
mailing list