mfi(4) IO performance regression, post 8.1

John Baldwin jhb at freebsd.org
Fri Jun 22 14:24:24 UTC 2012


On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
> 
> On 6/15/12 8:04 AM, John Baldwin wrote:
> > On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
> >> Hello FreeBSD folk,
> >>
> >> We're seeing what appears to be a storage performance regression as we
> >> try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
> >> appears that the regression happened between 8.1 and 8.2.
> >>
> >> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
> >> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
> >> configuration, using UFS + geom_journal for filesystem.
> >>
> >> Postgresql performance, as seen via pgbench, dropped by approx 20%.
> >> This testing was done with our usual PAE-enabled kernels.  We then went
> >> back to GENERIC kernels and did comparisons using "bonnie", results
> >> below.  Following that is a kernel boot log.
> >>
> >> Notably, we're seeing this regression only on our mfi(4)-based RAID
> >> systems.  Also, from looking at the FreeBSD source changelogs it appears
> >> that the mfi(4) code has seen some changes since 8.1.
> > Between 8.1 and 8.2 mfi has not had any significant changes.  The only changes
> > made to sys/dev/mfi were to add two new constants:
> >
> > svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi \
> >     svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
> > Index: mfireg.h
> > ===================================================================
> > --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
> > +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
> > @@ -975,7 +975,9 @@
> >          MFI_PD_STATE_OFFLINE = 0x10,
> >          MFI_PD_STATE_FAILED = 0x11,
> >          MFI_PD_STATE_REBUILD = 0x14,
> > -       MFI_PD_STATE_ONLINE = 0x18
> > +       MFI_PD_STATE_ONLINE = 0x18,
> > +       MFI_PD_STATE_COPYBACK = 0x20,
> > +       MFI_PD_STATE_SYSTEM = 0x40
> >   };
> >   
> >   union mfi_ld_ref {
> >
> > The difference in write performance must be due to something else.  You
> > mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH, so I
> > wonder if this is related:
> >
> > ------------------------------------------------------------------------
> > r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
> >
> > MFC 212160:
> >
> > Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
> > Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
> >
> > The barrier semantics of bioq_insert_tail() were broken in two ways:
> >
> >   o In bioq_disksort(), an added bio could be inserted at the head of
> >     the queue, even when a barrier was present, if the sort key for
> >     the new entry was less than that of the last queued barrier bio.
> >
> >   o The last_offset used to generate the sort key for newly queued bios
> >     did not stay at the position of the barrier until either the
> >     barrier was de-queued, or a new barrier (which updates last_offset)
> >     was queued.  When a barrier is in effect, we know that the disk
> >     will pass through the barrier position just before the
> >     "blocked bios" are released, so using the barrier's offset for
> >     last_offset is the optimal choice.
> >
> > sys/geom/sched/subr_disk.c:
> > sys/kern/subr_disk.c:
> >          o Update last_offset in bioq_insert_tail().
> >
> >          o Only update last_offset in bioq_remove() if the removed bio is
> >            at the head of the queue (typically due to a call via
> >            bioq_takefirst()) and no barrier is active.
> >
> >          o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
> >            set prev to the barrier and cur to its next element.  Now that
> >            last_offset is kept at the barrier position, this change isn't
> >            strictly necessary, but since we have to take a decision branch
> >            anyway, it does avoid one, no-op, loop iteration in the while
> >            loop that immediately follows.
> >
> >          o In bioq_disksort(), bypass the normal sort for bios with the
> >            BIO_ORDERED attribute and instead insert them into the queue
> >            with bioq_insert_tail().  bioq_insert_tail() not only gives
> >            the desired command order during insertion, but also provides
> >            barrier semantics so that commands disksorted in the future
> >            cannot pass the just enqueued transaction.
> >
> > sys/sys/bio.h:
> >          Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
> >
> > sys/cam/ata/ata_da.c:
> > sys/cam/scsi/scsi_da.c
> >          Use an ordered command for SCSI/ATA-NCQ commands issued in
> >          response to bios with the BIO_ORDERED flag set.
> >
> > sys/cam/scsi/scsi_da.c
> >          Use an ordered tag when issuing a synchronize cache command.
> >
> >          Wrap some lines to 80 columns.
> >
> > sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> > sys/geom/geom_io.c
> >          Mark bios with the BIO_FLUSH command as BIO_ORDERED.
> >
> > Sponsored by:   Spectra Logic Corporation
> > ------------------------------------------------------------------------
> >
> > Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
> > added to geom_io.c in 8.2?  That would be effectively reverting this
> > portion of the diff:
> >
> > Index: geom_io.c
> > ===================================================================
> > --- geom_io.c   (.../8.1/sys/geom)      (revision 237134)
> > +++ geom_io.c   (.../8.2/sys/geom)      (revision 237134)
> > @@ -265,6 +265,7 @@
> >          g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
> >          bp = g_alloc_bio();
> >          bp->bio_cmd = BIO_FLUSH;
> > +       bp->bio_flags |= BIO_ORDERED;
> >          bp->bio_done = NULL;
> >          bp->bio_attribute = NULL;
> >          bp->bio_offset = cp->provider->mediasize;
> >
> 
> John... thanks for the suggestion.  I've built and tested a kernel with 
> this change made.  Result:  no change (same performance as with 
> 8.2-GENERIC). Any thoughts as to where to go next?

Hmm.  That seemed the most plausible candidate when I looked at this.

Do you use quotas (there is one change in UFS related to quotas)?

There are 5 changes that involve sys/kern/vfs_bio.c in 8.2:
209459, 212229, 212562, 212583, and 213890.

Can you possibly test out kernels from stable/8 at those revisions on an 8.1
world and see if you can narrow it down further?

Barring that, can you do a binary search of kernels from stable/8 between 8.1
and 8.2 on an 8.1 world to see which commit caused the change in write
performance?

-- 
John Baldwin

