Unmapped I/O

Konstantin Belousov kostikbel at gmail.com
Wed Dec 19 13:55:03 UTC 2012

One of the known FreeBSD I/O path performance bottlenecks is the
necessity to map the pages of each I/O buffer into KVA.  The problem is
that on multi-core machines, the mapping must flush the TLB on all cores,
due to the global mapping of the buffer pages into the kernel.  This
means that buffer creation and destruction disrupt execution of all
other cores to perform TLB shootdown through IPI, and the thread
initiating the shootdown must wait for all other cores to execute and

The patch at
implements 'unmapped buffers'.  This means the ability to create a
VMIO struct buf that does not point to a KVA mapping of the buffer
pages into kernel addresses.  Since there is no mapping, the kernel does
not need to flush the TLB.  Unmapped buffers are marked with the new
B_NOTMAPPED flag, and must be requested explicitly by passing the
GB_NOTMAPPED flag to the buffer allocation routines.  If a mapped
buffer is requested but an unmapped buffer already exists, the buffer
subsystem automatically maps the pages.
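
To make the flag semantics above concrete, here is a minimal user-space
sketch.  It is not code from the patch: struct toy_buf, the toy_*
functions and the flag values are all hypothetical stand-ins, and the
"mapping" is just a pointer into a static array.  It only models the
rule that an unmapped buffer stays unmapped until somebody asks for a
mapped one.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_B_NOTMAPPED  0x01  /* stand-in for B_NOTMAPPED; value is made up */
#define TOY_GB_NOTMAPPED 0x01  /* stand-in for GB_NOTMAPPED; value is made up */

/* Hypothetical, heavily simplified stand-in for struct buf. */
struct toy_buf {
    int   flags;   /* TOY_B_NOTMAPPED while no mapping exists */
    char *data;    /* models the kernel address; NULL while unmapped */
};

static char toy_kva[4096];  /* pretend kernel virtual address range */
static int  toy_map_ops;    /* counts how often a mapping was established */

/* Establish the (simulated) mapping for a buffer. */
static void toy_buf_map(struct toy_buf *bp)
{
    assert(bp != NULL);
    bp->data = toy_kva;
    bp->flags &= ~TOY_B_NOTMAPPED;
    toy_map_ops++;
}

/* Create a buffer; it is mapped unless the caller asks for an unmapped one. */
static void toy_buf_create(struct toy_buf *bp, int gbflags)
{
    assert(bp != NULL);
    bp->flags = TOY_B_NOTMAPPED;
    bp->data = NULL;
    if ((gbflags & TOY_GB_NOTMAPPED) == 0)
        toy_buf_map(bp);
}

/* Look up an existing buffer; map it on demand if a mapped one is wanted. */
static void toy_buf_get(struct toy_buf *bp, int gbflags)
{
    assert(bp != NULL);
    if ((gbflags & TOY_GB_NOTMAPPED) == 0 &&
        (bp->flags & TOY_B_NOTMAPPED) != 0)
        toy_buf_map(bp);
}
```

In this model a caller that passes TOY_GB_NOTMAPPED never triggers
toy_buf_map(), so toy_map_ops stays at zero for pure data buffers; the
first caller that omits the flag pays for the mapping once.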

The clustering code is also made aware of the not-mapped buffers, but
this required the KPI change that accounts for the diff in the non-UFS

UFS is adapted to request unmapped buffers when the kernel does not
need to access the content, i.e. mostly for file data.  The new helper
function vn_io_fault_pgmove() operates on an unmapped array of pages.
It calls the new pmap method pmap_copy_pages() to move the data to and
from usermode.
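
The kind of copy that a page-array routine has to perform can be
sketched in user space as follows.  This is an illustration only: the
post does not give the signature of pmap_copy_pages(), so the name
toy_copy_pages(), its parameter order and the tiny TOY_PAGE_SIZE are
assumptions, and plain char arrays stand in for physical pages.  The
point is that data addressed as (page array, byte offset) must be
copied in chunks that never cross a page boundary on either side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8  /* tiny page size so the behaviour is easy to follow */

/*
 * Copy len bytes from offset src_off in the page array src[] to offset
 * dst_off in the page array dst[].  Each chunk is limited so that it
 * stays within a single source page and a single destination page.
 */
static void toy_copy_pages(char *src[], size_t src_off,
    char *dst[], size_t dst_off, size_t len)
{
    while (len > 0) {
        size_t s_pg = src_off % TOY_PAGE_SIZE;  /* offset inside source page */
        size_t d_pg = dst_off % TOY_PAGE_SIZE;  /* offset inside dest page */
        size_t cnt = len;

        if (cnt > TOY_PAGE_SIZE - s_pg)
            cnt = TOY_PAGE_SIZE - s_pg;
        if (cnt > TOY_PAGE_SIZE - d_pg)
            cnt = TOY_PAGE_SIZE - d_pg;

        memcpy(dst[dst_off / TOY_PAGE_SIZE] + d_pg,
            src[src_off / TOY_PAGE_SIZE] + s_pg, cnt);

        src_off += cnt;
        dst_off += cnt;
        len -= cnt;
    }
}
```

A real implementation would additionally need some way to access each
physical page while copying; the sketch skips that entirely and only
shows the offset arithmetic.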

Besides unmapped buffers, unmapped BIOs are introduced, marked with
the flag BIO_NOTMAPPED.  Unmapped buffers are directly translated
to unmapped BIOs.  Geom providers may indicate that they accept
unmapped BIOs.  If a provider does not handle unmapped i/o requests,
geom now automatically establishes transient mapping for the i/o

Swap- and malloc-backed md(4) is changed to accept unmapped BIOs.  The
gpart providers indicate unmapped BIO support if the underlying
provider can do unmapped i/o.  I also hacked ahci(4) to handle
unmapped i/o, but this should be changed after Jeff's physbio patch
is committed, to use the proper busdma interface.
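
The provider-side behaviour described above (a provider advertising
that it accepts unmapped BIOs, a partition inheriting that from the
provider underneath, and a transient mapping being set up when the
provider cannot cope) can be modelled roughly like this.  Everything
here is hypothetical: struct toy_provider, struct toy_bio, the toy_*
functions and the flag value are illustration names, not the
interfaces added by the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BIO_NOTMAPPED 0x01  /* stand-in for BIO_NOTMAPPED; value is made up */

/* Hypothetical stand-in for a geom provider. */
struct toy_provider {
    int accepts_unmapped;  /* nonzero if the provider handles unmapped BIOs */
};

/* Hypothetical stand-in for struct bio. */
struct toy_bio {
    int   flags;      /* TOY_BIO_NOTMAPPED while no mapping exists */
    char *data;       /* NULL while unmapped */
    int   transient;  /* set when a transient mapping was created */
};

static char toy_transient_kva[4096];  /* pretend transient mapping area */

/* A partition advertises support only if the provider below it does. */
static void toy_part_attach(struct toy_provider *part,
    const struct toy_provider *under)
{
    assert(part != NULL && under != NULL);
    part->accepts_unmapped = under->accepts_unmapped;
}

/*
 * Hand a BIO to a provider.  If the BIO is unmapped and the provider
 * cannot handle that, establish a transient mapping first.
 * Returns 1 if a transient mapping was needed, 0 otherwise.
 */
static int toy_io_request(const struct toy_provider *pp, struct toy_bio *bp)
{
    assert(pp != NULL && bp != NULL);
    if ((bp->flags & TOY_BIO_NOTMAPPED) != 0 && !pp->accepts_unmapped) {
        bp->data = toy_transient_kva;
        bp->flags &= ~TOY_BIO_NOTMAPPED;
        bp->transient = 1;
        return (1);
    }
    return (0);
}
```

With this shape, drivers that were never converted keep seeing only
mapped requests, which matches the fallback behaviour the post expects
for everything other than md(4) and ahci(4).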

Besides, the swap pager does unmapped swapping if the swap partition
indicates that it can do unmapped i/o.  At Jeff's request, the buffer
allocation code may reserve the KVA for an unmapped buffer in advance.
The unmapped page-in for the vnode pager is also implemented if the
filesystem supports it, but the page-out is not.  The page-out, as well
as the vnode-backed md(4), currently requires mappings, mostly due to
the use of VOP_WRITE().

As such, the patch worked in my test environment, where I used
ahci-attached SATA disks with gpt partitions, md(4) and UFS.  I see no
statistically significant difference in the buildworld -j 10 times on
a 4-core machine with HT.  On the other hand, when computing sha1 over
a 5GB file, the system time was reduced by 30%.

Unfinished items:
- Integration with physbio; this will be done after physbio is
  committed to HEAD.
- The key per-architecture function needed for unmapped i/o is
  pmap_copy_pages().  I have implemented it for amd64 and i386 so far;
  it must be done for all other architectures.
- The sizing of the submap used for transient mapping of the BIOs is
  naive.  It should be adjusted, esp. for KVA-lean architectures.
- Conversion of the other filesystems.  Low priority.

I am interested in reviews, tests and suggestions.  Note that this
currently only works for md(4) and ahci(4); for other drivers the
patched kernel should fall back to mapped i/o.

 sys/amd64/amd64/pmap.c         |  24 +++
 sys/cam/ata/ata_da.c           |   5 +-
 sys/cam/cam_ccb.h              |  30 ++++
 sys/dev/ahci/ahci.c            |  53 +++++-
 sys/dev/md/md.c                | 255 ++++++++++++++++++++++++-----
 sys/fs/cd9660/cd9660_vnops.c   |   2 +-
 sys/fs/ext2fs/ext2_balloc.c    |   2 +-
 sys/fs/ext2fs/ext2_vnops.c     |   9 +-
 sys/fs/msdosfs/msdosfs_vnops.c |   4 +-
 sys/fs/udf/udf_vnops.c         |   5 +-
 sys/geom/geom.h                |   1 +
 sys/geom/geom_disk.c           |   2 +
 sys/geom/geom_disk.h           |   1 +
 sys/geom/geom_io.c             |  44 ++++-
 sys/geom/geom_vfs.c            |  10 +-
 sys/geom/part/g_part.c         |   1 +
 sys/i386/i386/pmap.c           |  42 +++++
 sys/kern/vfs_bio.c             | 356 +++++++++++++++++++++++++++++++++--------
 sys/kern/vfs_cluster.c         | 118 +++++++-------
 sys/kern/vfs_vnops.c           |  39 +++++
 sys/sys/bio.h                  |   7 +
 sys/sys/buf.h                  |  22 ++-
 sys/sys/mount.h                |   1 +
 sys/sys/vnode.h                |   2 +
 sys/ufs/ffs/ffs_alloc.c        |  10 +-
 sys/ufs/ffs/ffs_balloc.c       |  58 ++++---
 sys/ufs/ffs/ffs_vfsops.c       |   3 +-
 sys/ufs/ffs/ffs_vnops.c        |  35 ++--
 sys/ufs/ufs/ufs_extern.h       |   1 +
 sys/vm/pmap.h                  |   2 +
 sys/vm/swap_pager.c            |  43 +++--
 sys/vm/swap_pager.h            |   1 +
 sys/vm/vm.h                    |   2 +
 sys/vm/vm_init.c               |   6 +-
 sys/vm/vm_kern.c               |   9 +-
 sys/vm/vnode_pager.c           |  30 +++-
 36 files changed, 989 insertions(+), 246 deletions(-)
