[Bug 230260] [FUSEFS] [PERFORMANCE]: Performance issue (I/O block size)

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Fri Sep 6 17:56:48 UTC 2019


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230260

--- Comment #17 from commit-hook at freebsd.org ---
A commit references this bug:

Author: asomers
Date: Fri Sep  6 17:56:27 UTC 2019
New revision: 351943
URL: https://svnweb.freebsd.org/changeset/base/351943

Log:
  MFC r344183-r344187, r344333-r344334, r344407, r344857, r344865 (by cem)

  r344183:
  FUSE: Respect userspace FS "do-not-cache" of file attributes

  The FUSE protocol demands that kernel implementations cache user filesystem
  file attributes (vattr data) for a maximum period of time in the range of
  [0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or 10
  seconds; or "a long time" to represent indefinite caching.

  Historically, FreeBSD FUSE has ignored this client directive entirely.  This
  works fine for local-only filesystems, but causes consistency issues with
  multi-writer network filesystems.

  For now, respect 0 second cache TTLs and do not cache such metadata.
  Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
  are still cached indefinitely, because it is unclear how a userspace
  filesystem could do anything sensible with those semantics even if
  implemented.

  In the future, as an optimization, we should implement notify_inval_entry,
  etc, which provide userspace filesystems a way of evicting the kernel cache.

  One potentially bogus access to invalid cached attribute data was left in
  fuse_io_strategy.  It is restricted behind the undocumented and non-default
  "vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are
  deadcode and can be eliminated?

  Some minor APIs changed to facilitate this:

  1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data").

  2. cache_attrs() respects the provided TTL and only caches in the FUSE
  inode if TTL > 0.  It also grows an "out" argument, which, if non-NULL,
  stores the translated fuse_attr (even if not suitable for caching).

  3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help
  avoid programming mistakes.

  4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE
  link op was weakened (only performed when we have a valid attr cache).  The
  check is racy in a multi-writer network filesystem anyway -- classic TOCTOU.
  We have to trust any userspace filesystem that rejects local caching to
  account for it correctly.

  PR:           230258 (inspired by; does not fix)

  r344184:
  FUSE: Respect userspace FS "do-not-cache" of path components

  The FUSE protocol demands that kernel implementations cache user filesystem
  path components (lookup/cnp data) for a maximum period of time in the range
  of [0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or
  10 seconds; or "a long time" to represent indefinite caching.

  Historically, FreeBSD FUSE has ignored this client directive entirely.  This
  works fine for local-only filesystems, but causes consistency issues with
  multi-writer network filesystems.

  For now, respect 0 second cache TTLs and do not cache such metadata.
  Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
  are still cached indefinitely, because it is unclear how a userspace
  filesystem could do anything sensible with those semantics even if
  implemented.

  Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup
  if the user filesystem did not set a zero second TTL.

  PR:           230258 (inspired by; does not fix)

  r344185:
  FUSE: Only "dirty" cached file size when data is dirty

  Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update
  the buf cache bounds as a result of either a read from the underlying FUSE
  filesystem, or as part of a write-through type operation (like truncate =>
  VOP_SETATTR).  In these cases, do not set the FN_SIZECHANGE flag, which
  indicates that an inode's data is dirty (in particular, that the local buf
  cache and fvdat->filesize have dirty extended data).

  PR:           230258 (related)

  r344186:
  FUSE: The FUSE design expects writethrough caching

  At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol
  specifies only clean data to be cached.

  Prior to this change, we implement and default to writeback caching.  This
  is ok enough for local only filesystems without hardlinks, but violates the
  general design contract with FUSE and breaks distributed filesystems or
  concurrent access to hardlinks of the same inode.

  In this change, add cache mode as an extension of cache enable/disable.  The
  new modes are UC (was: cache disabled), WT (default), and WB (was: cache
  enabled).

  For now, WT caching is implemented as write-around, which meets the goal of
  only caching clean data.  WT can be better than WA for workloads that
  frequently read data that was recently written, but WA is trivial to
  implement.  Note that this has no effect on O_WRONLY-opened files, which
  were already coerced to write-around.

  Refs:
    * https://sourceforge.net/p/fuse/mailman/message/8902254/
    * https://github.com/vgough/encfs/issues/315

  PR:           230258 (inspired by)

  r344187:
  FUSE: Refresh cached file size when it changes (lookup)

  The cached fvdat->filesize is indepedent of the (mostly unused)
  cached_attrs, and we failed to update it when a cached (but perhaps
  inactive) vnode was found during VOP_LOOKUP to have a different size than
  cached.

  As noted in the code comment, this can occur in distributed filesystems or
  with other kinds of irregular file behavior (anything is possible in FUSE).

  We do something similar in fuse_vnop_getattr already.

  PR:           230258 (as reported in description; other issues explored in
                        comments are not all resolved)
  Reported by:  MooseFS FreeBSD Team <freebsd AT moosefs.com>
  Submitted by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)

  r344333:
  fuse: add descriptions for remaining sysctls

  (Except reclaim revoked; I don't know what that goal of that one is.)

  r344334:
  Fuse: whitespace and style(9) cleanup

  Take a pass through fixing some of the most egregious whitespace issues in
  fs/fuse.  Also fix some style(9) warts while here.  Not 100% cleaned up, but
  somewhat less painful to look at and edit.

  No functional change.

  r344407:
  fuse: Fix a regression introduced in r337165

  On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to
  use a buf cache block size in excess of permitted size.  This did not affect
  most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB.
  The issue was discovered and reported using a custom kernel with a DFLTPHYS
  of 512kB.

  PR:           230260 (comment #9)
  Reported by:  ken@

  r344857:
  FUSE: Prevent trivial panic

  When open(2) was invoked against a FUSE filesystem with an unexpected flags
  value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic.

  For now, prevent the panic by rejecting such VOP_OPENs with EINVAL.

  This is not considered the correct long term fix, but does prevent an
  unprivileged denial-of-service.

  PR:           236329
  Reported by:  asomers
  Reviewed by:  asomers
  Sponsored by: Dell EMC Isilon

  r344865:
  fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf

  On GENERIC kernels with empty loader.conf, there is no functional change.
  DFLTPHYS and MAXBSIZE are both 64kB at the moment.  This change allows
  larger bufcache block sizes to be used when either MAXBSIZE (custom kernel)
  or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher
  than the default.

  Suggested by: ken@

Changes:
_U  stable/12/
  stable/12/sys/fs/fuse/fuse.h
  stable/12/sys/fs/fuse/fuse_device.c
  stable/12/sys/fs/fuse/fuse_file.c
  stable/12/sys/fs/fuse/fuse_file.h
  stable/12/sys/fs/fuse/fuse_internal.c
  stable/12/sys/fs/fuse/fuse_internal.h
  stable/12/sys/fs/fuse/fuse_io.c
  stable/12/sys/fs/fuse/fuse_ipc.c
  stable/12/sys/fs/fuse/fuse_ipc.h
  stable/12/sys/fs/fuse/fuse_node.c
  stable/12/sys/fs/fuse/fuse_node.h
  stable/12/sys/fs/fuse/fuse_vfsops.c
  stable/12/sys/fs/fuse/fuse_vnops.c

-- 
You are receiving this mail because:
You are the assignee for the bug.


More information about the freebsd-fs mailing list