September OpenZFS Leadership Meeting

Mon Sep 23 16:22:25 UTC 2019

At this month's meeting we discussed:
- ZoL EOL of RHEL 6
- Xattr cross-platform compatibility
- Relaxed quota semantics for improved performance
- zpool replace of log vdev
- temporal dedup

Video is now up on YouTube: https://youtu.be/kjBWhEE8tZ8

Full notes below (thanks Serapheim):

   -

   EOL ZoL on RHEL 6 (Brian Behlendorf)
   -

      RHEL 6 may be old enough that we can drop support for it on master
      (it would still be supported on 0.8).
      -

      Technically, RHEL 6 will be EOL'd by Red Hat in November 2020.
      -

      Feedback from the community: given enough advance notice, people
      should be fine.
      -

      Actual change needed in ZoL:
      -

         Go through the build system code and remove any references to the
         v3.10 kernel and older (the new oldest supported kernel would be
         3.11). The process should be similar to what ZoL did when
         deprecating RHEL 5.
         -

      Action Items:
      -

         Brian/Matt will give a heads-up on the mailing list and in the
         release notes of each version until then, and open a PR for this.
         -

         We need volunteers for the build system changes
         -

   Xattr cross-platform compatibility (Andrew Walker)
   -

      Problem:
      -

         iXsystems works with services that receive alternate data streams
         written as xattrs on FreeBSD in the user namespace, which is
         implemented slightly differently on Linux (Linux uses a "user."
         prefix, FreeBSD a "freebsd." prefix(?), and Solaris an "smb."
         prefix). Their application (Samba) does the same thing on Linux and
         FreeBSD, but ZFS represents the xattrs differently on disk on each
         platform. As a result, xattrs written on FreeBSD are visible on
         other OSes but not on ZoL, where the metadata disappears.
         -

      Potential Solutions:
      -

         Brian: ZoL has around 4 prefixes, so one solution would be to have
         "user" as a fallback choice (i.e. if a name is not part of any
         namespace, it is treated as part of the user namespace).
         -

         Andrew Walker: Have a zfs dataset property to be able to tell
         which format is used
         -

         Andriy Gapon: Add some OS info on the actual attribute and have
         ZFS interpret them differently
         -

            Sef: Some form of feature flag that would fix the prefixes.
            -

         Matt Ahrens: First make it possible to read xattrs from all
         platforms, even if the names show up differently. A potential
         long-term solution: new xattrs are written in some new format that
         is portable across platforms (e.g. in the zfs.* namespace), and
         each platform translates the ZFS prefixes to the local platform’s
         prefixes.
         -

      Question: Is this an incompatibility between different OSes, or an
      incompatibility between different implementations of ZFS? Should we
      have a translation layer outside of ZFS?
      -

         A bit of both, but mostly the VFS layer (outside of the ZFS code).
         Even assuming it is only the VFS layer, it would be reasonable to
         still have some way of accessing these attributes. One argument for
         this is that in ZoL there is little flexibility to change the VFS
         code.
         -

      Action Items:
      -

         Proposal & next steps: Andrew can start a writeup and coordinate
         with Alexander from iXsystems.
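
Brian's fallback suggestion above can be sketched roughly as follows. This is
a minimal illustration, not ZoL code: it assumes the "around 4 prefixes" are
the standard Linux xattr namespaces, and the function name and the sample
attribute names are made up for the example.

```python
# Hypothetical sketch, not ZoL code. Assumption: the known prefixes are
# the four standard Linux xattr namespaces.
LINUX_PREFIXES = ("user.", "trusted.", "security.", "system.")


def linux_visible_name(on_disk_name: str) -> str:
    """Map an on-disk xattr name to the name Linux would expose."""
    if on_disk_name.startswith(LINUX_PREFIXES):
        # Already in a namespace Linux understands: pass it through.
        return on_disk_name
    # Fallback: a name outside every known namespace is treated as "user".
    return "user." + on_disk_name


# Sample names are illustrative only.
for name in ("user.DOSATTRIB", "DOSATTRIB", "security.selinux"):
    print(name, "->", linux_visible_name(name))
```

With a fallback like this, an xattr written without any prefix on another
platform would still be readable on Linux instead of disappearing, at the
cost of its name showing up differently.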
         -

   Relax quota semantics for improved performance (Allan Jude)
   -

      Problem: As you approach quotas, ZFS performance degrades.
      -

      Proposal: Can we have a property like quota-policy=strict or loose,
      where "loose" optionally allows ZFS to run over the quota so that
      performance does not degrade?
      -

      People's Feedback/Questions:
      -

         Richard Elling: Isn't it the same problem when the pool is almost
         full (SLOP space)? Answer: This is slightly different, but the
         mechanism is the same, and we don't want to break that (e.g. run
         beyond SLOP space just like that).
         -

      Tangent: Should we scale the SLOP space appropriately? The SLOP space
      can take a big chunk of space in big pools.
      -

         Feedback: That seems reasonable, though the use cases may not be
         that many (fragmentation issues in such big pools will probably arise
         before encountering the SLOP space issue).  See discussion
         <https://github.com/zfsonlinux/zfs/pull/8106#issuecomment-437499997>
         on previous PR.
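
To put rough numbers on the SLOP space tangent above, here is a small
arithmetic sketch. It assumes the reservation is simply the pool size shifted
right by the default spa_slop_shift of 5 (that is, 1/32 of the pool) and
ignores any minimum or cap; the helper name is made up for the example.

```python
# Assumption: slop space is pool_size >> spa_slop_shift with the default
# shift of 5 (1/32 of the pool); minimums and caps are ignored here.
SPA_SLOP_SHIFT = 5
GIB = 1 << 30
TIB = 1 << 40


def slop_bytes(pool_bytes: int, shift: int = SPA_SLOP_SHIFT) -> int:
    """Return the space set aside as slop for a pool of the given size."""
    return pool_bytes >> shift


for pool_tib in (1, 100, 1024):
    slop_gib = slop_bytes(pool_tib * TIB) / GIB
    print(f"{pool_tib:>5} TiB pool -> {slop_gib:,.0f} GiB reserved")
```

Under that assumption, a 1 PiB pool would set aside 32 TiB, which is the kind
of "big chunk" the tangent refers to.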
         -

   zpool replace of a log (and maybe a cache) vdev – does this work well?
   Can it be improved? (Andriy Gapon)
   -

      Problem: a user had to replace a log device using the replace command
      and it took a long time (dozens of gigabytes were scanned). Can we do
      better? It seems like there is no special logic for devices like
      that; do we want to do something different for log vdevs? Maybe even
      prohibit using replace for these devices and advise the remove & add
      workflow.
      -

      Feedback: the above sounds reasonable except for one thing: log
      devices can have actual data on them. If you crash and you have blocks
      on the log device, and you've removed the device, and you don't mount
      the specific filesystems, these blocks will stay there. Encryption
      should also make this more common. We need to retain the ability to
      do the scrub-based replace/attach. We could improve the performance
      by looking at all the blocks of all the logs instead of looking at
      all the blocks in the pool.
      -

      Action Item: Andriy will look into this and create a doc
      -

   Renaming bookmarks – are there any pitfalls? Seems like a useful feature
   that’s not been implemented in a long time (Andriy Gapon)
   -

      Feedback: It should just work. It is one more thing to plumb through
      the CLI, libzfs, etc. Internally, removing the ZAP entry and re-adding
      it with the new name should do the trick.
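
The "remove the ZAP entry and re-add it with the new name" idea above can be
modelled with a toy sketch. A plain dict stands in for the bookmark ZAP
object here; rename_bookmark, the bookmark names, and the stored value are
hypothetical and not part of libzfs or the ZAP API.

```python
# Toy model only: a dict stands in for the dataset's bookmark ZAP object.
# rename_bookmark and the stored payload are hypothetical, not libzfs API.
def rename_bookmark(zap: dict, old: str, new: str) -> None:
    if old not in zap:
        raise KeyError(f"no such bookmark: {old}")
    if new in zap:
        raise ValueError(f"bookmark already exists: {new}")
    # Remove the old entry and re-add the same payload under the new name.
    zap[new] = zap.pop(old)


bookmarks = {"before-upgrade": {"creation_txg": 1234}}
rename_bookmark(bookmarks, "before-upgrade", "pre-upgrade")
print(bookmarks)
```

A real implementation would presumably do both steps in the same transaction
so that the bookmark never disappears in between.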
      -

   Panzura to open source their temporal dedup implementation (Josh P)
   -

      Panzura will be open-sourcing some parts of their self-contained ZFS
      implementation of temporal dedup on GitHub. Panzura hopes that this
      will be integrated into OpenZFS, but at least for now there are no
      concrete plans for getting this code upstreamed without volunteers.
      -

      Question: What is temporal dedup?
      -

         A dedup scheme that groups blocks by the time at which they are
         created, modified, etc. Grouping blocks in this way should allow
         for faster access to the data due to caching based on temporal
         locality.

On Tue, Sep 17, 2019 at 8:47 AM Matthew Ahrens <mahrens at delphix.com> wrote:

> The next OpenZFS Leadership meeting will be held today, September 17,
> 1pm-2pm Pacific time.
>
> Everyone is welcome to attend and participate, and we will try to keep the
> meeting on agenda and on time.  The meetings will be held online via Zoom,
> and recorded and posted to the website and YouTube after the meeting.
>
> The agenda for the meeting will be a discussion of the projects listed in
> the agenda doc.
>
> For more information and details on how to attend, as well as notes and
> video from the previous meeting, please see the agenda document:
>
>
> https://docs.google.com/document/d/1w2jv2XVYFmBVvG1EGf-9A5HBVsjAYoLIFZAnWHhV-BM/edit
>
> --matt
>