ZSTD Project Weekly Status Update

Allan Jude allanjude at freebsd.org
Thu Aug 6 02:49:57 UTC 2020


This is the seventh weekly status report on the project to integrate
ZSTD into OpenZFS.

The compatibility-related changes I created last week were refined and
merged into the mainline branch.

Thanks to Brian Behlendorf for reviewing my proposed change for the zstd
feature flag activation, and pointing out a better approach. I have
reworked the patch based on his suggestion and prototype:

https://github.com/allanjude/zfs/commit/2508dafcec0a05d61afc5fbd5da356e201afbe97
- Activate the per-dataset ZSTD feature flag as soon as the property is
set to ZSTD. Previously, simply running `zfs set compression=zstd dataset`
would not activate the feature flag; it was only activated when the
first block using ZSTD compression was written (see
dsl_dataset_block_born()). This meant that if you set the property and
then exported the pool, the pool could still be imported on systems with
older versions of ZFS that did not support ZSTD, but their userspace
tools would crash because the property value was out of bounds.


https://github.com/allanjude/zfs/commit/b8bec3fd2a8feb3a4de572eb15515d3764f92a35
- I created a test that ensures that the feature flag is activated by
`zfs set compression=zstd` and also ensures that the feature flag
reverts to the 'enabled' state once the last dataset using zstd is
destroyed.
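
The two changes above amount to reference counting on the feature: take a
reference when a dataset's compression property is set to zstd, and drop
it when that dataset is destroyed. The following is a minimal,
self-contained C model of that behavior, not the OpenZFS code. Every name
in it (feat_state_t, feature_hold(), set_compression_zstd(), and so on)
is hypothetical, and it ignores how the real implementation persists and
synchronizes this state.

```c
#include <stdbool.h>

/* Hypothetical model only: these names are illustrative, not the OpenZFS API. */
typedef enum { FEAT_DISABLED, FEAT_ENABLED, FEAT_ACTIVE } feat_state_t;

typedef struct {
	feat_state_t state;  /* pool-wide feature state */
	unsigned refcount;   /* number of datasets holding the feature */
} pool_feature_t;

typedef struct {
	bool uses_zstd;      /* does this dataset hold a feature reference? */
} dataset_t;

/* The first reference moves the feature from 'enabled' to 'active'. */
static void feature_hold(pool_feature_t *f)
{
	if (f->refcount++ == 0)
		f->state = FEAT_ACTIVE;
}

/* The last release moves the feature back to 'enabled'. */
static void feature_release(pool_feature_t *f)
{
	if (f->refcount > 0 && --f->refcount == 0)
		f->state = FEAT_ENABLED;
}

/* Activate when the property is set, not when the first block is born. */
static bool set_compression_zstd(pool_feature_t *f, dataset_t *ds)
{
	if (f->state == FEAT_DISABLED)
		return false;        /* feature not enabled on this pool */
	if (!ds->uses_zstd) {        /* setting it twice takes one reference */
		ds->uses_zstd = true;
		feature_hold(f);
	}
	return true;
}

/* Destroying a dataset that held the feature drops its reference. */
static void destroy_dataset(pool_feature_t *f, dataset_t *ds)
{
	if (ds->uses_zstd) {
		ds->uses_zstd = false;
		feature_release(f);
	}
}
```

Activating inside set_compression_zstd() rather than on the first write
is what closes the window described above, where a pool with the property
set but no ZSTD blocks written could be imported by older software.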


The next step is ensuring that ZSTD compression interoperates properly
with L2ARC, encryption, and similar features.

I've also been discussing ideas with Brian about future-proofing: how to
handle the case where a newer version of ZSTD might compress the same
input differently (with a better ratio), and how that would impact L2ARC,
nop-write, etc. One idea (originally from Pawel Dawidek) is to do
something similar to what encryption does and split the checksum field,
using half to checksum the original data and half to checksum the
compressed version. This would allow ZFS to detect when the same content
was compressed differently (combined with the ZSTD version header in the
compressed data), giving better compatibility as we upgrade ZSTD.
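
A toy C sketch of the split-checksum idea, assuming a 256-bit checksum
made of four 64-bit words (the size of a ZFS block-pointer checksum),
with words 0-1 covering the uncompressed data and words 2-3 covering the
compressed bytes. The 64-bit FNV-1a hash is only a stand-in that keeps
the example self-contained; it is not a checksum ZFS uses, and the
layout and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Four 64-bit words, the same size as a ZFS block-pointer checksum. */
typedef struct { uint64_t word[4]; } cksum256_t;

/* Toy seeded 64-bit FNV-1a; a stand-in, NOT a checksum ZFS uses. */
static uint64_t fnv1a64(const void *buf, size_t len, uint64_t seed)
{
	const unsigned char *p = buf;
	uint64_t h = 14695981039346656037ULL ^ seed;
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* 128 bits from two differently seeded passes over the same buffer. */
static void half_cksum(const void *buf, size_t len, uint64_t out[2])
{
	out[0] = fnv1a64(buf, len, 0);
	out[1] = fnv1a64(buf, len, 0x9e3779b97f4a7c15ULL);
}

/* Hypothetical layout: words 0-1 = logical data, words 2-3 = physical. */
static cksum256_t split_cksum(const void *logical, size_t lsize,
    const void *physical, size_t psize)
{
	cksum256_t c;
	half_cksum(logical, lsize, &c.word[0]);
	half_cksum(physical, psize, &c.word[2]);
	return c;
}

/* Same content but different compressed bytes, e.g. after a ZSTD upgrade. */
static bool same_content_recompressed(const cksum256_t *a, const cksum256_t *b)
{
	bool logical_eq = a->word[0] == b->word[0] && a->word[1] == b->word[1];
	bool physical_eq = a->word[2] == b->word[2] && a->word[3] == b->word[3];
	return logical_eq && !physical_eq;
}
```

If two blocks have equal logical halves but different physical halves,
they hold the same content compressed differently, which is exactly the
case a ZSTD upgrade could produce for nop-write or L2ARC.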


This project is sponsored by the FreeBSD Foundation.



On 2020-07-29 21:10, Allan Jude wrote:
> This is the sixth weekly status report on the project to integrate ZSTD
> into OpenZFS.
> 
> https://github.com/openzfs/zfs/pull/10631 - Improved the `zfs recv`
> error handling when it receives an out-of-bounds property value.
> Specifically, if a zfs send stream is created that uses a newer
> compression or checksum type, the property will fail to be set on the
> receiving system. This is fine, but `zfs recv` would abort() and create
> a core file, rather than reporting the error, because it did not
> understand the EINVAL being returned for that property. In the case
> where the property is outside the accepted range, we now return the new
> ZFS_ERR_BADPROP value, and the correct message is displayed to the user.
> I opted not to use ERANGE because that is used for 'this property value
> should not be used on a root pool'. The idea is to get this fix merged
> into the 0.8.x branch for the next point release, to improve
> compatibility with streams generated by OpenZFS 2.0.
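
The recv change boils down to range-checking an enum-valued property and
turning an out-of-range value into a specific, reportable error instead
of an abort(). A minimal C sketch of that pattern, assuming a receiver
that understands num_known values for the property; PROP_ERR_BADPROP and
recv_check_enum_prop() are hypothetical stand-ins, not the actual
ZFS_ERR_BADPROP code path.

```c
#include <stdio.h>

/* Hypothetical result codes; the real ZFS_ERR_BADPROP has its own value. */
enum { PROP_OK = 0, PROP_ERR_BADPROP = -1 };

/*
 * Validate a received enum-valued property against the number of values
 * this receiver understands. On failure, fill msg with a human-readable
 * explanation and return a distinct error rather than aborting.
 */
static int recv_check_enum_prop(const char *name, unsigned value,
    unsigned num_known, char *msg, size_t msglen)
{
	if (value < num_known) {
		if (msglen > 0)
			msg[0] = '\0';
		return PROP_OK;
	}
	snprintf(msg, msglen,
	    "property '%s' value %u is not supported by this system",
	    name, value);
	return PROP_ERR_BADPROP;
}
```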
> 
> 
> https://github.com/openzfs/zfs/pull/10632 - General improvement to error
> handling when the error code is EZFS_UNKNOWN.
> 
> 
> https://github.com/allanjude/zfs/commit/8f37c1ad8edaff20a550b3df07995dab80c06492
> - ZFS replication compatibility improvements. As discussed on the
> leadership call earlier this month, keep the compatibility simple. If
> the -c flag is given, send blocks compressed with any compression
> algorithm. The improved error handling will let the user know if their
> system can't handle ZSTD.
> 
> 
> https://github.com/allanjude/zfs/commit/0ffd80e281f79652973378599cd0332172f365bd
> - per-dataset feature activation. This switches the ZSTD feature flag
> from 'enabled' to 'active' as soon as the property is set, instead of
> when the first block is written. This ensures that the pool can't be
> imported on a system that does not support ZSTD, where it would cause
> the ZFS CLI tools to crash.
> 
> 
> I will be working on adding some tests for the feature activation.
> 
> I've been looking at ways to add tests for the replication changes, but
> it doesn't seem to be easy to test the results of a 'zfs recv' that does
> not know about ZSTD (where the values are outside of the valid range for
> the enum). If anyone has any ideas here, I'd be very interested.
> 
> 
> On 2020-07-20 23:40, Allan Jude wrote:
>> This is the fifth weekly status report on the project to integrate ZSTD
>> into OpenZFS.
>>
>> https://github.com/c0d3z3r0/zfs/pull/14/commits/9807c99169e5931a754bb0df68267ffa2f289474
>> - Created a new test case to ensure that ZSTD compressed blocks survive
>> replication with the -c flag. We wanted to make sure the on-disk
>> compression header survived the trip.
>>
>> https://github.com/c0d3z3r0/zfs/pull/14/commits/94bef464fc304e9d6f5850391e41720c3955af11
>> - I split the zstd.c file into OS specific bits
>> (module/os/{linux,freebsd}/zstd_os.c) and also split the .h file into
>> zstd.h and zstd_impl.h. This was done so that FreeBSD can use the
>> existing kmem_cache mechanism, while Linux can use the vmem_alloc pool
>> created in the earlier versions of this patch. I significantly changed
>> the FreeBSD implementation from my earlier work, to reuse the power of 2
>> zio_data_buf_cache[]'s that already exist, only adding a few additional
>> kmem_caches for large blocks with high compression levels. This should
>> avoid creating so many unnecessary kmem caches.
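
A rough C sketch of the size-to-cache mapping described above, assuming
hypothetical power-of-two caches from 512 bytes up to 16 MiB: a request is
rounded up to the next power of two and served from the matching cache,
and only requests beyond the largest cache fall through to a dedicated
large-object cache. The bounds, EXTRA_CACHE, and pick_cache() are
illustrative assumptions, not the FreeBSD zstd_os.c code.

```c
#include <stddef.h>

/* Hypothetical bounds: power-of-two caches from 512 B (2^9) to 16 MiB (2^24). */
#define MIN_SHIFT   9
#define MAX_SHIFT   24
#define EXTRA_CACHE (-1)  /* use a dedicated cache for very large workspaces */

/* Smallest s such that (1 << s) >= size, capped to avoid shifting too far. */
static unsigned ceil_log2(size_t size)
{
	unsigned s = 0;
	while (s < sizeof(size_t) * 8 - 1 && ((size_t)1 << s) < size)
		s++;
	return s;
}

/*
 * Map an allocation size to an index into the power-of-two cache table
 * (index 0 == 512 B), or EXTRA_CACHE when no existing cache is big enough.
 */
static int pick_cache(size_t size)
{
	unsigned s = ceil_log2(size);
	if (s < MIN_SHIFT)
		s = MIN_SHIFT;
	if (s > MAX_SHIFT)
		return EXTRA_CACHE;
	return (int)(s - MIN_SHIFT);
}
```

Rounding up to a power of two can waste up to about half of an
allocation; that is the trade-off for reusing a small, fixed set of
caches instead of creating one per distinct workspace size.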
>>
>> https://github.com/c0d3z3r0/zfs/pull/14/commits/3d48243b77e6c8c3bf562c7a2315dd6cc571f28c
>> - Lastly, in my testing I was seeing a lot of hits on the new
>> compression failure kstat I added. This was caused by the ZFS "early
>> abort" feature, where we give the compressor an output buffer that is
>> smaller than the input, so it will fail if the block will not compress
>> enough to be worth it. This helps avoid wasting CPU on incompressible
>> blocks. However, it seems the 'one-file' version of zstd we are using
>> does not expose the ZSTD_ErrorCode enum. This needs to be investigated
>> further to avoid issues if the value changes (although it is apparently
>> stable after version 1.3.1).
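
A self-contained C sketch of the early-abort pattern and of why the error
code matters: the destination buffer is deliberately smaller than the
input, so "output did not fit" is an expected outcome and should not be
counted as a compression failure. A toy run-length encoder stands in for
ZSTD, and the 12.5% reduction threshold is an assumption for
illustration. With the real library, the distinction would come from the
ZSTD_ErrorCode value (for example ZSTD_error_dstSize_tooSmall), which is
the enum the single-file build does not expose.

```c
#include <stddef.h>

/* Hypothetical result codes for the toy compressor. */
enum { C_ERR_DST_TOO_SMALL = -1, C_ERR_OTHER = -2 /* reserved */ };

/*
 * Toy run-length encoder emitting (count, byte) pairs. Returns the
 * compressed length, or C_ERR_DST_TOO_SMALL if it would exceed dst_len.
 */
static long toy_rle(const unsigned char *src, size_t src_len,
    unsigned char *dst, size_t dst_len)
{
	size_t out = 0;
	for (size_t i = 0; i < src_len; ) {
		size_t run = 1;
		while (i + run < src_len && run < 255 && src[i + run] == src[i])
			run++;
		if (out + 2 > dst_len)
			return C_ERR_DST_TOO_SMALL;
		dst[out++] = (unsigned char)run;
		dst[out++] = src[i];
		i += run;
	}
	return (long)out;
}

struct stats { unsigned long early_abort, failed; };

/*
 * Offer a buffer 12.5% smaller than the input (assumed threshold).
 * "Did not fit" bumps early_abort; any other error bumps failed.
 * Returns the compressed size, or 0 meaning "store uncompressed".
 * dst must hold at least len - len / 8 bytes.
 */
static size_t compress_block(const unsigned char *src, size_t len,
    unsigned char *dst, struct stats *st)
{
	size_t d_len = len - (len >> 3);
	long r = toy_rle(src, len, dst, d_len);
	if (r == C_ERR_DST_TOO_SMALL) {
		st->early_abort++;
		return 0;
	}
	if (r < 0) {
		st->failed++;
		return 0;
	}
	return (size_t)r;
}
```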
>>
>> I am still working on a solution for zfs send stream compatibility. I am
>> leaning towards creating a new flag, --zstd, to enable ZSTD compressed
>> output. If the -e or -c flag is used without the --zstd flag, and the
>> dataset has the zstd feature active, the idea would be to emit a warning
>> but send the blocks uncompressed, so that the stream remains compatible
>> with older versions of ZFS. I will be discussing this on the OpenZFS
>> Leadership call tomorrow, and am open to suggestions on how to best
>> handle this.
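
The proposed --zstd behavior can be expressed as a small decision made
for each block. This C sketch models only that proposal (the 2020-07-29
update describes the simpler approach that was chosen instead: with -c,
blocks are sent with whatever compression they have). struct send_opts,
choose_block_mode(), treating -c and -e alike, and deciding per block
are all hypothetical simplifications.

```c
#include <stdbool.h>

/* Hypothetical options for the proposed (not adopted) --zstd flag. */
struct send_opts {
	bool keep_compressed;  /* -c or -e given (treated alike: a simplification) */
	bool zstd;             /* proposed --zstd: allow ZSTD blocks in the stream */
};

enum send_mode { SEND_AS_STORED, SEND_DECOMPRESSED };

/*
 * Decide how one block is sent. Sets *warn when a ZSTD block had to be
 * decompressed so that the stream stays readable by older ZFS versions.
 */
static enum send_mode choose_block_mode(const struct send_opts *o,
    bool block_is_zstd, bool *warn)
{
	*warn = false;
	if (!o->keep_compressed)
		return SEND_DECOMPRESSED;  /* plain send: data goes out decompressed */
	if (block_is_zstd && !o->zstd) {
		*warn = true;
		return SEND_DECOMPRESSED;
	}
	return SEND_AS_STORED;
}
```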
>>
>>
>> On 2020-07-14 22:26, Allan Jude wrote:
>>> In my continuing effort to complete the integration of ZSTD into
>>> OpenZFS, here is my fourth weekly status report:
>>>
>>> https://github.com/allanjude/zfs/commit/b0b1270d4e7835ecff413208301375e3de2a4153
>>> - Create a new test case to make sure that the ZSTD header we write
>>> along with the data is correct. Verify that the physical size of the
>>> compressed data is less than the psize for the block pointer, and verify
>>> that the level matches. It uses a random level between 1 and 19 and then
>>> verifies with zdb that the block was compressed with that level.
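
The checks this test performs can be sketched in C as follows, assuming
a hypothetical header consisting of a 32-bit compressed length followed
by a 32-bit word that packs a 24-bit ZSTD version and an 8-bit level.
The real test reads the values back with zdb; this layout, the field
names, and the helper functions are illustrative only and are not the
on-disk format.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header layout for illustration; NOT the on-disk format. */
typedef struct {
	uint32_t c_len;          /* length of the compressed payload */
	uint32_t version_level;  /* 24-bit version << 8 | 8-bit level */
} zstd_hdr_sketch_t;

static uint32_t pack_version_level(uint32_t version, uint8_t level)
{
	return ((version & 0xFFFFFFu) << 8) | level;
}

static uint8_t hdr_level(const zstd_hdr_sketch_t *h)
{
	return (uint8_t)(h->version_level & 0xFF);
}

/* Header plus payload must fit in psize, and the level must match. */
static bool hdr_ok(const zstd_hdr_sketch_t *h, uint64_t psize, uint8_t level)
{
	return sizeof(*h) + (uint64_t)h->c_len <= psize && hdr_level(h) == level;
}

/* Pick a level in [1, 19], as the test does (rand() modulo is fine here). */
static uint8_t random_level(void)
{
	return (uint8_t)(1 + rand() % 19);
}
```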
>>>
>>> I am still working on a solution for setting the zstd feature flag to
>>> 'active' as soon as the property is set, rather than only once a block
>>> is born, as well as fixing up compatibility around zfs send/recv with
>>> the embedded block pointers flag.
>>>
>>> This project is sponsored by the FreeBSD Foundation.
>>>
>>>
>>
> 
> 


-- 
Allan Jude



More information about the freebsd-fs mailing list