svn commit: r296610 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

Alexander Motin mav at FreeBSD.org
Thu Mar 10 10:04:13 UTC 2016


On 10.03.16 11:57, Steven Hartland wrote:
> When processing the MFC of this I noticed there's lots of merge info @
> sys/cddl/contrib/opensolaris.
> 
> I've removed this from the MFC request but wondered if we should clean
> up HEAD so this doesn't accidentally get merged?

Merge info @ sys/cddl/contrib/opensolaris in head is valid one, tracking
process of MFVs from respective vendor branch to head.  Not that I use
it often, but is it incorrect?

> On 10/03/2016 09:01, Alexander Motin wrote:
>> Author: mav
>> Date: Thu Mar 10 09:01:19 2016
>> New Revision: 296610
>> URL: https://svnweb.freebsd.org/changeset/base/296610
>>
>> Log:
>>    MFV r296609: 6370 ZFS send fails to transmit some holes
>>       Reviewed by: Matthew Ahrens <mahrens at delphix.com>
>>    Reviewed by: Chris Williamson <chris.williamson at delphix.com>
>>    Reviewed by: Stefan Ring <stefanrin at gmail.com>
>>    Reviewed by: Steven Burgess <sburgess at datto.com>
>>    Reviewed by: Arne Jansen <sensille at gmx.net>
>>    Approved by: Robert Mustacchi <rm at joyent.com>
>>    Author: Paul Dagnelie <pcd at delphix.com>
>>       In certain circumstances, "zfs send -i" (incremental send) can
>> produce a
>>    stream which will result in incorrect sparse file contents on the
>>    target.
>>       The problem manifests as regions of the received file that
>> should be
>>    sparse (and read a zero-filled) actually contain data from a file that
>>    was deleted (and which happened to share this file's object ID).
>>       Note: this can happen only with filesystems (not zvols, because
>> they do
>>    not free (and thus can not reuse) object IDs).
>>       Note: This can happen only if, since the incremental source
>> (FromSnap),
>>    a file was deleted and then another file was created, and the new file
>>    is sparse (i.e. has areas that were never written to and should be
>>    implicitly zero-filled).
>>       We suspect that this was introduced by 4370 (applies only if
>> hole_birth
>>    feature is enabled), and made worse by 5243 (applies if hole_birth
>>    feature is disabled, and we never send any holes).
>>       The bug is caused by the hole birth feature. When an object is
>> deleted
>>    and replaced, all the holes in the object have birth time zero.
>> However,
>>    zfs send cannot tell that the holes are new since the file was
>> replaced,
>>    so it doesn't send them in an incremental. As a result, you can end up
>>    with invalid data when you receive incremental send streams. As a
>>    short-term fix, we can always send holes with birth time 0 (unless
>> it's
>>    a zvol or a dataset where we can guarantee that no objects have been
>>    reused).
>>       Closes #37
>>       openzfs/openzfs at adef853162e83f7cdf6b2d9af9756d434a9c743b
>>
>> Modified:
>>    head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c
>>    head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c
>> Directory Properties:
>>    head/sys/cddl/contrib/opensolaris/   (props changed)
>>
>> Modified:
>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c
>> ==============================================================================
>>
>> ---
>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c   
>> Thu Mar 10 08:56:18 2016    (r296609)
>> +++
>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c   
>> Thu Mar 10 09:01:19 2016    (r296610)
>> @@ -20,7 +20,7 @@
>>    */
>>   /*
>>    * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All
>> rights reserved.
>> - * Copyright (c) 2013, 2014 by Delphix. All rights reserved.
>> + * Copyright (c) 2013, 2015 by Delphix. All rights reserved.
>>    * Copyright 2014 HybridCluster. All rights reserved.
>>    */
>>   @@ -50,6 +50,12 @@ dmu_object_alloc(objset_t *os, dmu_objec
>>            * reasonably sparse (at most 1/4 full).  Look from the
>>            * beginning once, but after that keep looking from here.
>>            * If we can't find one, just keep going from here.
>> +         *
>> +         * Note that dmu_traverse depends on the behavior that we use
>> +         * multiple blocks of the dnode object before going back to
>> +         * reuse objects.  Any change to this algorithm should preserve
>> +         * that property or find another solution to the issues
>> +         * described in traverse_visitbp.
>>            */
>>           if (P2PHASE(object, L2_dnode_count) == 0) {
>>               uint64_t offset = restarted ? object << DNODE_SHIFT : 0;
>>
>> Modified:
>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c
>> ==============================================================================
>>
>> ---
>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c   
>> Thu Mar 10 08:56:18 2016    (r296609)
>> +++
>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c   
>> Thu Mar 10 09:01:19 2016    (r296610)
>> @@ -20,7 +20,7 @@
>>    */
>>   /*
>>    * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All
>> rights reserved.
>> - * Copyright (c) 2012, 2014 by Delphix. All rights reserved.
>> + * Copyright (c) 2012, 2016 by Delphix. All rights reserved.
>>    * Copyright (c) 2015 Chunwei Chen. All rights reserved.
>>    */
>>   @@ -63,6 +63,7 @@ typedef struct traverse_data {
>>       uint64_t td_hole_birth_enabled_txg;
>>       blkptr_cb_t *td_func;
>>       void *td_arg;
>> +    boolean_t td_realloc_possible;
>>   } traverse_data_t;
>>     static int traverse_dnode(traverse_data_t *td, const dnode_phys_t
>> *dnp,
>> @@ -232,18 +233,30 @@ traverse_visitbp(traverse_data_t *td, co
>>         if (bp->blk_birth == 0) {
>>           /*
>> -         * Since this block has a birth time of 0 it must be a
>> -         * hole created before the SPA_FEATURE_HOLE_BIRTH
>> -         * feature was enabled.  If SPA_FEATURE_HOLE_BIRTH
>> -         * was enabled before the min_txg for this traveral we
>> -         * know the hole must have been created before the
>> -         * min_txg for this traveral, so we can skip it. If
>> -         * SPA_FEATURE_HOLE_BIRTH was enabled after the min_txg
>> -         * for this traveral we cannot tell if the hole was
>> -         * created before or after the min_txg for this
>> -         * traversal, so we cannot skip it.
>> +         * Since this block has a birth time of 0 it must be one of
>> +         * two things: a hole created before the
>> +         * SPA_FEATURE_HOLE_BIRTH feature was enabled, or a hole
>> +         * which has always been a hole in an object.
>> +         *
>> +         * If a file is written sparsely, then the unwritten parts of
>> +         * the file were "always holes" -- that is, they have been
>> +         * holes since this object was allocated.  However, we (and
>> +         * our callers) can not necessarily tell when an object was
>> +         * allocated.  Therefore, if it's possible that this object
>> +         * was freed and then its object number reused, we need to
>> +         * visit all the holes with birth==0.
>> +         *
>> +         * If it isn't possible that the object number was reused,
>> +         * then if SPA_FEATURE_HOLE_BIRTH was enabled before we wrote
>> +         * all the blocks we will visit as part of this traversal,
>> +         * then this hole must have always existed, so we can skip
>> +         * it.  We visit blocks born after (exclusive) td_min_txg.
>> +         *
>> +         * Note that the meta-dnode cannot be reallocated.
>>            */
>> -        if (td->td_hole_birth_enabled_txg < td->td_min_txg)
>> +        if ((!td->td_realloc_possible ||
>> +            zb->zb_object == DMU_META_DNODE_OBJECT) &&
>> +            td->td_hole_birth_enabled_txg <= td->td_min_txg)
>>               return (0);
>>       } else if (bp->blk_birth <= td->td_min_txg) {
>>           return (0);
>> @@ -338,6 +351,15 @@ traverse_visitbp(traverse_data_t *td, co
>>           objset_phys_t *osp = buf->b_data;
>>           prefetch_dnode_metadata(td, &osp->os_meta_dnode, zb->zb_objset,
>>               DMU_META_DNODE_OBJECT);
>> +        /*
>> +         * See the block comment above for the goal of this variable.
>> +         * If the maxblkid of the meta-dnode is 0, then we know that
>> +         * we've never had more than DNODES_PER_BLOCK objects in the
>> +         * dataset, which means we can't have reused any object ids.
>> +         */
>> +        if (osp->os_meta_dnode.dn_maxblkid == 0)
>> +            td->td_realloc_possible = B_FALSE;
>> +
>>           if (arc_buf_size(buf) >= sizeof (objset_phys_t)) {
>>               prefetch_dnode_metadata(td, &osp->os_groupused_dnode,
>>                   zb->zb_objset, DMU_GROUPUSED_OBJECT);
>> @@ -544,12 +566,13 @@ traverse_impl(spa_t *spa, dsl_dataset_t
>>       td.td_pfd = &pd;
>>       td.td_flags = flags;
>>       td.td_paused = B_FALSE;
>> +    td.td_realloc_possible = (txg_start == 0 ? B_FALSE : B_TRUE);
>>         if (spa_feature_is_active(spa, SPA_FEATURE_HOLE_BIRTH)) {
>>           VERIFY(spa_feature_enabled_txg(spa,
>>               SPA_FEATURE_HOLE_BIRTH, &td.td_hole_birth_enabled_txg));
>>       } else {
>> -        td.td_hole_birth_enabled_txg = 0;
>> +        td.td_hole_birth_enabled_txg = UINT64_MAX;
>>       }
>>         pd.pd_flags = flags;
>>
> 


-- 
Alexander Motin


More information about the svn-src-head mailing list