Reading a corrupted file on ZFS

Karl Denninger karl at denninger.net
Fri Feb 12 16:37:20 UTC 2021


On 2/12/2021 11:22, Artem Kuchin wrote:
> 12.02.2021 18:52, Fabian Keil wrote:
>> Artem Kuchin <artem at artem.ru> wrote on 2021-02-12:
>>
>>> 12.02.2021 18:06, Karl Denninger wrote:
>>>> Blocking the read forces you to get the good copy off backup media and
>>>> thus prevents that from happening.
>>>>
>>> I know what ZFS does, and I damaged the same file in the same place on
>>> purpose. The question is how to read what's left of it. Just for kicks:
>>> I don't have a backup, and I need to read what's left. It could be a 1GB
>>> file with only one byte damaged, and it is of crazy importance to me.
>>> So, how to bypass all the checks and make it read the file no matter what?
>> The patch from this PR adds a sysctl that allows sending corrupted data:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221909
>>
>> Using the added sysctl you can send and receive the dataset and then
>> read the corrupted file from the received dataset. Note that ZFS
>> replaces corrupted blocks completely with the 0x'zfs badd bloc' pattern
>> instead of returning the corrupted data as is, thus increasing the
>> amount of corruption in case of simple bit flips to whole blocks.
>>
>> Fabian
>
> Arghh. That's not what I want. This is strange. With a stupid old
> FS like FAT, or even the newer UFS, I can dig into a damaged file and
> collect as much data as possible, while the newer ZFS does not provide
> tools to dig into data. That was always my concern about ZFS. If
> something goes bad with FAT/NTFS and even UFS, there are tons of tools
> which can dissect the file system into bits so I can get as much as
> possible of what's left. In the case of ZFS there are no tools that I
> know of, and even ZFS itself does not allow getting what's left of the
> normal data.
>
> This is frustrating. Why... why...

You created a synthetic situation that almost never exists in the real
world (ONE byte modified in all copies in the same allocation block, with
all other data in that block intact and recoverable).

In almost all actual cases of "bit rot" it's exactly that: random, and
statistically extraordinarily unlikely to hit all copies at once in the
same allocation block.  Therefore, ZFS can and does fix it; UFS or FAT
silently returns the corrupted data, propagates it, and eventually
screws you down the road.
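
As a rough illustration of the "by statistics" argument above, here is a
small sketch. It assumes that corruption hits each copy of a block
independently and with the same probability, which is a simplification.
The function name and the per-block probability are hypothetical and
chosen only for illustration; they are not measurements.

```python
from fractions import Fraction


def prob_all_copies_hit(p_block, copies):
    """Probability that the same block is bad in every one of `copies`
    replicas, ASSUMING each replica fails independently with
    probability p_block."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return p_block ** copies


# Hypothetical per-block corruption probability (illustration only).
p = Fraction(1, 10**6)

for n in (1, 2, 3):
    # With more copies, the chance that all of them are bad in the
    # same block shrinks geometrically under the independence assumption.
    print(n, "copies:", prob_all_copies_hit(p, n))
```

Under that assumption each extra copy multiplies the chance of an
unrepairable block by p again, which is why damage to every copy in the
same place is so rare outside of a deliberate test.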

The nearly-every-case situation in the real world where a disk goes 
physically bad (I've had this happen *dozens* of times over my IT 
career) results in the drive being unable to return the block at all; 
you don't get all but the bad byte back, you get nothing for that block 
and any attempt to "touch" it results in either a hard error coming back 
with no data in the buffer or (if not a TLER device) a wildly-extended 
timeout before an I/O error is returned with, again, no usable data in 
the buffer.  On "old" Winchester-style spinning media and even floppy
drives this resulted in an entire physical sector (usually 512 bytes)
being irretrievably lost.  In the case of a "modern" zoned or
advanced-format hard drive or an SSD the amount of data impacted and
unreadable is typically much larger than one sector; for an SSD it is
frequently *at least* a 4k block (which can and frequently does span
multiple files!) and for many instances of rotating rust it can be an
entire *track* if the servo data is where the fault lies, which can be a
*huge* amount of data.
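
To make the sizes above concrete, the following sketch computes which
byte range becomes unreadable when a single bad byte takes its whole
containing unit with it. The 512-byte and 4k sizes come from the text
above; the 128 KiB entry assumes a ZFS dataset left at the default
recordsize, and the function name and the offset are hypothetical.

```python
def lost_range(bad_offset, unit_size):
    """Return (start, end) byte offsets, end exclusive, of the
    fixed-size unit that contains bad_offset."""
    if bad_offset < 0 or unit_size <= 0:
        raise ValueError("offset must be >= 0 and unit_size > 0")
    start = (bad_offset // unit_size) * unit_size
    return start, start + unit_size


# Unit sizes: the first two are from the discussion above; the third
# assumes the default ZFS recordsize of 128 KiB.
UNITS = {
    "512-byte sector": 512,
    "4k block": 4096,
    "128 KiB ZFS record": 128 * 1024,
}

bad_byte = 5000  # hypothetical offset of the one damaged byte

for name, size in UNITS.items():
    start, end = lost_range(bad_byte, size)
    print(f"{name}: bytes {start}..{end - 1} lost ({end - start} bytes)")
```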

The patch gives you all but one allocation block of data from ZFS, with
that one block effectively zeroed.  This is no worse than the usual
actual (not your synthesized test) impact of such a failure in the
real world with other filesystems in virtually every instance where it
happens "in the wild."
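
Once the file has been read back from the received dataset, it may help
to know which blocks were replaced. The sketch below scans a byte string
for blocks made up entirely of the filler pattern. It is not part of the
patch. It assumes that the 'zfs badd bloc' filler is the 64-bit word
0x2f5baddb10c stored little-endian and repeated across the block; verify
that against the patch before relying on it. The function name is
hypothetical, and block_size should match the dataset's record size.

```python
import struct

# ASSUMPTION: the filler is this 64-bit word, little-endian, repeated
# for the whole block. Check the patch for the actual value.
FILL_WORD = struct.pack("<Q", 0x2F5BADDB10C)


def find_filled_blocks(data, block_size):
    """Return the offsets of every block_size-aligned block in `data`
    that consists only of repeated FILL_WORD. A trailing partial block
    is not examined."""
    if block_size <= 0 or block_size % len(FILL_WORD) != 0:
        raise ValueError("block_size must be a positive multiple of 8")
    filler = FILL_WORD * (block_size // len(FILL_WORD))
    hits = []
    for off in range(0, len(data) - block_size + 1, block_size):
        if data[off:off + block_size] == filler:
            hits.append(off)
    return hits


# Synthetic demonstration: three 1024-byte blocks, the middle one
# replaced by the filler pattern.
good = b"A" * 1024
replaced = FILL_WORD * (1024 // len(FILL_WORD))
sample = good + replaced + good

for off in find_filled_blocks(sample, 1024):
    print("replaced block at offset", off)
```

For a real file, the data would be read from disk in block-sized chunks
rather than held in memory all at once.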

In short there are very, very few actual "in the wild" failures where 
one byte is damaged and the rest surrounding that one byte is intact and 
retrievable.  In most cases where an actual failure occurs the 
unreadable data constitutes *at least* a physical sector.

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/


More information about the freebsd-fs mailing list