NEW TAR
Harti Brandt
harti at freebsd.org
Wed Jul 21 08:31:00 PDT 2004
On Wed, 21 Jul 2004, Daniel Lang wrote:
DL>Hi,
DL>
DL>Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
DL>[..]
DL>> You're correct, in that filesystem semantics don't require an archiver
DL>> to recreate holes. There are storage efficiency gains to be made in
DL>> identifying holes, that's true - particularly in the case of absolutely
DL>> whopping but extremely sparse files. In those cases, a simple
DL>> userland-view-of-the-filesystem-semantics approach to identifying areas
DL>> that _might_ be holes (just for archive efficiency) can still be
DL>> expensive and might involve the scanning of multiple gigabytes of
DL>> "virtual" zeroes.
DL>>
DL>> Solaris offers an fcntl to identify holes (IIRC) for just this purpose.
DL>> If the underlying filesystem can't be made to support it, there's an
DL>> efficiency loss but otherwise it's no great shakes.
DL>
DL>I don't get it.
DL>
DL>I assume that for any consumer it is totally transparent whether
DL>possibly existing chunks of 0-bytes are actually blocks full of
DL>zeroes or just non-allocated blocks, correct?
DL>
DL>Second, it is true that there is a gain in terms of occupied disk
DL>space if chunks of zeroes are not allocated at all, correct?
DL>
DL>So, from my point of view, when a sparse file is archived and
DL>then extracted, it is totally irrelevant whether the areas which
DL>contain zeroes end up as unallocated blocks in exactly the same
DL>manner as before or not.
DL>
DL>So, I guess all an archiver must do is:
DL>
DL> - read the file
DL> - scan the file for consecutive blocks of zeroes
DL> - archive these blocks in an efficient way
DL> - on extraction, create a sparse file with the previously
DL> identified empty blocks, regardless of whether these blocks
DL> were 'sparse' blocks in the original file or not.
DL>
DL>I do not see why it should matter whether the original file was
DL>sparse at all, or perhaps sparse in different places.
It may just be a good deal faster to take existing hole information
(if it exists) than to scan the file.
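For illustration: the interface that later became common for exactly this (originating on Solaris, as mentioned in the quoted text, and later also on FreeBSD and Linux) is lseek(2) with SEEK_DATA/SEEK_HOLE. A rough sketch using Python's os module; on filesystems without hole support the whole file simply reports as one data extent:

```python
import os


def data_extents(path):
    """Enumerate (offset, length) extents that actually contain data,
    using SEEK_DATA/SEEK_HOLE instead of reading gigabytes of
    "virtual" zeroes. Falls back to one whole-file extent where the
    interface is unavailable."""
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        if not hasattr(os, "SEEK_DATA"):
            return [(0, size)] if size else []
        extents = []
        off = 0
        while off < size:
            try:
                start = os.lseek(fd, off, os.SEEK_DATA)
            except OSError:
                break  # ENXIO: no more data past off
            end = os.lseek(fd, start, os.SEEK_HOLE)  # every file ends in
            extents.append((start, end - start))     # an implicit hole at EOF
            off = end
        return extents
```

An archiver using this avoids touching the disk for the holes at all, rather than reading and testing every block.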
Also there is a difference between holes and actual zeroes: it's like
overcommitting memory. You may have a 1TB file consisting of one large
hole on a 10GB disk. As soon as you write something to it you may get
an error, even when writing into the middle of the file, just because
the FS needs to allocate blocks. I could imagine an application that
knows its access pattern to a large sparse file allocating zeroed
blocks in advance, while skipping blocks that it knows it will never
write, just to make sure the blocks are there when it writes later on.
But that's a rather hypothetical application.
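That hypothetical application could be sketched with posix_fallocate(2), which reserves real blocks so that later writes into the reserved range cannot fail with ENOSPC, while untouched regions stay as holes. A sketch under that assumption; the region list and helper name are invented:

```python
import os


def preallocate(path, regions):
    """Reserve disk blocks for the (offset, length) regions the
    application knows it will write later; writes into those regions
    can then no longer fail for lack of space. Regions not listed
    remain holes and occupy no disk space."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for off, length in regions:
            # allocates real zero-filled blocks, unlike a plain seek
            os.posix_fallocate(fd, off, length)
    finally:
        os.close(fd)
```

This is exactly the distinction above: a hole is a promise the filesystem may not be able to keep, while preallocated zero blocks are guaranteed.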
harti