Fwd: DELETE support in the VOP_STRATEGY(9)?
    Konstantin Belousov 
    kostikbel at gmail.com
       
    Tue Dec  8 17:54:22 UTC 2015

On Tue, Dec 08, 2015 at 08:43:33AM -0700, Warner Losh wrote:
> [ forgot to cc hackers ]
> ---------- Forwarded message ----------
> From: Warner Losh <imp at bsdimp.com>
> Date: Tue, Dec 8, 2015 at 8:41 AM
> Subject: Re: DELETE support in the VOP_STRATEGY(9)?
> To: Dag-Erling Smørgrav <des at des.no>
> 
> 
> On Tue, Dec 8, 2015 at 4:06 AM, Dag-Erling Smørgrav <des at des.no> wrote:
> 
> > Maxim Sobolev <sobomax at FreeBSD.org> writes:
> > > Dag-Erling Smørgrav <des at des.no> writes:
> > > > 1) why did you take this off the list?
> > > There was a complaint from the list admin about this being off-topic.
> >
> > Yes, and Eitan moved the discussion to hackers at .  It should have stayed
> > there.
> >
> > > > 2) why did you even bother to cc: me if you were going to completely
> > > > ignore everything I said anyway?
> > > I did not really ignore it, it's just that I did not have much to reply
> > > at that point.  [...] Basically I don't think your concerns wrt DELETE
> > > reliability/guarantees have much to do with this particular feature.
> > > The reason being that BIO_DELETE essentially tells the storage layer
> > > that whichever code "owns" the block in question (e.g.  ZFS or UFS)
> > > has moved it into the free pool and will NEVER ever want to read its
> > > value back again (until it's written into again).
> >
> > No, it means that the contents of that block are no longer important and
> > that the lower layers *may* reclaim it.  It does not mean that nobody
> > will ever try to read the block, nor does it guarantee that the block
> > will actually be reclaimed or zeroed.  We cannot rely on the lower
> > layers to ensure that reading from a previously deleted block never
> > returns data that may have belonged to a different file.
> >
> > BTW, I've encountered CF cards (including the SanDisk card in my home
> > router) that freeze if issued a TRIM command.  Furthermore, many CF, MMC
> > and SD cards, especially those marketed for use in digital cameras,
> > perform wear leveling "automagically" based on their own understanding
> > of the filesystem layout, and will therefore work poorly with anything
> > other than FAT (Kingston call it "optimized recording performance" in
> > their marketing literature).
> 
> 
> While these issues are relevant for BIO_DELETE, they aren't so much relevant
> for punching a hole in a file in a filesystem. The filesystem is the one
> that gets to decide whether and when to issue a BIO_DELETE (just as the
> lower layers get to decide what to do). A properly written filesystem will
> not issue a BIO_DELETE and then assume it will read back 0's. The whole
> point of the punch hole is to allow the filesystem to return the blocks to
> its free store. If that also happens to have the effect of causing a
> BIO_DELETE to go down, that's no different than deleting the file and
> having a BIO_DELETE go down for the resulting blocks that are freed.
Exactly. I completely agree with the statement above, and think that
this is the only thing that should be implemented. There could be a
request, most likely a filesystem-level ioctl, which punches holes in
the file. If the filesystem supports the ioctl, its effect on the file
state should be the same as if the punched range had been skipped with
lseek(2) between writes, i.e. as if the file had been written sparsely.
Moreover, where supported, the lseek(SEEK_HOLE/SEEK_DATA) behaviour
should be consistent with the request. The latter might mean that we
should restrict the interface to only accept ranges at block boundaries.
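
A minimal sketch of what restricting the request to block boundaries could look like: a hypothetical helper, punch_range_align(), that shrinks a requested byte range to the whole filesystem blocks it fully covers, so the ioctl would only ever turn complete blocks into holes. The name, the signature and the choice to round inward (rather than reject unaligned ranges) are assumptions for illustration; this is not an existing FreeBSD interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a FreeBSD API): shrink the byte range
 * [off, off + len) to the whole blocks of size blksize that it fully
 * covers.  On success, stores the aligned range in *aoff/*alen and
 * returns true; returns false if no complete block lies in the request.
 */
static bool
punch_range_align(uint64_t off, uint64_t len, uint64_t blksize,
    uint64_t *aoff, uint64_t *alen)
{
	uint64_t first, last;

	assert(blksize > 0);
	/* Reject ranges whose end would overflow a 64-bit offset. */
	if (len > UINT64_MAX - off)
		return (false);
	/* Index of the first block starting at or after off (round up). */
	first = off / blksize + (off % blksize != 0);
	/* Index just past the last fully covered block (round down). */
	last = (off + len) / blksize;
	if (last <= first)
		return (false);
	*aoff = first * blksize;
	*alen = (last - first) * blksize;
	return (true);
}
```

With 4096-byte blocks, a request for 10000 bytes at offset 100 would be trimmed to the single block at offset 4096. Rounding inward means partially covered blocks at either end stay allocated, so a caller that wants those bytes cleared would still have to zero them with an ordinary write; rejecting unaligned ranges with EINVAL would be the stricter alternative.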

Note that for UFS, BIO_DELETE might be issued automatically (due to the
implementation at the lower levels) for fully freed blocks, when it is
supported by the volume and safe from the filesystem's point of view.
In other words, the behaviour there should be the same as for blocks
freed due to file truncation or final unlinking.

UFS has a known quirk there: it does not allow a hole to extend to the
end of the file; the last fragment or block must be allocated. I once
patched the kernel and fsck to remove this restriction, but I have
probably lost the patch.
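
The hole/data layout a punch request leaves behind, including the allocated final fragment or block UFS insists on, can be inspected from userland with the standard SEEK_DATA/SEEK_HOLE whence values of lseek(2). The sketch below uses a hypothetical data_bytes() helper that walks a file's data regions and sums their sizes; on UFS one would expect the tail of a file extended past a hole to still be reported as data. The _GNU_SOURCE define is an assumption about the build environment so the same code also compiles against glibc; FreeBSD exposes these constants without it.

```c
#define _GNU_SOURCE /* exposes SEEK_DATA/SEEK_HOLE in glibc; harmless on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the number of bytes in the file at path
 * that lseek(2) reports as data rather than holes, or -1 on error.
 * Filesystems without hole support generally report the whole file as
 * one data region (this is the documented Linux behaviour).
 */
static off_t
data_bytes(const char *path)
{
	off_t pos, end, data, hole = 0, total = 0;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	end = lseek(fd, 0, SEEK_END);
	for (pos = 0; end != -1 && pos < end; pos = hole) {
		data = lseek(fd, pos, SEEK_DATA);
		if (data == -1) {
			/* ENXIO: only a hole remains between pos and EOF. */
			if (errno != ENXIO)
				total = -1;
			break;
		}
		/* Every file has an implicit hole at EOF, so this succeeds. */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole == -1) {
			total = -1;
			break;
		}
		/* data is inside a data region, so the next hole is past it. */
		assert(hole > data);
		total += hole - data;
	}
	if (end == -1)
		total = -1;
	close(fd);
	return (total);
}
```

For a fully written file the helper should return the file size on any filesystem; only a sparse file, e.g. one produced by a future punch-hole ioctl, would show a smaller value.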
    
    
More information about the freebsd-hackers mailing list