DELETE support in the VOP_STRATEGY(9)?

Warner Losh imp at bsdimp.com
Tue Dec 8 19:03:17 UTC 2015


> On Dec 8, 2015, at 11:52 AM, Steven Hartland <killing at multiplay.co.uk> wrote:
> 
> 
> 
> On 08/12/2015 18:44, Dag-Erling Smørgrav wrote:
>> Warner Losh <imp at bsdimp.com> writes:
>>> Dag-Erling Smørgrav <des at des.no> writes:
>>>> But the filesystem does not know whether the underlying storage is
>>>> electromechanical or solid-state, nor does it know whether the user
>>>> cares much about seek times (unless we introduce the heuristic
>>>> "avoid creating holes unless the file already has them, in which
>>>> case the userland probably does not care").
>>> Actually, the filesystem does know. Or has some knowledge of what
>>> is supported and what isn't. BIO_DELETE support is a strong indicator
>>> of a flash or other log-type system.
>> The filesystem can ask the layer below if BIO_DELETE is supported, but
>> should not assume anything about what it means.  For instance, I could
>> write a gnop-like module that translates BIO_DELETE into an all-zeroes
>> BIO_WRITE and passes everything else unmodified.  It would provide a
>> stronger guarantee than, say, SATA TRIM but would also have a completely
>> different performance profile (even on SSDs, since it would do its work
>> synchronously whereas TRIM works asynchronously).

That ship has sailed. UFS, at least, assumes that if TRIM is supported then
relocating files to be contiguous is bad.

But writing a gnop module that did the BIO_DELETE thing would be bogus.
BIO_DELETE does not mean that blocks will read back as zeros; it is a hint
to the lower layers that the blocks are no longer in use. So, sure, you
could invent a stupid thing that breaks the rules, and thus the assumptions
of the other code, but why would you want to do that?

The SATA trims are actually synchronous (in the absence of power failures).
Once you TRIM the data, it is gone. And depending on what bits are set in
the identify response, you can count on different things. But to say they
happen asynchronously because of implementation details about when the data
is actually erased is missing the point. Also, your BIO_DELETE example
wouldn’t guarantee the data is erased either. Writes to log append devices
(like SSDs) are like a TRIM followed by a write: the old LBA mapping is
discarded and a new one replaces it.
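
To make the "a write is a TRIM followed by a write" point concrete, here is
a toy userland model of a log-append device. It is only a sketch: the
struct toy_ftl type, the ftl_* function names and the sizes are all made up
for illustration, and no real FTL or FreeBSD code is being quoted.

```c
#include <assert.h>

#define NLBA     8     /* logical blocks exposed to the host (toy size) */
#define NPAGES   32    /* physical pages in the append-only log (toy size) */
#define UNMAPPED (-1)

/* Hypothetical, heavily simplified flash translation layer state. */
struct toy_ftl {
    int l2p[NLBA];      /* logical block -> physical page, or UNMAPPED */
    int stale[NPAGES];  /* 1 if the page holds data that nothing maps */
    int head;           /* next free page at the end of the log */
};

static void ftl_init(struct toy_ftl *f)
{
    for (int i = 0; i < NLBA; i++)
        f->l2p[i] = UNMAPPED;
    for (int i = 0; i < NPAGES; i++)
        f->stale[i] = 0;
    f->head = 0;
}

/* TRIM: discard the mapping; the old page is left for later cleanup. */
static void ftl_trim(struct toy_ftl *f, int lba)
{
    assert(lba >= 0 && lba < NLBA);
    if (f->l2p[lba] != UNMAPPED) {
        f->stale[f->l2p[lba]] = 1;
        f->l2p[lba] = UNMAPPED;
    }
}

/* WRITE: the same discard step as TRIM, then map the LBA to a new page. */
static void ftl_write(struct toy_ftl *f, int lba)
{
    ftl_trim(f, lba);
    assert(f->head < NPAGES);   /* the toy model has no garbage collector */
    f->l2p[lba] = f->head++;
}

static int ftl_is_mapped(const struct toy_ftl *f, int lba)
{
    return f->l2p[lba] != UNMAPPED;
}

/* Count pages whose contents are no longer reachable from any LBA. */
static int ftl_stale_pages(const struct toy_ftl *f)
{
    int n = 0;
    for (int i = 0; i < NPAGES; i++)
        n += f->stale[i];
    return n;
}
```

In this model the mapping is gone as soon as ftl_trim() returns, even though
the stale page has not been erased yet, which is the sense in which the
operation is synchronous from the host's point of view.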

>> Anyway, my point is that Maxim needs to revise his assumptions.
> Just to clarify, most consumer devices process TRIM synchronously, not asynchronously.

It also depends on what you mean by ‘process’ here.

> Your example isn't actually just an example: CAM scsi_da has a number of different ways it can process BIO_DELETE:
> * ATA TRIM
> * SCSI UNMAP
> * Write Same 16
> * Write Same 10
> * Zero
> 
> So your example actually exists in practice in the FreeBSD code base ;-)

All these are effectively TRIM operations. The devices that implement them
use them as hints to optimize storage. DES’ BIO_DELETE -> WRITE zero
example doesn’t optimize storage at all, nor does it give the lower layers
any clue about how to optimize the storage. All the SCSI delete types
do give that hint.
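
A small userland sketch of the difference, under loud assumptions: struct
thin_dev and every function below are hypothetical names for a toy model,
not anything from GEOM or CAM. It contrasts a delete that reaches the lower
layer as a hint with one that has been turned into an ordinary write of
zeros.

```c
#include <assert.h>
#include <string.h>

#define NBLK  4     /* blocks in the toy device */
#define BLKSZ 16    /* bytes per block (toy size) */

/* Hypothetical lower layer that tracks which blocks it must keep backed. */
struct thin_dev {
    unsigned char data[NBLK][BLKSZ];
    int backed[NBLK];   /* 1 if storage is still reserved for the block */
};

static void dev_init(struct thin_dev *d)
{
    memset(d, 0, sizeof(*d));
}

/* An ordinary write: the lower layer has to keep this block around. */
static void dev_write(struct thin_dev *d, int blk, const unsigned char *buf)
{
    assert(blk >= 0 && blk < NBLK);
    memcpy(d->data[blk], buf, BLKSZ);
    d->backed[blk] = 1;
}

/* Delete passed down as a hint: the lower layer may release the block. */
static void delete_as_hint(struct thin_dev *d, int blk)
{
    assert(blk >= 0 && blk < NBLK);
    d->backed[blk] = 0;
}

/* Delete translated into a zero write: looks like any other write below. */
static void delete_as_zero_write(struct thin_dev *d, int blk)
{
    unsigned char zeros[BLKSZ] = { 0 };
    dev_write(d, blk, zeros);
}

/* How many blocks the lower layer still has to store. */
static int dev_backed_blocks(const struct thin_dev *d)
{
    int n = 0;
    for (int i = 0; i < NBLK; i++)
        n += d->backed[i];
    return n;
}
```

After delete_as_zero_write() the block reads back as zeros but is still
counted as backed, so the layer below has learned nothing it can use;
after delete_as_hint() the block is free to be reclaimed.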

Warner



More information about the freebsd-hackers mailing list