svn commit: r289405 - head/sys/ufs/ffs

Warner Losh imp at bsdimp.com
Fri Oct 16 21:01:00 UTC 2015


> On Oct 16, 2015, at 2:18 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> 
> On Fri, Oct 16, 2015 at 01:22:44PM -0600, Warner Losh wrote:
> 
>> 
>>> On Oct 16, 2015, at 7:19 AM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
>>> 
>>> On Fri, Oct 16, 2015 at 03:06:02AM +0000, Warner Losh wrote:
>>> 
>>>> Author: imp
>>>> Date: Fri Oct 16 03:06:02 2015
>>>> New Revision: 289405
>>>> URL: https://svnweb.freebsd.org/changeset/base/289405
>>>> 
>>>> Log:
>>>> Do not relocate extents to make them contiguous if the underlying drive can do
>>>> deletions. Ability to do deletions is a strong indication that this
>>>> optimization will not help performance. It will only generate extra write
>>>> traffic. These devices are typically flash based and have a limited number of
>>>> write cycles. In addition, making the file contiguous in LBA space doesn't
>>>> improve the access times from flash devices because they have no seek time.
>>> 
>>> In reality, flash devices have a seek time of about 0.1ms.
>>> Many flash devices can do 8 simultaneous "seeks" (I think NVMe can do
>>> more).
>> 
>> That's just not true. tREAD for most flash is a few tens of microseconds. The
>> streaming time is at most 10 microseconds. There's no "seek" time in the classic
>> sense. Once you get the data, you have it. There's no extra "read time" in
>> the NAND flash parts.
>> 
>> And the number of simultaneous reads depends a lot on how the flash vendor
>> organized the flash. Many of today's designs use 8 or 16 die parts that have 2
>> to 4 planes on them, giving a parallelism in the 16-64 range. And that's before
>> we get into innovative strategies that use partial page reads to decrease tREAD
>> time and novel data striping methods.
>> 
>> Seek time, as a separate operation, simply doesn't exist.
>> 
>> Furthermore, NAND-based devices are log-structured with garbage collection
>> for both retention and to deal with retired blocks in the underlying NAND. The
>> relationship between LBA ranges and where the data is at any given time on
>> the NAND is almost uncorrelated.
>> 
>> So, rearranging data so that it is in LBA contiguous ranges doesn't help once
>> you're above the FFS block level.
> 
> A stream of random 512-4096 byte reads from most flash SATA drives in one
> thread gives about 10K IOPS. That is only 40Mbit/s out of 6*0.8 Gbit/s of
> SATA bandwidth. You may decompose the 0.1ms into different, real delays (bank
> select, command processing, etc.) or call it 0.1ms of seek time for all
> practical purposes.

I strongly disagree. That’s not seek time in the classic sense. All of those 100us
are the delay of reading the data from the flash. The reason I’m so adamant
is that reading an adjacent page costs exactly the same as reading any other
page. On a spinning disk, reading an adjacent sector costs almost nothing
compared to moving the head (seeking).
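
As a rough illustration of that difference, here is a minimal sketch of a
latency model in Python. Only the ~100us flash read latency comes from this
thread; the disk seek and transfer figures, and all function and constant
names, are assumptions chosen for illustration, not measurements.

```python
# Illustrative latency model. Every constant is an assumption except the
# ~100 us flash read latency quoted in this thread.
FLASH_READ_US = 100.0    # per-read latency discussed in the thread
HDD_SEEK_US = 8000.0     # assumed average seek plus rotational delay
HDD_TRANSFER_US = 20.0   # assumed time to transfer one adjacent block


def flash_read_time_us(n_blocks, contiguous):
    """Flash: every read costs the same, adjacent LBAs or not."""
    return n_blocks * FLASH_READ_US


def hdd_read_time_us(n_blocks, contiguous):
    """Disk: one seek for a contiguous run, one seek per block otherwise."""
    seeks = 1 if contiguous else n_blocks
    return seeks * HDD_SEEK_US + n_blocks * HDD_TRANSFER_US


def single_thread_iops(latency_us):
    """Reads per second when each read waits for the previous one."""
    return 1_000_000 / latency_us


if __name__ == "__main__":
    n = 64
    for name, model in (("flash", flash_read_time_us), ("hdd", hdd_read_time_us)):
        scattered = model(n, contiguous=False)
        packed = model(n, contiguous=True)
        print(f"{name}: scattered={scattered:.0f}us "
              f"contiguous={packed:.0f}us ratio={scattered / packed:.1f}x")
    print(f"single-thread IOPS: {single_thread_iops(FLASH_READ_US):.0f}")
```

Under these assumed numbers, making the blocks contiguous changes nothing for
the flash model and changes the disk model by well over an order of magnitude,
which is the sense in which the 100us is not a seek.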

Then again, I spent almost three years building a PCIe NAND-based flash
drive, so maybe I’m biased by that experience...

Warner


