Two questions --- SSD block sizes and buffering

Fri Apr 13 11:55:49 UTC 2018

On 13/04/2018 09:44, Ronald F. Guilmette wrote:
> 
> In message <1cc45af6-bd4b-3854-4d37-8e9343786ce6 at qeng-ho.org>, 
> Arthur Chance <freebsd at qeng-ho.org> wrote:
> 
>> ... man newfs says
>>
>> -f frag-size
>>        The fragment size of the file system in bytes.  It must be a
>>        power of two ranging in value between blocksize/8 and blocksize.
>>        The default is 4096 bytes.
>>
>> so it's been fixed for 4k disks.
> 
> Swell.  But that may not really do much (in the way of improving performance)
> if the underlying mass storage device has a "native" block size of, say,
> 128 KiB.  And as I noted, it is my understanding that essentially all
> flash-based mass storage devices -do- have a native block size of 128 KiB,
> or perhaps even larger.

I've not fully researched this subject so my understanding may not even
reach quarter baked, never mind half baked, but I believe the larger
"block size" has to do with bulk erasure of flash memory, and its exact
size depends on the details of the flash memory chips used in the SSD,
so could theoretically vary over time even within a manufacturer's model
number. Writing to flash can be done efficiently in small(ish) chunks
like 4 KiB for compatibility with spinning rust, and wear levelling is
handled by the SSD controller - when you "rewrite" a block it may be
written to a physically distinct block in which case the logical to
physical block map is updated (like log structured file systems). When
you "erase" a block via the TRIM operation the memory is left unchanged
but is simply marked as free (which should be remembered if you have
confidential information stored). Too many writes to flash risk
"blurring" the stored bits because of the way flash works, so
occasionally the SSD controller will garbage collect a larger block to
free all sub-blocks and do a bulk erase which resets the memory to
(almost) pristine state.
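
The remapping and garbage collection described above can be modelled in a few dozen lines. What follows is a toy flash translation layer in C. It is a sketch under stated assumptions, not how any real controller works: the geometry (4 erase blocks of 4 pages), the names (`ftl_write`, `ftl_trim`, `ftl_gc`) and the greedy "most stale pages" victim choice are all hypothetical, and only the logical-to-physical mapping is tracked, not the data itself.

```c
#include <assert.h>
#include <string.h>

/* Toy geometry (an assumption): real erase blocks hold far more pages. */
#define PAGES_PER_BLOCK 4
#define NBLOCKS 4
#define NPAGES (PAGES_PER_BLOCK * NBLOCKS)
#define NLOGICAL 8   /* logical sectors the host sees; fewer than NPAGES */
#define UNMAPPED (-1)

enum page_state { PAGE_ERASED, PAGE_VALID, PAGE_STALE };

struct ftl {
    int l2p[NLOGICAL];             /* logical sector -> physical page */
    int p2l[NPAGES];               /* physical page -> logical sector */
    enum page_state state[NPAGES];
    int erase_count[NBLOCKS];      /* wear per erase block */
};

static void ftl_init(struct ftl *f) {
    for (int l = 0; l < NLOGICAL; l++) f->l2p[l] = UNMAPPED;
    for (int p = 0; p < NPAGES; p++) { f->p2l[p] = UNMAPPED; f->state[p] = PAGE_ERASED; }
    memset(f->erase_count, 0, sizeof f->erase_count);
}

static int find_erased(const struct ftl *f) {
    for (int p = 0; p < NPAGES; p++)
        if (f->state[p] == PAGE_ERASED) return p;
    return -1;
}

/* Garbage collection: bulk-erase the block with the most stale pages and
 * put its still-valid sectors back. Only the mapping is modelled, not data. */
static void ftl_gc(struct ftl *f) {
    int victim = 0, most = -1;
    for (int b = 0; b < NBLOCKS; b++) {
        int stale = 0;
        for (int i = 0; i < PAGES_PER_BLOCK; i++)
            stale += f->state[b * PAGES_PER_BLOCK + i] == PAGE_STALE;
        if (stale > most) { most = stale; victim = b; }
    }
    int keep[PAGES_PER_BLOCK], nkeep = 0;
    for (int i = 0; i < PAGES_PER_BLOCK; i++) {
        int p = victim * PAGES_PER_BLOCK + i;
        if (f->state[p] == PAGE_VALID) keep[nkeep++] = f->p2l[p];
        f->state[p] = PAGE_ERASED;          /* the whole block is erased at once */
        f->p2l[p] = UNMAPPED;
    }
    f->erase_count[victim]++;
    for (int i = 0; i < nkeep; i++) {       /* reprogram the survivors */
        int p = victim * PAGES_PER_BLOCK + i;
        f->state[p] = PAGE_VALID;
        f->p2l[p] = keep[i];
        f->l2p[keep[i]] = p;
    }
}

/* A "rewrite" never overwrites in place: program an erased page, remap,
 * and leave the old page stale until its block is garbage collected. */
static void ftl_write(struct ftl *f, int lba) {
    int p = find_erased(f);
    if (p < 0) { ftl_gc(f); p = find_erased(f); }
    assert(p >= 0);   /* holds because NLOGICAL < NPAGES, so some page is stale */
    int old = f->l2p[lba];
    if (old != UNMAPPED) { f->state[old] = PAGE_STALE; f->p2l[old] = UNMAPPED; }
    f->state[p] = PAGE_VALID;
    f->p2l[p] = lba;
    f->l2p[lba] = p;
}

/* TRIM only drops the mapping; the old bits physically stay in the page
 * until the whole block is erased. */
static void ftl_trim(struct ftl *f, int lba) {
    int old = f->l2p[lba];
    if (old != UNMAPPED) { f->state[old] = PAGE_STALE; f->p2l[old] = UNMAPPED; }
    f->l2p[lba] = UNMAPPED;
}
```

Repeatedly rewriting the same few sectors shows the pattern: the controller keeps moving them to fresh pages and only occasionally pays for a block erase.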

As I said, the exact size of these erasable blocks varies, but Microsoft
align their disk partitions on 1 MiB boundaries, which is presumably a
power of two larger than any existing SSD needs. That's why I've taken
to aligning my partitions to 1 MiB. Given typical disk sizes these days
even 1 MiB is minuscule.
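
The alignment arithmetic itself is just rounding up. Because 1 MiB is a power of two, a 1 MiB-aligned start is also aligned to every smaller power-of-two erase block (128 KiB, 256 KiB, 512 KiB). A minimal sketch in C; the function names are hypothetical, and the sector size (512 bytes, or 4096 for 4Kn disks) is an assumption you would take from the actual device.

```c
#include <stdbool.h>
#include <stdint.h>

#define KiB 1024u
#define MiB (1024u * KiB)

/* Round a partition's starting LBA up so that its byte offset is a
 * multiple of align_bytes (which must be a multiple of sector_bytes). */
static uint64_t align_start_lba(uint64_t lba, uint32_t sector_bytes,
                                uint32_t align_bytes) {
    uint64_t step = align_bytes / sector_bytes;   /* sectors per alignment unit */
    return (lba + step - 1) / step * step;
}

/* Does a partition starting at lba begin on an erase-block boundary? */
static bool starts_on_boundary(uint64_t lba, uint32_t sector_bytes,
                               uint32_t erase_bytes) {
    return (lba * sector_bytes) % erase_bytes == 0;
}
```

For example, the old DOS-style start at sector 63, or GPT's first usable sector 34 on a 512-byte-sector disk, both round up to sector 2048, which is exactly 1 MiB. On FreeBSD, gpart(8)'s -a option does this rounding when adding a partition.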

> So.... When installing FreeBSD onto an SSD, would one be well advised
> to perform all newfs operations with an explicit "-f 131072" option?

It shouldn't be necessary, but if you want to try it, remember that you
have to set both -b and -f, and the manual recommends that the block
size be 8 times the fragment size. Personally I use ZFS for everything
these days except for one legacy machine with a UFS system disk, so my
UFS knowledge is a little rusty. I'm not sure whether there's a limit on
block size.
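
The two newfs constraints can be checked mechanically before trying a benchmark. This sketch encodes only the -f rule from the newfs manual page quoted earlier in this thread and the recommended 8:1 block:fragment ratio; the helper names are hypothetical, and it deliberately says nothing about an upper limit on the block size.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* The -f rule quoted from newfs(8): a power of two between
 * blocksize/8 and blocksize inclusive. */
static bool valid_frag_size(uint32_t bsize, uint32_t fsize) {
    return is_pow2(fsize) && fsize >= bsize / 8 && fsize <= bsize;
}

/* -b value giving the recommended 8:1 block:fragment ratio. */
static uint32_t bsize_for_frag(uint32_t fsize) { return 8 * fsize; }
```

By the 8:1 rule, -f 131072 implies -b 1048576, so it's worth checking the largest block size your newfs accepts before going any further.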

> More to the point, has anyone run any benchmarks to see if that would
> either help or hurt anything?
> 
> (There -are- a number of "cloud" providers these days who offer FreeBSD
> together with SSD mass storage, so it isn't as if my question is merely
> academic.   Modern SSDs have certainly been tuned up in ways so that
> they can gracefully accommodate almost whatever is thrown at them, even
> while maintaining decent performance, but that's not to say that even
> a little better performance couldn't be eked out of them, with a little
> help from an accommodating OS/filesystem.)
> 
> (If the answer to that question is "no", I'd quite frankly be shocked.
> I've come to expect outstanding quality from FreeBSD generally, and
> that certainly includes all performance aspects.)
> 
>>>>     My question is just this:  Assume that one of these programs is called
>>>>     "xyz".  Now, if I run the program thusly:
>>>>
>>>>            xyz > xyz.output
>>>>
>>>>      i.e. so that stdout is redirected to a file, will there be one actual
>>>>      write to disk for each and every line that is written to stdout by xyz?
>>>>      In other words, will my act of explicitly setting line buffering (for
>>>>      stdout) in a case like this cause the xyz program to beat the living
>>>>      hell out of my disk drive?
>>>
>>> Probably not. The actual write operations are issued
>>> somewhere "down the line" through the file system driver
>>> down to the disk driver. Even a "sync" command issued by
>>> the OS will not _immediately_ cause the drive to act.
>>
>> write(2) calls, which underlie stdio, add the data to the disk block
>> image in the VM cache. Unless your machine is under extreme memory
>> pressure or you call fsync or sync, the buffers eventually get written
>> out by a kernel task. See man syncer for details.
> 
> Thank you.  That certainly answers my question, and I am greatly relieved
> to know that I will probably not be materially shortening the life of my
> hard drive by doing the exact sort of silly thing I am doing.
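
To make the distinction concrete: with line buffering each completed line becomes one write(2) that copies it into the kernel's cache, and only fsync(2), or eventually the syncer, pushes it to the device. A minimal POSIX sketch; `write_lines` and its `durable` flag are hypothetical names, not anything from xyz.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>

/* Write n numbered lines to path through a line-buffered stdio stream.
 * Each '\n' causes one write(2), which only copies the line into the
 * kernel's cache; the data reaches the disk later. If durable is
 * nonzero, fsync(2) forces the cached data out before closing.
 * Returns 0 on success, -1 on any error. */
static int write_lines(const char *path, int n, int durable) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    /* setvbuf must come before any other operation on the stream. */
    if (setvbuf(fp, NULL, _IOLBF, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (fprintf(fp, "line %d\n", i) < 0) {
            fclose(fp);
            return -1;
        }
    }
    if (durable && (fflush(fp) != 0 || fsync(fileno(fp)) != 0)) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling it with durable set to 0 corresponds to the xyz > xyz.output case: lots of small write(2) calls, but the disk only sees the data when the kernel decides to flush it.
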
> 
> 
> Regards,
> rfg
> 
> P.S. There actually -is- a good reason for me to be explicitly setting
> stdio line buffering on my stdout stream in my program.  You see, my
> program actually spawns a number of child processes and it is those
> children who are actually writing to stdout.  I've found that unless
> line buffering (for stdout) is set for all of them, sometimes bits and
> pieces of lines from the different children can get mangled together
> in the output, creating one big uselessly mangled blob.

A well-known, you might even say classic, problem. stdio and forks don't
play terribly well together. I first got bitten by that back in 1979
under Unix V6 and IIRC line buffering wasn't an option then.
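
The usual cure has two parts: flush stdio before each fork() so that a child doesn't inherit a half-full buffer and write it out a second time, and have each child emit whole lines with one write(2) apiece. A minimal POSIX sketch; `spawn_line_writers` is a hypothetical name, and opening a fresh stream on a dup()ed stdout is just one way to get a line-buffered stream in the child without calling setvbuf() on a stream that has already been used. On a pipe, POSIX guarantees that writes of up to PIPE_BUF bytes aren't interleaved; for a regular file, as in xyz > xyz.output, whole lines normally come out intact on common kernels, but that is not a POSIX promise.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork nchildren processes that each print nlines lines to the shared
 * stdout. Returns 0 if every child exited with status 0, else -1. */
static int spawn_line_writers(int nchildren, int nlines) {
    int rc = 0;
    for (int c = 0; c < nchildren; c++) {
        /* Flush first: a child inherits a copy of any unflushed stdio
         * buffer, and that data would then be written twice. */
        fflush(stdout);
        pid_t pid = fork();
        if (pid < 0) {
            rc = -1;
            break;
        }
        if (pid == 0) {
            /* A fresh line-buffered stream on a duplicate of fd 1, so setvbuf
             * is legal here; each complete line goes out in one write(2). */
            int fd = dup(STDOUT_FILENO);
            FILE *out = fd < 0 ? NULL : fdopen(fd, "w");
            if (out == NULL || setvbuf(out, NULL, _IOLBF, BUFSIZ) != 0)
                _exit(1);
            for (int i = 0; i < nlines; i++)
                fprintf(out, "child %d line %d\n", c, i);
            /* _exit skips exit-time flushing of the inherited stdio buffers. */
            _exit(fclose(out) == 0 ? 0 : 1);
        }
    }
    int status;
    while (wait(&status) > 0)   /* reap every child */
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    return rc;
}
```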

-- 
An amusing coincidence: log2(58) = 5.858 (to 0.0003% accuracy).