Re: nvd->nda switch and blocksize changes for ZFS

From: Dimitry Andric <dim_at_FreeBSD.org>
Date: Mon, 25 Sep 2023 11:58:35 UTC
On 25 Sep 2023, at 08:42, Frank Behrens <frank@harz2023.behrens.de> wrote:
> 
> Hi Dimitry, Yuri and also Mark, thanks for your fast responses!
> 
> On 23.09.2023 at 20:58, Yuri Pankov wrote:
...
> # smartctl -a /dev/nvme0
> Namespace 1 Formatted LBA Size:     512
> ...
> Supported LBA Sizes (NSID 0x1)
> Id Fmt  Data  Metadt  Rel_Perf
>  0 +     512       0         0

This is just the default 512-byte compatibility sector size, so it is not relevant for choosing a block size.


> # nvmecontrol identify nda0 and # nvmecontrol identify nvd0 (after hw.nvme.use_nvd="1" and reboot) give the same result:
> Number of LBA Formats:       1
> Current LBA Format:          LBA Format #00
> LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Best
> ...
> Optimal I/O Boundary:        0 blocks
> NVM Capacity:                1000204886016 bytes
> Preferred Write Granularity: 32 blocks
> Preferred Write Alignment:   8 blocks
> Preferred Deallocate Granul: 9600 blocks
> Preferred Deallocate Align:  9600 blocks
> Optimal Write Size:          256 blocks

My guess is that the "Preferred Write Granularity" is the optimal size, in this case 32 'blocks' of 512 bytes, so 16 KiB. This also matches the stripe size that GEOM reports for nda0, as you showed.

The "Preferred Write Alignment" is 8 * 512 = 4 KiB, so you should align partitions etc. to at least this boundary. However, it cannot hurt to align everything to 16 KiB either, since that is an integer multiple of 4 KiB.
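
For what it is worth, the arithmetic can be double-checked with a short Python sketch. The 512-byte block size and the 32 and 8 block figures come from the quoted nvmecontrol output, and the sector 40 is the "first:" value from the gpart listing in this mail. The helper name is_aligned is made up for illustration only.

```python
# Figures taken from the quoted nvmecontrol output.
LBA_SIZE = 512     # bytes per block (LBA Format #00)
PWG_BLOCKS = 32    # Preferred Write Granularity, in blocks
PWA_BLOCKS = 8     # Preferred Write Alignment, in blocks

granularity = PWG_BLOCKS * LBA_SIZE  # 16384 bytes = 16 KiB
alignment = PWA_BLOCKS * LBA_SIZE    # 4096 bytes = 4 KiB


def is_aligned(offset_bytes, boundary_bytes):
    """Hypothetical helper: True if the offset is a multiple of the boundary."""
    return offset_bytes % boundary_bytes == 0


# 16 KiB is an integer multiple of 4 KiB, so anything aligned to
# 16 KiB is automatically aligned to 4 KiB as well.
assert granularity % alignment == 0

# Example offset: first usable sector from the gpart listing (first: 40).
first_usable = 40 * LBA_SIZE  # 20480 bytes

print(granularity, alignment)                 # 16384 4096
print(is_aligned(first_usable, alignment))    # True
print(is_aligned(first_usable, granularity))  # False
```

So sector 40 sits on a 4 KiB boundary but not on a 16 KiB one. Anything you want on a 16 KiB boundary has to start at a sector number that is a multiple of 32.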


> The recommended blocksize for ZFS is GEOM's stripesize and there I see a difference:
> 
> # diff -w -U 10  gpart_list_nvd.txt gpart_list_nda.txt
> -Geom name: nvd0
> +Geom name: nda0
>  modified: false
>  state: OK
>  fwheads: 255
>  fwsectors: 63
>  last: 1953525127
>  first: 40
>  entries: 128
>  scheme: GPT
>  Providers:
> -1. Name: nvd0p1
> +1. Name: nda0p1
>     Mediasize: 272629760 (260M)
>     Sectorsize: 512
> -   Stripesize: 4096
> -   Stripeoffset: 0
> +   Stripesize: 16384
> +   Stripeoffset: 4096

Yeah, I suspect that nda takes the "stripesize" from the NVMe "Preferred Write Granularity" and the "stripeoffset" from the NVMe "Preferred Write Alignment". I think Warner's the resident expert on the NVMe drivers, so maybe he's got some clue. :)
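
The numbers are at least consistent with that guess. Here is a rough Python sketch of the check. The mapping from NVMe fields to GEOM values is purely my assumption, not something taken from the nda source, and the start sector of nda0p1 used in the second half is also an assumption, since the quoted gpart output does not show it.

```python
SECTOR = 512  # Sectorsize reported by gpart

# NVMe namespace values from the quoted nvmecontrol output (in blocks).
nvme = {"preferred_write_granularity": 32, "preferred_write_alignment": 8}

# Values gpart reported for nda0p1 and nvd0p1.
gpart_nda = {"stripesize": 16384, "stripeoffset": 4096}
gpart_nvd = {"stripesize": 4096, "stripeoffset": 0}

# ASSUMPTION: nda derives both GEOM values directly from the NVMe fields.
guess = {
    "stripesize": nvme["preferred_write_granularity"] * SECTOR,
    "stripeoffset": nvme["preferred_write_alignment"] * SECTOR,
}
print(guess == gpart_nda)  # True

# Alternative, equally unconfirmed: the stripeoffset is simply the
# partition start modulo the stripe size.
# ASSUMPTION: the first partition starts at sector 40 (not shown above).
start = 40 * SECTOR
nda_offset = start % gpart_nda["stripesize"]
nvd_offset = start % gpart_nvd["stripesize"]
print(nda_offset, nvd_offset)  # 4096 0
```

Both readings reproduce the reported values, so the numbers alone cannot tell them apart. The actual start sector of nda0p1 (from gpart show) would help settle it.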

-Dimitry