Re: nvd->nda switch and blocksize changes for ZFS

From: Frank Behrens <frank_at_harz2023.behrens.de>
Date: Mon, 25 Sep 2023 06:42:08 UTC
Hi Dimitry, Yuri and also Mark, thanks for your quick responses!

Am 23.09.2023 um 20:58 schrieb Yuri Pankov:
> Dimitry Andric wrote:
>> On 23 Sep 2023, at 18:31, Frank Behrens <frank@harz2023.behrens.de> wrote:
>>> I created a zpool with FreeBSD-14.0-CURRENT in February. With a
>>> current 15.0-CURRENT/14.0-STABLE I now get the message:
>>>
>>> status: One or more devices are configured to use a non-native block
>>>         size. Expect reduced performance.
>>> action: Replace affected devices with devices that support the
>>>         configured block size, or migrate data to a properly
>>>         configured pool.
>>>
>>> NAME        STATE     READ WRITE CKSUM
>>> zsys        ONLINE       0     0     0
>>>   raidz1-0  ONLINE       0     0     0
>>>     nda0p4  ONLINE       0     0     0  block size: 4096B configured, 16384B native
>>>     nda1p4  ONLINE       0     0     0  block size: 4096B configured, 16384B native
>>>     nda2p4  ONLINE       0     0     0  block size: 4096B configured, 16384B native
>>>
>>> I use:
>>> nda0: <Samsung SSD 980 1TB ..>
>>> nda0: nvme version 1.4
>>> nda0: 953869MB (1953525168 512 byte sectors)
>>>
>>> I cannot imagine that the native blocksize changed. Should I really
>>> expect reduced performance?
>>> Is it advisable to switch back to nvd?
>> It could be due to a bug in nda that makes it report the native block
>> size incorrectly, in which case you would not need to do anything but
>> ignore the message. However, if the native block size is really
>> 16 kiB, you will get write amplification effects, which could
>> needlessly shorten the life of your SSD.
>>
>> I would try running e.g. smartmontools's smartctl, which can
>> sometimes tell you what the real block size is, although as far as I
>> know it retrieves this information from an internal database. You
>> could also try to look up the information in the SSD vendor's data
>> sheet, or ask the vendor directly.
> Isn't it displayed by e.g. `nvmecontrol identify nda0` under the LBA
> Formats (including the current one used to format the namespace)?

Based on your comments I did some investigation and switched back to nvd:

# smartctl -a /dev/nvme0
Namespace 1 Formatted LBA Size:     512
...
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
  0 +     512       0         0


# nvmecontrol identify nda0 and # nvmecontrol identify nvd0 (after
setting hw.nvme.use_nvd="1" and rebooting) give the same result:
Number of LBA Formats:       1
Current LBA Format:          LBA Format #00
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Best
...
Optimal I/O Boundary:        0 blocks
NVM Capacity:                1000204886016 bytes
Preferred Write Granularity: 32 blocks
Preferred Write Alignment:   8 blocks
Preferred Deallocate Granul: 9600 blocks
Preferred Deallocate Align:  9600 blocks
Optimal Write Size:          256 blocks
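
For reference, switching between the drivers only needs the loader
tunable mentioned above and a reboot; these are the lines I used
(removing the loader.conf entry again restores nda):

# echo 'hw.nvme.use_nvd="1"' >> /boot/loader.conf
# shutdown -r now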

The recommended blocksize for ZFS comes from GEOM's stripesize, and
there I do see a difference:

# diff -w -U 10  gpart_list_nvd.txt gpart_list_nda.txt
-Geom name: nvd0
+Geom name: nda0
  modified: false
  state: OK
  fwheads: 255
  fwsectors: 63
  last: 1953525127
  first: 40
  entries: 128
  scheme: GPT
  Providers:
-1. Name: nvd0p1
+1. Name: nda0p1
     Mediasize: 272629760 (260M)
     Sectorsize: 512
-   Stripesize: 4096
-   Stripeoffset: 0
+   Stripesize: 16384
+   Stripeoffset: 4096
     Mode: r1w1e2
     efimedia: HD(1,GPT,8d4c57bb-932f-11ed-82da-74563c227532,0x28,0x82000)
     rawuuid: 8d4c57bb-932f-11ed-82da-74563c227532
     rawtype: c12a7328-f81f-11d2-ba4b-00a0c93ec93b
     label: efiboot0
     length: 272629760
     offset: 20480
     type: efi
     index: 1
     end: 532519
     start: 40
...
-4. Name: nvd0p4
+4. Name: nda0p4
     Mediasize: 995635494912 (927G)
     Sectorsize: 512
-   Stripesize: 4096
+   Stripesize: 16384
     Stripeoffset: 0
     Mode: r1w1e1
     efimedia: HD(4,GPT,8d61a5ca-932f-11ed-82da-74563c227532,0x882800,0x73e84000)
     rawuuid: 8d61a5ca-932f-11ed-82da-74563c227532
     rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
     label: zfs0
     length: 995635494912
     offset: 4568645632
     type: freebsd-zfs
     index: 4
     end: 1953523711
     start: 8923136
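
For completeness, the block size that ZFS actually chose at pool
creation can be checked via the vdev ashift with zdb (using the pool
name zsys from the status above):

# zdb -C zsys | grep ashift

An ashift of 12 would correspond to the "4096B configured" from the
status message (2^12 = 4096 bytes).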


With this information I'm not sure whether I really have a problem with
the native blocksize. Does anybody know how the stripesize is
determined? Perhaps from the Preferred Write Granularity reported
above? 32 blocks * 512 bytes = 16384 bytes would match the nda
stripesize, but that is just a guess.
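
The stripesize can also be read directly from the disk provider,
independent of the partitioning (a quick check):

# geom disk list nda0 | grep -i stripe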

Kind regards,
    Frank

-- 
Frank Behrens
Osterwieck, Germany