Re: M2 NVME support

From: Freddie Cash <>
Date: Thu, 13 Apr 2023 16:43:31 UTC
> On Thu, 13 Apr 2023 at 13:25:36 +0200, <
> > We are in the process of buying new hardware for use with FreeBSD and
> > ZFS. We are deciding whether to buy M.2 NVMe disks or just SATA SSD
> > disks (probably Samsung PM* ones). What is your experience with them?
> > Do you recommend one over the other? Is support for one of them perhaps
> > better from a specific version onward? Or do they perhaps work better
> > with a specific disk controller?

There were issues in the past where NVMe drives were "too fast" for ZFS,
and various bottlenecks were uncovered.  Most (all?) of those have been
fixed over the past couple years.  These issues were found on pools using
all NVMe drives in various configurations for data storage (multiple raidz
vdevs; multiple mirror vdevs).  This was back when PCIe 3.0 NVMe drives
were all the rage, or maybe when PCIe 4.0 drives first started appearing?

If you're running a recent release of FreeBSD (13.x) with the newer
versions of OpenZFS 2.x, then you shouldn't have any issues using NVMe
drives.  The hard part will be finding drives with MLC or 3D TLC NAND
spread across multiple channels, a large SLC cache, plenty of onboard
DRAM, and a good controller, in order to get consistent, strong write
performance, especially when the drive is nearly full.  Too many drives
are moving to QLC NAND, or to DRAM-less controllers (which borrow system
RAM as a buffer), in order to reduce cost.  You'll want to research the
technology used in a drive before buying any specific model.
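If you already have a candidate drive in hand, FreeBSD's nvmecontrol(8) and smartmontools (sysutils/smartmontools) will show what you actually bought; a rough sketch (the device name nvme0 is an assumption for your system):

```shell
# List NVMe controllers attached to the system
nvmecontrol devlist

# Dump controller details: model, firmware revision, namespace sizes
# (nvme0 is a placeholder; use a name from devlist)
nvmecontrol identify nvme0

# smartmontools also speaks NVMe: temperature, spare capacity, wear
smartctl -a /dev/nvme0
```

If your release's diskinfo(8) supports it, the destructive `diskinfo -wS` synchronous-write test is also handy for spotting drives whose performance collapses once the SLC cache fills; double-check the flags before pointing it at a disk with data on it.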

SATA SSDs will perform better than hard drives, but will be limited by the
SATA bus to around 550 MBps of read/write throughput.  NVMe drives will
provide multiple GBps of read/write throughput (depending on the drive and
PCIe bus version).  Finding a motherboard with more than 2 M.2 slots
will be very hard.  If you want more than 2 drives, you'll have to
look into PCIe add-in boards with M.2 slots.  Really expensive ones will
include PCIe switches onboard so they'll work in pretty much any
motherboard with spare x16 slots (and maybe x8 slots, with reduced
performance?).  Less expensive add-in boards require PCIe bifurcation
support in the BIOS, and will only work in specific slots on the
motherboard.
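Those throughput ceilings fall out of the link encodings; a quick back-of-the-envelope check (standard line rates, not measurements from any particular drive):

```shell
# SATA 3: 6 Gbit/s line rate with 8b/10b encoding -> 600 MB/s of payload;
# ~550 MB/s is what drives deliver after protocol overhead
echo $(( 6000 / 10 * 8 / 8 ))        # 600 (MB/s)

# NVMe on PCIe 3.0 x4: 8 GT/s per lane with 128b/130b encoding
# -> ~3.9 GB/s raw for the x4 link; PCIe 4.0 doubles the line rate
echo $(( 8000 * 128 / 130 * 4 / 8 )) # 3938 (MB/s)
```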

My home ZFS server uses an ASUS motherboard with PCIe bifurcation support
and an ASUS Hyper M.2 expansion card in the second PCIe x16 slot, with 2
WD Blue M.2 SSDs installed (the card supports 4 M.2 drives).  These are
used to create a root pool with a single mirror vdev.  /, /usr, and /var
are mounted from there.  There are 6 hard drives in a separate data pool
using
multiple mirror vdevs, with /home mounted from there (this pool has been
migrated from IDE drives to SATA, from FreeBSD to Linux, and from raidz to
mirror vdevs at various points in the past, without losing any data so far;
yay ZFS!).
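For anyone curious what that layout looks like in zpool terms, a minimal sketch (pool names, partition names, and device names are all made up; the real commands depend on your partitioning):

```shell
# Root pool: a single mirror vdev on the two M.2 SSDs
# (nda0p3/nda1p3 are hypothetical GPT partitions)
zpool create zroot mirror nda0p3 nda1p3

# Data pool: multiple mirror vdevs across the 6 hard drives
zpool create tank \
    mirror ada0 ada1 \
    mirror ada2 ada3 \
    mirror ada4 ada5
```

The raidz-to-mirror migration mentioned above would have gone through `zfs send | zfs recv` into a fresh pool, since ZFS can't reshape an existing vdev from one type to another in place.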

At work, all our ZFS servers use 2.5" SATA SSDs for the root pool, and for
separate L2ARC/SLOG devices, with 24-90 SATA hard drives for the storage
pool.  These are all running FreeBSD 13.x.

If you want the best performance, and money isn't a restriction, then
you'll want to look into servers that have U.2 (or whatever the next-gen
small form factor interface name is) slots and backplanes.  The drives cost
a lot more than regular M.2 SSDs, but provide a lot more performance.
Especially in AMD EPYC servers with 128 PCIe lanes to play with.  :)

Freddie Cash