Re: git: a8089ea5aee5 - main - nvmfd: A simple userspace daemon for the NVMe over Fabrics controller
Date: Fri, 03 May 2024 19:22:32 UTC
On 5/2/24 5:16 PM, John Baldwin wrote:
> The branch main has been updated by jhb:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=a8089ea5aee578e08acab2438e82fc9a9ae50ed8
>
> commit a8089ea5aee578e08acab2438e82fc9a9ae50ed8
> Author:     John Baldwin <jhb@FreeBSD.org>
> AuthorDate: 2024-05-02 23:35:40 +0000
> Commit:     John Baldwin <jhb@FreeBSD.org>
> CommitDate: 2024-05-02 23:38:39 +0000
>
>     nvmfd: A simple userspace daemon for the NVMe over Fabrics controller

I'm sure there are some subtle bugs I've missed somewhere, but I have tested the host and controller against each other (both userspace and kernel) as well as against Linux.

For some of the patches Warner approved in Phab, he specifically noted that he had looked them over, but not in detail due to their size, etc. I kind of think we might want a separate tag for those types of reviews. In GDB, we use an 'Acked-by' tag to mean that a commit is approved, but that it has not had the detailed technical review that 'Reviewed-by' implies. If we had such a tag here, some of these commits probably would have used Acked-by instead of Reviewed-by.

Here are some initial notes on using NVMeoF. They might be a good candidate for the handbook. (If we don't yet have notes on iSCSI for the handbook, we should add those as well; maybe I will get around to that too.)

# Overview

NVMe over Fabrics supports access to remote block storage devices as NVMe namespaces across a network connection, similar to using iSCSI to access remote block storage devices as SCSI LUNs. FreeBSD includes support for accessing remote namespaces via a host driver as well as support for exporting local storage devices as namespaces to remote hosts.

NVMe over Fabrics supports multiple transport layers including Fibre Channel, RDMA (over both iWARP and RoCE), and TCP. FreeBSD currently includes support only for the TCP transport. Enabling support requires loading a kernel module for the transport to use in addition to the host or controller module. The TCP transport is provided by `nvmf_tcp.ko`.
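For example, to load the host and transport modules automatically at boot, the standard loader.conf(5) convention should work (a sketch on my part, assuming the usual `<module>_load` knobs for the module names used in the examples below; the controller side would use `nvmft_load` instead of `nvmf_load`):

```
# /boot/loader.conf
nvmf_load="YES"      # NVMe over Fabrics host module (nvmf.ko)
nvmf_tcp_load="YES"  # TCP transport module (nvmf_tcp.ko)
```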
# Host (Initiator)

The fabrics host on FreeBSD exposes remote controllers as `nvmeX` new-bus devices, similar to PCI-express NVMe controllers. Remote namespaces are exposed as `ndaX` disk devices via CAM. The fabrics host driver does not support the `nvd` disk driver.

## Discovery Service

NVMe over Fabrics defines a discovery service. A discovery controller exports a log page enumerating a set of one or more controllers. Each log page entry contains the type of a controller (I/O or discovery) as well as the transport type and transport-specific address. For the TCP transport, the address includes the IP address and TCP port number. nvmecontrol(8) supports a `discover` command to query the log page from a discovery controller.

Example 1: The Discovery Log Page from a Linux Controller

```
# nvmecontrol discover ubuntu:4420
Discovery
=========
Entry 01
========
 Transport type:       TCP
 Address family:       AF_INET
 Subsystem type:       NVMe
 SQ flow control:      optional
 Secure Channel:       Not specified
 Port ID:              1
 Controller ID:        Dynamic
 Max Admin SQ Size:    32
 Sub NQN:              nvme-test-target
 Transport address:    10.0.0.118
 Service identifier:   4420
 Security Type:        None
```

## Connecting To an I/O Controller

nvmecontrol(8) supports a `connect` command to establish an association with a remote controller. Once the association is established, it is handed off to the in-kernel host, which creates a new `nvmeX` device.

Example 2: Connecting to an I/O Controller

```
# kldload nvmf nvmf_tcp
# nvmecontrol connect ubuntu:4420 nvme-test-target
```

This results in the following lines in dmesg:

```
nvme0: <Fabrics: nvme-test-target>
nda0 at nvme0 bus 0 scbus0 target 0 lun 1
nda0: <Linux 5.15.0-8 843bf4f791f9cdb03d8b>
nda0: Serial Number 843bf4f791f9cdb03d8b
nda0: nvme version 1.3
nda0: 1024MB (2097152 512 byte sectors)
```

The new `nvme0` device can now be used with other nvmecontrol(8) commands such as `identify`, similar to PCI-express controllers.

Example 3: Identify a Remote I/O Controller

```
# nvmecontrol identify nvme0
Controller Capabilities/Features
================================
...
Model Number:                Linux
Firmware Version:            5.15.0-8
...

Fabrics Attributes
==================
I/O Command Capsule Size:    16448 bytes
I/O Response Capsule Size:   16 bytes
In Capsule Data Offset:      0 bytes
Controller Model:            Dynamic
Max SGL Descriptors:         1
Disconnect of I/O Queues:    Not Supported
```

The `nda0` disk device can be used like any other NVMe disk device.

## Connecting via Discovery

nvmecontrol(8)'s `connect-all` command fetches the discovery log page from the specified discovery controller and creates an association for each log page entry.

## Disconnecting

nvmecontrol(8)'s `disconnect` command detaches the namespaces from a remote controller and destroys the association.

Example 4: Disconnecting From a Remote I/O Controller

```
# nvmecontrol disconnect nvme0
```

The `disconnect-all` command destroys associations with all remote controllers.

## Reconnecting

If a connection is interrupted (for example, a TCP connection dies), the association is torn down (all queues are disconnected), but the `nvmeX` device is left in a quiesced state. Any pending I/O requests for remote namespaces are left pending as well. In this state, the `reconnect` command can be used to establish a new association to resume operation with a remote controller.

Example 5: Reconnecting to a Remote I/O Controller

```
# nvmecontrol reconnect nvme0 ubuntu:4420 nvme-test-target
```

# Controller (Target)

The fabrics controller on FreeBSD exposes local block devices as NVMe namespaces to remote hosts. The controller support on FreeBSD includes a userland implementation of a discovery controller as well as an in-kernel I/O controller. Similar to the existing iSCSI target in FreeBSD, the in-kernel I/O controller uses CAM's target layer (ctl(4)). Block devices are created by adding ctl(4) LUNs via ctladm(8). The discovery service and initial handling of I/O controller connections are managed by the nvmfd(8) daemon.

Example 6: Exporting a Local ZFS Volume

```
# kldload nvmft nvmf_tcp
# ctladm create -b block -o file=/dev/zvol/bhyve/iscsi
LUN created successfully
backend:       block
device type:   0
LUN size:      4294967296 bytes
blocksize      512 bytes
LUN ID:        0
Serial Number: MYSERIAL0000
Device ID:     MYDEVID0000
# nvmfd -F -p 4420 -n nqn.2001-03.com.chelsio:frodo0 -K
```

Open associations can be listed via `ctladm nvlist` and can be disconnected via `ctladm nvterminate`.

Eventually NVMe support should be added to ctld(8) by merging nvmfd(8) into ctld(8).

-- 
John Baldwin