Re: git: b3d9e5013f3e - main - nvme: Don't active memory space until all BARs are configured
Date: Mon, 09 Mar 2026 13:39:57 UTC
On 3/6/26 12:32, Alexander Ziaee wrote:
> The branch main has been updated by ziaee:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=b3d9e5013f3e5016ffbd3d3d6091194658af2b92
>
> commit b3d9e5013f3e5016ffbd3d3d6091194658af2b92
> Author: Matt Delco <delco@google.com>
> AuthorDate: 2026-03-06 17:23:03 +0000
> Commit: Alexander Ziaee <ziaee@FreeBSD.org>
> CommitDate: 2026-03-06 17:28:41 +0000
>
> nvme: Don't active memory space until all BARs are configured
>
> In the current current behavior the 2nd and 3rd BARs can be activated
> when they're configured with address zero. This change defers the
> activation of all BARs until after they've all been configured with an
> address.
>
> This enables FreeBSD on Google Compute Engine C4-LSSD Machines.
This hack is fine for now, but the real fix for this is that the PCI bus driver
needs to ensure resources are reserved for all of the MEM or IO BARs before
enabling MEM or IO decoding at which point this can (and should) be reverted.
That's kind of a longstanding bug in the PCI bus driver that we haven't had to
solve previously because the firmware in most systems assigns BARs during POST
so that in practice we never have to deal with unreserved BARs.
> Sponsored by: Google
> Tested by: NetApp (previous version)
> Reviewed by: gallatin, imp
> Discussed with: jrtc27 (improved error reporting)
> Differential Revision: https://reviews.freebsd.org/D55541
> ---
> sys/dev/nvme/nvme_pci.c | 44 +++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 37 insertions(+), 7 deletions(-)
>
> diff --git a/sys/dev/nvme/nvme_pci.c b/sys/dev/nvme/nvme_pci.c
> index 5784c6d1be96..74191df52058 100644
> --- a/sys/dev/nvme/nvme_pci.c
> +++ b/sys/dev/nvme/nvme_pci.c
> @@ -151,24 +151,28 @@ nvme_pci_probe (device_t device)
> static int
> nvme_ctrlr_allocate_bar(struct nvme_controller *ctrlr)
> {
> + int error;
> +
> ctrlr->resource_id = PCIR_BAR(0);
> ctrlr->msix_table_resource_id = -1;
> ctrlr->msix_table_resource = NULL;
> ctrlr->msix_pba_resource_id = -1;
> ctrlr->msix_pba_resource = NULL;
>
> + /*
> + * Using RF_ACTIVE will set the Memory Space bit in the PCI command register.
> + * The remaining BARs will get mapped in before they've been programmed with
> + * an address. To avoid this we'll not set this flag and instead call
> + * bus_activate_resource() after all the BARs have been programmed.
> + */
> ctrlr->resource = bus_alloc_resource_any(ctrlr->dev, SYS_RES_MEMORY,
> - &ctrlr->resource_id, RF_ACTIVE);
> + &ctrlr->resource_id, 0);
>
> if (ctrlr->resource == NULL) {
> nvme_printf(ctrlr, "unable to allocate pci resource\n");
> return (ENOMEM);
> }
>
> - ctrlr->bus_tag = rman_get_bustag(ctrlr->resource);
> - ctrlr->bus_handle = rman_get_bushandle(ctrlr->resource);
> - ctrlr->regs = (struct nvme_registers *)ctrlr->bus_handle;
> -
> /*
> * The NVMe spec allows for the MSI-X tables to be placed behind
> * BAR 4 and/or 5, separate from the control/doorbell registers.
> @@ -180,7 +184,7 @@ nvme_ctrlr_allocate_bar(struct nvme_controller *ctrlr)
> if (ctrlr->msix_table_resource_id >= 0 &&
> ctrlr->msix_table_resource_id != ctrlr->resource_id) {
> ctrlr->msix_table_resource = bus_alloc_resource_any(ctrlr->dev,
> - SYS_RES_MEMORY, &ctrlr->msix_table_resource_id, RF_ACTIVE);
> + SYS_RES_MEMORY, &ctrlr->msix_table_resource_id, 0);
> if (ctrlr->msix_table_resource == NULL) {
> nvme_printf(ctrlr, "unable to allocate msi-x table resource\n");
> return (ENOMEM);
> @@ -190,13 +194,39 @@ nvme_ctrlr_allocate_bar(struct nvme_controller *ctrlr)
> ctrlr->msix_pba_resource_id != ctrlr->resource_id &&
> ctrlr->msix_pba_resource_id != ctrlr->msix_table_resource_id) {
> ctrlr->msix_pba_resource = bus_alloc_resource_any(ctrlr->dev,
> - SYS_RES_MEMORY, &ctrlr->msix_pba_resource_id, RF_ACTIVE);
> + SYS_RES_MEMORY, &ctrlr->msix_pba_resource_id, 0);
> if (ctrlr->msix_pba_resource == NULL) {
> nvme_printf(ctrlr, "unable to allocate msi-x pba resource\n");
> return (ENOMEM);
> }
> }
>
> + error = bus_activate_resource(ctrlr->dev, ctrlr->resource);
> + if (error) {
> + nvme_printf(ctrlr, "unable to activate pci resource: %d\n", error);
> + return (error);
> + }
> + if (ctrlr->msix_table_resource != NULL) {
> + error = bus_activate_resource(ctrlr->dev, ctrlr->msix_table_resource);
> + if (error) {
> + nvme_printf(ctrlr, "unable to activate msi-x table resource: %d\n",
> + error);
> + return (error);
> + }
> + }
> + if (ctrlr->msix_pba_resource != NULL) {
> + error = bus_activate_resource(ctrlr->dev, ctrlr->msix_pba_resource);
> + if (error) {
> + nvme_printf(ctrlr, "unable to activate msi-x pba resource: %d\n",
> + error);
> + return (error);
> + }
> + }
> +
> + ctrlr->bus_tag = rman_get_bustag(ctrlr->resource);
> + ctrlr->bus_handle = rman_get_bushandle(ctrlr->resource);
> + ctrlr->regs = (struct nvme_registers *)ctrlr->bus_handle;
This is a gross hack that needs to die (ctlr->regs), and also, bus_tag and bus_handle should
probably be removed as well.
--
John Baldwin