Re: 214df80a9cb3 - main - nvme: new define for size of host memory buffer sizes

From: Rodney W. Grimes <freebsd_at_gndrsh.dnsmgr.net>
Date: Mon, 11 Apr 2022 17:13:52 UTC
A favor to ask.. this cutting of the "git: " string from the
commit message subject when replying causes me to misclassify
mail messages.  It only occurs very rarely, so I have to assume
it is not an automatic thing.  Can this behavior be curtailed, please?

Thanks,
Rod

> On Sat, Apr 9, 2022 at 5:15 PM Warner Losh <imp@bsdimp.com> wrote:
> 
> >
> >
> > On Sat, Apr 9, 2022 at 4:50 PM Ravi Pokala <rpokala@freebsd.org> wrote:
> >
> >> -----Original Message-----
> >> From: <owner-src-committers@freebsd.org> on behalf of Warner Losh
> >> <imp@FreeBSD.org>
> >> Date: 2022-04-08, Friday at 22:06
> >> To: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>,
> >> <dev-commits-src-main@FreeBSD.org>
> >> Subject: git: 214df80a9cb3 - main - nvme: new define for size of host
> >> memory buffer sizes
> >>
> >>     The branch main has been updated by imp:
> >>
> >>     URL:
> >> https://cgit.FreeBSD.org/src/commit/?id=214df80a9cb3e95a140b13af7d19deec2bbfae76
> >>
> >>     commit 214df80a9cb3e95a140b13af7d19deec2bbfae76
> >>     Author:     Warner Losh <imp@FreeBSD.org>
> >>     AuthorDate: 2022-04-09 05:01:06 +0000
> >>     Commit:     Warner Losh <imp@FreeBSD.org>
> >>     CommitDate: 2022-04-09 05:05:25 +0000
> >>
> >>         nvme: new define for size of host memory buffer sizes
> >>
> >>         The nvme spec defines the various fields that specify sizes
> >>         for host memory buffers in terms of 4096 chunks. So, rather
> >>         than use a bare 4096 here, use NVME_HMB_UNITS. This is
> >>         explicitly not the host page size of 4096, nor the default
> >>         memory page size (mps) of the NVMe drive, but its own thing
> >>         and needs its own define.
> >>
> >> Hi Warner,
> >>
> >> Are you sure about that?
> >>
> >
> >> NVMe-1.4, Figure 297: Host Memory Buffer – Host Memory Buffer
> >> Descriptor Entry
> >>
> >> | Buffer Size (BSIZE): Indicates the number of contiguous
> >> | memory page size (CC.MPS) units for this descriptor.
> >> |
> >> | Buffer Address (BADD): Indicates the host memory address for
> >> | this descriptor aligned to the memory page size (CC.MPS).
> >> | The lower bits (n:0) of this field indicate the offset
> >> | within the memory page is 0h (e.g., if the memory page size
> >> | is 4 KiB, then bits 11:00 shall be 0h; if the memory page
> >> | size is 8 KiB, then bits 12:00 shall be 0h).
> >>
> >> They both reference mps, not 4096 bytes.
> >>
> >
> > So, some of these I'm 100% sure about. There's one that was previously
> > incorrectly hard wired to 4k. More below.
> >
> > From Table 275 of Rev 2.0:
> >
> > Host Memory Buffer Preferred Size (HMPRE): This field indicates
> > the preferred size that the host is requested to allocate for the
> > Host Memory Buffer feature in 4 KiB units. This value shall be
> > greater than or equal to the Host Memory Buffer Minimum Size.
> > If this field is non-zero, then the Host Memory Buffer feature is
> > supported. If this field is cleared to 0h, then the Host Memory
> > Buffer feature is not supported.
> >
> > Host Memory Buffer Minimum Size (HMMIN): This field indicates
> > the minimum size that the host is requested to allocate for the
> > Host Memory Buffer feature in 4 KiB units. If this field is cleared
> > to 0h, then the host is requested to allocate any amount of host
> > memory possible up to the HMPRE value.
> >
> > Host Memory Buffer Minimum Descriptor Entry Size (HMMINDS):
> > This field indicates the minimum usable size of a Host Memory
> > Buffer Descriptor Entry in 4 KiB units. If this field is cleared to 0h,
> > then the controller does not indicate any limitations on the Host
> > Memory Buffer Descriptor Entry size.
> >
> > These are the hmmin, hmminds and hmpre fields of 'cdata' in the
> > driver.
> >
> >     diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c
> >>     index 95a2b5c4285d..6996b3151b0d 100644
> >>     --- a/sys/dev/nvme/nvme_ctrlr.c
> >>     +++ b/sys/dev/nvme/nvme_ctrlr.c
> >>     @@ -936,11 +936,11 @@ nvme_ctrlr_hmb_alloc(struct nvme_controller
> >> *ctrlr)
> >>         max = (uint64_t)physmem * PAGE_SIZE / 20;
> >>         TUNABLE_UINT64_FETCH("hw.nvme.hmb_max", &max);
> >>
> >
> > max is a percent of available memory...
> >
> >
> >>     -   min = (long long unsigned)ctrlr->cdata.hmmin * 4096;
> >>     +   min = (long long unsigned)ctrlr->cdata.hmmin * NVME_HMB_UNITS;
> >>         if (max == 0 || max < min)
> >>                 return;
> >>     -   pref = MIN((long long unsigned)ctrlr->cdata.hmpre * 4096, max);
> >>     -   minc = MAX(ctrlr->cdata.hmminds * 4096, PAGE_SIZE);
> >>     +   pref = MIN((long long unsigned)ctrlr->cdata.hmpre * NVME_HMB_UNITS, max);
> >>     +   minc = MAX(ctrlr->cdata.hmminds * NVME_HMB_UNITS, PAGE_SIZE);
> >>
> >
> > So for all of these, we're good. They are in 4kiB chunks.
> >
> >
> >>         if (min > 0 && ctrlr->cdata.hmmaxd > 0)
> >>                 minc = MAX(minc, min / ctrlr->cdata.hmmaxd);
> >>         ctrlr->hmb_chunk = pref;
> >>     @@ -1023,7 +1023,7 @@ again:
> >>         for (i = 0; i < ctrlr->hmb_nchunks; i++) {
> >>                 ctrlr->hmb_desc_vaddr[i].addr =
> >>                     htole64(ctrlr->hmb_chunks[i].hmbc_paddr);
> >>     -           ctrlr->hmb_desc_vaddr[i].size = htole32(ctrlr->hmb_chunk / 4096);
> >>     +           ctrlr->hmb_desc_vaddr[i].size = htole32(ctrlr->hmb_chunk / NVME_HMB_UNITS);
> >>
> >
> > This one, you are correct is wrong. I'll fix it when I bring in the
> > changes to fully support
> > MPS != 0. For MPS == 0, which are the only drives this driver supports
> > correctly, this
> > is a nop. It was wrong before, though. The text you quoted is correct
> > about this. There
> > are a couple of PAGE_SIZEs in between these two chunks that should be
> > ctrlr->min_page_size instead (since that's the page size we're really
> > using, not the
> > host's). But PAGE_SIZE != 4k doesn't currently work with the nvme driver
> > due to
> > confusion like this.
> >
> 
> https://reviews.freebsd.org/D34871 should address that.
> 
> https://reviews.freebsd.org/D34865 through
> https://reviews.freebsd.org/D34873 are my current
> series. With them applied, I'm able to boot with 16k pages on a ZFS-based
> nvme arm64 system.
> I believe it would also allow us to use different drive page sizes, but I
> haven't tried to figure that out :)...
> 
> Warner
> 
> 
> > We also currently set the MPS field in the CC incorrectly when it isn't 0.
> >
> > Good catch. I'll update my pending changes.
> >
> > Warner
> >
> >
> >>         }
> >>         bus_dmamap_sync(ctrlr->hmb_desc_tag, ctrlr->hmb_desc_map,
> >>             BUS_DMASYNC_PREWRITE);
> >>
> >>
> >>

-- 
Rod Grimes                                                 rgrimes@freebsd.org