From nobody Sat Apr 09 23:15:37 2022 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id AB9CE1AABD0B for ; Sat, 9 Apr 2022 23:15:49 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-vs1-xe2e.google.com (mail-vs1-xe2e.google.com [IPv6:2607:f8b0:4864:20::e2e]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4KbWFJ6vDSz3Cf0 for ; Sat, 9 Apr 2022 23:15:48 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-vs1-xe2e.google.com with SMTP id i186so9954418vsc.9 for ; Sat, 09 Apr 2022 16:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=hLnoIR6KRzcXabxIVjyPugjNr75sj4Vg5d8ukEYqcaE=; b=u7VyDedWB170Le8PHJrYdhs04B+uhSt691ZXY+X6K/v4qK07moJ1l1CXfPFNS3m/qD YdZBRN7C/VwH/+cIUMfYj5YcAluSE2u7m6jeGwlLZLKCdi0FHjYlA0EUXs/neZCvxckr 0QmrZk84ZCfvC9cLhunsX/GVQmILyBkrL7S8FaCaGzzvcv1XvS5a6IPdK0Ot+h3cWMFb eTIWts2h7sUnXVSSN4EBUU8WpXhMZWoK7g1Ly1N+Mp4ElPfcsfWfBnv94XYOJjOpLVbD 36gfDQi880vRWYWq0IUAvw14W0dKz36xBd9I5ZgMzE0bUXamGwBQXff3ccjx8CMu1sHX YTSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=hLnoIR6KRzcXabxIVjyPugjNr75sj4Vg5d8ukEYqcaE=; b=fiRATADsd6gX0ZY6sXinhQNB8RB1VghQXB8RRQy0x1UBUbWRPbJY9cqMC1v/djZL4s TlONaxRkFbW+34+8QYNoSoJWwa4yRIl21xCZxoiKt9xLOpLVrnc/HaAPOhvVQlk7HOZO 1Sfcx6kLs5qQP1b+J6NwdyZGXgpMX9DHITgHv3bsm1NXWdzVaTJ9egxgEFv7r8jIjcOQ nSW9ldzBL89IBu1wkcCVgrqp6srgIzcSQNTEn/s0lEqfNQpZOQ+VdbvlSf+L2Mhx/3EI L0uLgSauy2pHhdmcDcyAjxXu0CvwQ3bMPu8yVLrEwE9TEP64CpX/a8zpDQlkyj9fh/4u A0eQ== X-Gm-Message-State: AOAM532E1RJs4LCP/p4kv9CIqqT8odnVWTo9AlNetwUDHLsYdjGoq6rl Iin81kzdajRu3I2YZq0uekO0AaZ/uFNHuXVsYQVykOFh/cfKWg/X X-Google-Smtp-Source: ABdhPJw+LktCjOWvrELxdlthGhqKawoobdy5zOFAlxduSPtHccZGejuYVa2xFLZ4cHWZh3t8suLO3ZXWKtKYodKLwcc= X-Received: by 2002:a67:db04:0:b0:328:2329:d4d5 with SMTP id z4-20020a67db04000000b003282329d4d5mr2762658vsj.77.1649546148172; Sat, 09 Apr 2022 16:15:48 -0700 (PDT) List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 References: <202204090506.239567Ag038413@gitrepo.freebsd.org> <3BC728DB-39BB-43BC-BECE-720FECB5B20D@panasas.com> In-Reply-To: <3BC728DB-39BB-43BC-BECE-720FECB5B20D@panasas.com> From: Warner Losh Date: Sat, 9 Apr 2022 17:15:37 -0600 Message-ID: Subject: Re: 214df80a9cb3 - main - nvme: new define for size of host memory buffer sizes To: Ravi Pokala Cc: Warner Losh , src-committers , "" , dev-commits-src-main@freebsd.org Content-Type: multipart/alternative; boundary="00000000000033e8e205dc40e436" X-Rspamd-Queue-Id: 4KbWFJ6vDSz3Cf0 X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=bsdimp-com.20210112.gappssmtp.com header.s=20210112 header.b=u7VyDedW; dmarc=none; spf=none (mx1.freebsd.org: domain of wlosh@bsdimp.com has no SPF policy when checking 2607:f8b0:4864:20::e2e) smtp.mailfrom=wlosh@bsdimp.com X-Spamd-Result: default: False [-2.99 / 15.00]; RCVD_TLS_ALL(0.00)[]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[bsdimp-com.20210112.gappssmtp.com:s=20210112]; NEURAL_HAM_MEDIUM(-0.99)[-0.987]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; PREVIOUSLY_DELIVERED(0.00)[dev-commits-src-main@freebsd.org]; DMARC_NA(0.00)[bsdimp.com]; RCPT_COUNT_FIVE(0.00)[5]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[bsdimp-com.20210112.gappssmtp.com:+]; NEURAL_HAM_SHORT(-1.00)[-1.000]; RCVD_IN_DNSWL_NONE(0.00)[2607:f8b0:4864:20::e2e:from]; MLMMJ_DEST(0.00)[dev-commits-src-main]; FORGED_SENDER(0.30)[imp@bsdimp.com,wlosh@bsdimp.com]; R_SPF_NA(0.00)[no SPF record]; MIME_TRACE(0.00)[0:+,1:+,2:~]; RCVD_COUNT_TWO(0.00)[2]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; FROM_NEQ_ENVFROM(0.00)[imp@bsdimp.com,wlosh@bsdimp.com] X-ThisMailContainsUnwantedMimeParts: N --00000000000033e8e205dc40e436 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, Apr 9, 2022 at 4:50 PM Ravi Pokala wrote: > -----Original Message----- > From: on behalf of Warner Losh > > Date: 2022-04-08, Friday at 22:06 > To: , , > > Subject: git: 214df80a9cb3 - main - nvme: new define for size of host > memory buffer sizes > > The branch main has been updated by imp: > > URL: > https://cgit.FreeBSD.org/src/commit/?id=3D214df80a9cb3e95a140b13af7d19dee= c2bbfae76 > > commit 214df80a9cb3e95a140b13af7d19deec2bbfae76 > Author: Warner Losh > AuthorDate: 2022-04-09 05:01:06 +0000 > Commit: Warner Losh > CommitDate: 2022-04-09 05:05:25 +0000 > > nvme: new define for size of host memory buffer sizes > > The nvme spec defines the various fields that specify sizes for > host > memory buffers in terms of 4096 chunks. So, rather than use a bar= e > 4096 > here, use NVME_HMB_UNITS. This is explicitly not the host page > size of > 4096, nor the default memory page size (mps) of the NVMe drive, > but its > own thing and needs its own define. > > Hi Warner, > > Are you sure about that? > NVMe-1.4, Figure 297: Host Memory Buffer =E2=80=93 Host Memory Buffer Descr= iptor > Entry > > | Buffer Size (BSIZE): Indicates the number of contiguous > | memory page size (CC.MPS) units for this descriptor. > | > | Buffer Address (BADD): Indicates the host memory address for > | this descriptor aligned to the memory page size (CC.MPS). > | The lower bits (n:0) of this field indicate the offset > | within the memory page is 0h (e.g., if the memory page size > | is 4 KiB, then bits 11:00 shall be 0h; if the memory page > | size is 8 KiB, then bits 12:00 shall be 0h). > > They both reference mps, not 4096 bytes. > So, some I'm 100% sure about. There's one that was previously incorrectly hard wired to 4k. More below. From Table 275 of Rev 2.0: Host Memory Buffer Preferred Size (HMPRE): This field indicates the preferred size that the host is requested to allocate for the Host Memory Buffer feature in 4 KiB units. This value shall be greater than or equal to the Host Memory Buffer Minimum Size. If this field is non-zero, then the Host Memory Buffer feature is supported. If this field is cleared to 0h, then the Host Memory Buffer feature is not supported. Host Memory Buffer Minimum Size (HMMIN): This field indicates the minimum size that the host is requested to allocate for the Host Memory Buffer feature in 4 KiB units. If this field is cleared to 0h, then the host is requested to allocate any amount of host memory possible up to the HMPRE value. Host Memory Buffer Minimum Descriptor Entry Size (HMMINDS): This field indicates the minimum usable size of a Host Memory Buffer Descriptor Entry in 4 KiB units. If this field is cleared to 0h, then the controller does not indicate any limitations on the Host Memory Buffer Descriptor Entry size. These are the hmmin, hmminds and hmpre fields of 'cdata' in the driver. diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c > index 95a2b5c4285d..6996b3151b0d 100644 > --- a/sys/dev/nvme/nvme_ctrlr.c > +++ b/sys/dev/nvme/nvme_ctrlr.c > @@ -936,11 +936,11 @@ nvme_ctrlr_hmb_alloc(struct nvme_controller > *ctrlr) > max =3D (uint64_t)physmem * PAGE_SIZE / 20; > TUNABLE_UINT64_FETCH("hw.nvme.hmb_max", &max); > max is a percent of available memory... > - min =3D (long long unsigned)ctrlr->cdata.hmmin * 4096; > + min =3D (long long unsigned)ctrlr->cdata.hmmin * NVME_HMB_UNITS; > if (max =3D=3D 0 || max < min) > return; > - pref =3D MIN((long long unsigned)ctrlr->cdata.hmpre * 4096, max); > - minc =3D MAX(ctrlr->cdata.hmminds * 4096, PAGE_SIZE); > + pref =3D MIN((long long unsigned)ctrlr->cdata.hmpre * > NVME_HMB_UNITS, max); > + minc =3D MAX(ctrlr->cdata.hmminds * NVME_HMB_UNITS, PAGE_SIZE); > So for all of these, we're good. They are in 4kiB chunks. > if (min > 0 && ctrlr->cdata.hmmaxd > 0) > minc =3D MAX(minc, min / ctrlr->cdata.hmmaxd); > ctrlr->hmb_chunk =3D pref; > @@ -1023,7 +1023,7 @@ again: > for (i =3D 0; i < ctrlr->hmb_nchunks; i++) { > ctrlr->hmb_desc_vaddr[i].addr =3D > htole64(ctrlr->hmb_chunks[i].hmbc_paddr); > - ctrlr->hmb_desc_vaddr[i].size =3D htole32(ctrlr->hmb_chun= k / > 4096); > + ctrlr->hmb_desc_vaddr[i].size =3D htole32(ctrlr->hmb_chun= k / > NVME_HMB_UNITS); > This one, you are correct is wrong. I'll fix it when I bring in the changes to fully support MPS !=3D 0. For MPS =3D=3D 0, which are the only drives this driver support= s correctly, this is a nop. It was wrong before, though. The text you quoted is correct about this. There are a couple of PAGE_SIZEs tinbetween these two chunks hat should be ctrlr->min_page_size instead (since that's the page size we're really using, not the host's). But PAGE_SIZE !=3D 4k doesn't currently work with the nvme driver due to confusion like this. We also currently set the MPS field in the CC incorrectly when it isn't 0. Good catch. I'll update my pending changes. Warner > } > bus_dmamap_sync(ctrlr->hmb_desc_tag, ctrlr->hmb_desc_map, > BUS_DMASYNC_PREWRITE); > > > --00000000000033e8e205dc40e436 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Sat, Apr 9, 2022 at 4:50 PM Ravi P= okala <rpokala@freebsd.org>= ; wrote:
-----Or= iginal Message-----
From: <owner-src-committers@freebsd.org> on behalf of Warner Losh <= imp@FreeBSD.org>
Date: 2022-04-08, Friday at 22:06
To: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org= >, <dev-commits-src-main@FreeBSD.org>
Subject: git: 214df80a9cb3 - main - nvme: new define for size of host memor= y buffer sizes

=C2=A0 =C2=A0 The branch main has been updated by imp:

=C2=A0 =C2=A0 URL: https://cgit.FreeBSD.org/src/commit/?id=3D214df80a9cb3e95a140b13af7d19deec= 2bbfae76

=C2=A0 =C2=A0 commit 214df80a9cb3e95a140b13af7d19deec2bbfae76
=C2=A0 =C2=A0 Author:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org>= ;
=C2=A0 =C2=A0 AuthorDate: 2022-04-09 05:01:06 +0000
=C2=A0 =C2=A0 Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org>= ;
=C2=A0 =C2=A0 CommitDate: 2022-04-09 05:05:25 +0000

=C2=A0 =C2=A0 =C2=A0 =C2=A0 nvme: new define for size of host memory buffer= sizes

=C2=A0 =C2=A0 =C2=A0 =C2=A0 The nvme spec defines the various fields that s= pecify sizes for host
=C2=A0 =C2=A0 =C2=A0 =C2=A0 memory buffers in terms of 4096 chunks. So, rat= her than use a bare 4096
=C2=A0 =C2=A0 =C2=A0 =C2=A0 here, use NVME_HMB_UNITS. This is explicitly no= t the host page size of
=C2=A0 =C2=A0 =C2=A0 =C2=A0 4096, nor the default memory page size (mps) of= the NVMe drive, but its
=C2=A0 =C2=A0 =C2=A0 =C2=A0 own thing and needs its own define.

Hi Warner,

Are you sure about that?

NVMe-1.4, Figure 297: Host Memory Buffer = =E2=80=93 Host Memory Buffer Descriptor Entry

| Buffer Size (BSIZE): Indicates the number of contiguous
| memory page size (CC.MPS) units for this descriptor.
|
| Buffer Address (BADD): Indicates the host memory address for
| this descriptor aligned to the memory page size (CC.MPS).
| The lower bits (n:0) of this field indicate the offset
| within the memory page is 0h (e.g., if the memory page size
| is 4 KiB, then bits 11:00 shall be 0h; if the memory page
| size is 8 KiB, then bits 12:00 shall be 0h).

They both reference mps, not 4096 bytes.

So, some I'm 100% sure about. There's one that was previously inc= orrectly hard wired to
4k. More below.

F= rom Table 275 of Rev 2.0:

Host Memory Buffer Prefe= rred Size (HMPRE): This field indicates
the preferred size that t= he host is requested to allocate for the
Host Memory Buffer featu= re in 4 KiB units. This value shall be
greater than or equal to t= he Host Memory Buffer Minimum Size.
If this field is non-zero, th= en the Host Memory Buffer feature is
supported. If this field is = cleared to 0h, then the Host Memory
Buffer feature is not support= ed.

Host Memory Buffer Minimum Size (HMMIN): T= his field indicates
the minimum size that the host is requested t= o allocate for the
Host Memory Buffer feature in 4 KiB units. If = this field is cleared
to 0h, then the host is requested to alloca= te any amount of host
memory possible up to the HMPRE value.
<= /div>

Host Memory Buffer Minimum Descriptor Entry Size (= HMMINDS):
This field indicates the minimum usable size of a Host = Memory
Buffer Descriptor Entry in 4 KiB units. If this field is c= leared to 0h,
then the controller does not indicate any limitatio= ns on the Host
Memory Buffer Descriptor Entry size.

These are the hmmin, hmminds and hmpre fields of 'cdata= ' in the
driver.

=C2=A0 =C2=A0 diff --git a/sys/dev/nvme/nvme_ctrl= r.c b/sys/dev/nvme/nvme_ctrlr.c
=C2=A0 =C2=A0 index 95a2b5c4285d..6996b3151b0d 100644
=C2=A0 =C2=A0 --- a/sys/dev/nvme/nvme_ctrlr.c
=C2=A0 =C2=A0 +++ b/sys/dev/nvme/nvme_ctrlr.c
=C2=A0 =C2=A0 @@ -936,11 +936,11 @@ nvme_ctrlr_hmb_alloc(struct nvme_contro= ller *ctrlr)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max =3D (uint64_t)physmem * PAGE_SIZE / 20;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 TUNABLE_UINT64_FETCH("hw.nvme.hmb_max"= ;, &max);

max is a percent of avail= able memory...
=C2=A0
=C2=A0 =C2=A0 -=C2=A0 =C2=A0min =3D (long long unsigned)ctrlr->cdata.hmm= in * 4096;
=C2=A0 =C2=A0 +=C2=A0 =C2=A0min =3D (long long unsigned)ctrlr->cdata.hmm= in * NVME_HMB_UNITS;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (max =3D=3D 0 || max < min)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
=C2=A0 =C2=A0 -=C2=A0 =C2=A0pref =3D MIN((long long unsigned)ctrlr->cdat= a.hmpre * 4096, max);
=C2=A0 =C2=A0 -=C2=A0 =C2=A0minc =3D MAX(ctrlr->cdata.hmminds * 4096, PA= GE_SIZE);
=C2=A0 =C2=A0 +=C2=A0 =C2=A0pref =3D MIN((long long unsigned)ctrlr->cdat= a.hmpre * NVME_HMB_UNITS, max);
=C2=A0 =C2=A0 +=C2=A0 =C2=A0minc =3D MAX(ctrlr->cdata.hmminds * NVME_HMB= _UNITS, PAGE_SIZE);

So for all of these= , we're good. They are in 4kiB chunks.
=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (min > 0 && ctrlr->cdata.hmmax= d > 0)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 minc =3D MAX(minc, = min / ctrlr->cdata.hmmaxd);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ctrlr->hmb_chunk =3D pref;
=C2=A0 =C2=A0 @@ -1023,7 +1023,7 @@ again:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 for (i =3D 0; i < ctrlr->hmb_nchunks; i++= ) {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ctrlr->hmb_desc_= vaddr[i].addr =3D
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 htole= 64(ctrlr->hmb_chunks[i].hmbc_paddr);
=C2=A0 =C2=A0 -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0ctrlr->hmb_desc_= vaddr[i].size =3D htole32(ctrlr->hmb_chunk / 4096);
=C2=A0 =C2=A0 +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0ctrlr->hmb_desc_= vaddr[i].size =3D htole32(ctrlr->hmb_chunk / NVME_HMB_UNITS);

This one, you are correct is wrong. I'll fix = it when I bring in the changes to fully support
MPS !=3D 0. For M= PS =3D=3D 0, which are the only drives this driver supports correctly, this=
is a nop. It was wrong before, though. The text you quoted=C2=A0= is correct about this. There
are a couple of PAGE_SIZEs tinbetwee= n=C2=A0these two chunks hat should be
ctrlr->min_page_size ins= tead (since that's the page size we're really using, not the
<= div>host's). But PAGE_SIZE !=3D 4k doesn't currently work with the = nvme driver due to
confusion like this.

= We also currently set the MPS field in the CC incorrectly when it isn't= 0.

Good catch. I'll update my pending changes= .

Warner
=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
=C2=A0 =C2=A0 =C2=A0 =C2=A0 bus_dmamap_sync(ctrlr->hmb_desc_tag, ctrlr-&= gt;hmb_desc_map,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 BUS_DMASYNC_PREWRITE);


--00000000000033e8e205dc40e436--