Suggestions for working with unstable nvme dev names in AWS
Karl Denninger
karl at denninger.net
Tue May 14 20:45:11 UTC 2019
On 5/14/2019 15:17, Matthias Oestreicher wrote:
> On Tuesday, 14.05.2019, at 12:24 -0700, George Hartzell wrote:
>> Polytropon writes:
>> > On Tue, 14 May 2019 08:59:01 -0700, George Hartzell wrote:
>> > > Matthew Seaman writes:
>> > > > [...] but if you
>> > > > are using ZFS, then shuffling the disks around should not make any
>> > > > difference.
>> > > > [...]
>> > > Yes, once I have them set up (ZFS or labeled), it doesn't matter what
>> > > device names they end up having. For now I just do the setup by hand,
>> > > poking around a bit. Same trick in the Linux world, you end up
>> > > referring to them by their UUID or ....
>> >
>> > In addition to what Matthew suggested, you could use UFS-IDs
>> > in case the disks are initialized with UFS. You can find more
>> > information here (at the bottom of the page):
>> > [...]
>>
>> Yes. As I mentioned in my response to Matthew, once I have some sort
>> of filesystem/zpool on the device, it's straightforward (TMTOWTDI).
>>
>> The problem is being able to provision the system automatically
>> without user intervention.
>>
>> In the Linux world, I can use e.g. Terraform to set up a pair of
>> additional volumes and tell it to call them `/dev/sdy` and `/dev/sdz`.
>> The Linux magic happens and I get a pair of symlinks that I can use in
>> my e.g. Ansible playbooks, that point to whatever the devices came up
>> as when it booted. I build filesystems on the devices, add them via
>> their UUIDs to `/etc/fstab` and I'm off and running.
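>>
>> (Roughly, on the Linux side -- illustrative only, the device name,
>> filesystem and mount point here are made up:
>>
>>   # mkfs.ext4 /dev/sdy                       # /dev/sdy is the symlink udev created
>>   # uuid=$(blkid -s UUID -o value /dev/sdy)  # look up the new filesystem's UUID
>>   # echo "UUID=${uuid} /data ext4 defaults 0 2" >> /etc/fstab
>>
>> after which the volume mounts by UUID no matter what name it gets at
>> boot.)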
>>
>> I can't [seem to] do this in the FreeBSD world; even if I name the
>> devices `/dev/nvme1` (the fast and big one) and `/dev/nvme2` (the slow
>> and small one), there's no guarantee that they'll have those names
>> when the machine boots.
>>
>> This is a weirdly AWS-specific issue and their peace offering is to stash the
>> requested device name in the device/controller/"hardware" and provide
>> a tool that digs it out.
>>
>> I'm trying to figure out what I can do about it from FreeBSD. Perhaps
>> there's already a solution. Perhaps the nvme driver needs to be
>> extended to provide access to the magic AWS info stash and then
>> something like Amazon Linux's `ebsnvme-id` can pry it out.
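>>
>> (FWIW, EBS does at least expose the volume ID as the NVMe serial
>> number, so -- assuming nvmecontrol(8)'s identify output includes a
>> "Serial Number" line, which I haven't double-checked -- something like
>>
>>   # nvmecontrol identify nvme1 | grep 'Serial Number'
>>
>> maps a controller back to its EBS volume. That still doesn't recover
>> the requested /dev/sdX name, though, which lives in the
>> vendor-specific identify data.)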
>>
>> g.
> Hi,
> I'm not familiar with Amazon's AWS, but if your only problem is shifting device
> names for UFS filesystems, then on modern systems GPT labels are the way to go.
> There has been a lot of confusion over the years about the many ways to apply
> different types of labels to devices on FreeBSD, but really GEOM labels, UUIDs,
> etc. are only useful on old systems where there's no support for GPT.
>
> GPT labels are only applied to partitions, not whole drives, but they are extremely
> flexible. They can be applied and changed at any time, even on mounted filesystems.
> In comparison to GEOM labels and all other ID types, they are never hidden while
> the device's original device name (like nvme0 or nvme1) is in use.
> 'gpart show -l' will show the GPT labels you applied at any time, and they can be
> used both for manual mounts and in /etc/fstab.
> I haven't used any other label type in years and have even disabled all the others in
>
> /boot/loader.conf
> kern.geom.label.disk_ident.enable=0
> kern.geom.label.gptid.enable=0
> kern.geom.label.ufsid.enable=0
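>
> You can check what's currently in effect with
>
> # sysctl kern.geom.label
>
> which prints all of these knobs at once.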
>
> You can apply a GPT label to partition N with
> # gpart modify -l mylabel -i N /dev/nvme1
>
> and then add something like the following to /etc/fstab
> /dev/gpt/mylabel / ufs rw 1 1
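>
> For a one-off mount, the label device works directly too:
>
> # mount /dev/gpt/mylabel /mnt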
>
> There is only a single limitation with GPT labels, and that is they don't work
> when you use UFS journaling via GEOM, as the GPT label will be the same for e.g.
> /dev/nvme0p1 and /dev/nvme0p1.journal.
>
> Another big plus is that they work with every partition type: freebsd-ufs,
> freebsd-boot, swap, EFI, freebsd-zfs...
> One label type for everything can avoid some headaches, imo.
>
> Hope that clears up some confusion.
> Matthias
>
Uh, one possible warning on that.
They *do* disappear if you boot from an encrypted partition.
For example:
root at NewFS:/dev/gpt # zpool status zsr
  pool: zsr
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:04:17 with 0 errors on Mon May 13 03:24:33 2019
config:

        NAME            STATE     READ WRITE CKSUM
        zsr             ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            da2p4.eli   ONLINE       0     0     0
            da1p4.eli   ONLINE       0     0     0
            da11p4.eli  ONLINE       0     0     0
            da0p4.eli   ONLINE       0     0     0
            da3p4.eli   ONLINE       0     0     0

errors: No known data errors
root at NewFS:/dev/gpt # gpart show -l da2
=>        40  468862048  da2  GPT     (224G)
          40       1024    1  (null)  (512K)
        1064    1048576    2  (null)  (512M)
     1049640   10485760    3  swap1   (5.0G)
    11535400  457326688    4  ssd1    (218G)
You'd think /dev/gpt/ssd1 (and the rest) would be there. Nope.
root at NewFS:/dev/gpt # ls
backup61 rust1.eli rust4 swap1.eli swap4
backup61.eli rust2 rust4.eli swap2 swap5
backup62-2 rust2.eli rust5 swap2.eli
backup62-2.eli rust3 rust5.eli swap3
rust1 rust3.eli swap1 swap3.eli
root at NewFS:/dev/gpt #
Note that the other two pools, plus all the swap partitions (three of
which I am using with automatic encryption), *do* show up.

I don't know if the system would in fact boot if I disabled all the
other label options; the loader finds the pool members via their
"native" (daX) names, however, and once it has attached them all via
geli it boots from them -- and the labels still do not show up under
/dev/gpt.
My label settings....
root at NewFS:/dev/gpt # sysctl -a|grep kern.geom.label
kern.geom.label.disk_ident.enable: 1
kern.geom.label.gptid.enable: 0
kern.geom.label.gpt.enable: 1
kern.geom.label.ufs.enable: 1
kern.geom.label.ufsid.enable: 1
kern.geom.label.reiserfs.enable: 1
kern.geom.label.ntfs.enable: 1
kern.geom.label.msdosfs.enable: 1
kern.geom.label.iso9660.enable: 1
kern.geom.label.ext2fs.enable: 1
kern.geom.label.debug: 0
I don't know if the loader will properly find the pools if I were to turn
off disk_ident.enable -- never mind that, if I were to do that and then
wanted to set up a *new* disk, how would I do it on the bare device if the
disk identifier can't be accessed?
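(Presumably gpart itself doesn't care -- a new disk could still be
partitioned and labeled via its native daX name, e.g.

# gpart create -s gpt da9
# gpart add -t freebsd-zfs -a 1m -l newdisk1 da9

with da9 purely hypothetical here -- but it's the existing members the
loader has to find at boot that worry me.)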
--
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/