Suggestions for working with unstable nvme dev names in AWS
Matthias Oestreicher
matthias at smormegpa.no
Wed May 15 07:59:26 UTC 2019
On Tuesday, 14.05.2019 at 15:45 -0500, Karl Denninger wrote:
> On 5/14/2019 15:17, Matthias Oestreicher wrote:
> > On Tuesday, 14.05.2019 at 12:24 -0700, George Hartzell wrote:
> > > Polytropon writes:
> > > > On Tue, 14 May 2019 08:59:01 -0700, George Hartzell wrote:
> > > > > Matthew Seaman writes:
> > > > > > [...] but if you
> > > > > > are using ZFS, then shuffling the disks around should not make any
> > > > > > difference.
> > > > > > [...]
> > > > > Yes, once I have them set up (ZFS or labeled), it doesn't matter what
> > > > > device names they end up having. For now I just do the setup by hand,
> > > > > poking around a bit. Same trick in the Linux world, you end up
> > > > > referring to them by their UUID or ....
> > > >
> > > > In addition to what Matthew suggested, you could use UFS-IDs
> > > > in case the disks are initialized with UFS. You can find more
> > > > information here (at the bottom of the page):
> > > > [...]
> > >
> > > Yes. As I mentioned in my response to Matthew, once I have some sort
> > > of filesystem/zpool on the device, it's straightforward (TMTOWTDI).
> > >
> > > The problem is being able to provision the system automatically
> > > without user intervention.
> > >
> > > In the Linux world, I can use e.g. Terraform to set up a pair of
> > > additional volumes and tell it to call them `/dev/sdy` and `/dev/sdz`.
> > > The Linux magic happens and I get a pair of symlinks that I can use in
> > > my e.g. Ansible playbooks, pointing to whatever the devices came up
> > > as when the machine booted. I build filesystems on the devices, add
> > > them via their UUIDs to `/etc/fstab`, and I'm off and running.
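> > >
> > > (Roughly, with made-up names; `/dev/sdy` and the mount point are just
> > > examples:
> > >
> > > # mkfs.ext4 /dev/sdy
> > > # blkid -s UUID -o value /dev/sdy      <- prints the filesystem UUID
> > > # echo "UUID=<uuid-from-blkid> /data ext4 defaults 0 2" >> /etc/fstab
> > > )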
> > >
> > > I can't [seem to] do this in the FreeBSD world; even if I name the
> > > devices `/dev/nvme1` (the fast and big one) and `/dev/nvme2` (the slow
> > > and small one), there's no guarantee that they'll have those names
> > > when the machine boots.
> > >
> > > This is a weirdly AWS-specific issue, and their peace offering is to
> > > stash the requested device name in the device/controller/"hardware"
> > > and provide a tool that digs it out.
> > >
> > > I'm trying to figure out what I can do about it from FreeBSD. Perhaps
> > > there's already a solution. Perhaps the nvme driver needs to be
> > > extended to provide access to the magic AWS info stash and then
> > > something like Amazon Linux's `ebsnvme-id` can pry it out.
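> > >
> > > (Untested sketch: nvmecontrol(8) looks like it can already dump the
> > > raw bytes. AWS reports the EBS volume ID as the NVMe serial number
> > > and, per their docs, stashes the requested name (e.g. "sdy") in the
> > > vendor-specific region (offset 0xC00) of the identify-controller
> > > data, so something like
> > >
> > > # nvmecontrol identify nvme1           <- serial number -> volume ID
> > > # nvmecontrol identify -x -v nvme1     <- hex dump incl. vendor bytes
> > >
> > > would show the bytes an `ebsnvme-id` equivalent needs to parse.)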
> > >
> > > g.
> >
> > Hei,
> > I'm not familiar with Amazon's AWS, but if your only problem is shifting device
> > names for UFS filesystems, then on modern systems GPT labels are the way to go.
> > There has been a lot of confusion over the years about the many ways to apply
> > different types of labels to devices on FreeBSD, but really GEOM labels, UUIDs,
> > etc. are only useful on old systems with no support for GPT.
> >
> > GPT labels are only applied to partitions, not whole drives, but they are extremely
> > flexible. They can be applied and changed at any time, even on mounted filesystems.
> > Unlike GEOM labels and all the other ID types, they are never hidden while
> > the device's original name (like nvd0 or nvd1) is in use.
> > 'gpart show -l' will show the GPT labels you applied at any time, and they can be
> > used for manual mounts and in /etc/fstab.
> > I haven't used any other label types in years, and have even disabled all the
> > others in
> >
> > /boot/loader.conf
> > kern.geom.label.disk_ident.enable=0
> > kern.geom.label.gptid.enable=0
> > kern.geom.label.ufsid.enable=0
> >
> > You can apply a GPT label to an existing partition with
> > # gpart modify -l mylabel -i N /dev/nvd1
> >
> > and then add something like the following to /etc/fstab
> > /dev/gpt/mylabel / ufs rw 1 1
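> >
> > For a fresh disk you can also attach the label right when you create the
> > partition; a sketch, assuming the disk shows up as nvd1:
> >
> > # gpart create -s gpt nvd1
> > # gpart add -t freebsd-ufs -l mydata nvd1
> > # newfs /dev/gpt/mydata
> > # mount /dev/gpt/mydata /mnt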
> >
> > There is only a single limitation with GPT labels: they don't work
> > when you use UFS journaling via GEOM, as the GPT label will be the same for e.g.
> > /dev/nvd0p1 and /dev/nvd0p1.journal.
> >
> > Another big plus is that they work with every partition type: freebsd-ufs,
> > freebsd-boot, freebsd-swap, EFI, freebsd-zfs...
> > One label type for everything can avoid some headache, imo.
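> >
> > For example, a fully labeled layout might look like this (sketch; the
> > names and sizes are just examples):
> >
> > # gpart add -t freebsd-boot -l boot0 -s 512k nvd1
> > # gpart add -t freebsd-swap -l swap0 -s 4g nvd1
> > # gpart add -t freebsd-ufs -l root0 nvd1
> >
> > and in /etc/fstab:
> >
> > /dev/gpt/root0  /     ufs   rw  1 1
> > /dev/gpt/swap0  none  swap  sw  0 0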
> >
> > Hope that clears up some confusion.
> > Matthias
> >
>
> Uh, one possible warning on that.
>
> They *do* disappear if you boot from an encrypted partition.
>
> For example:
>
> root at NewFS:/dev/gpt # zpool status zsr
>   pool: zsr
>  state: ONLINE
>   scan: scrub repaired 0 in 0 days 00:04:17 with 0 errors on Mon May 13 03:24:33 2019
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         zsr             ONLINE       0     0     0
>           raidz2-0      ONLINE       0     0     0
>             da2p4.eli   ONLINE       0     0     0
>             da1p4.eli   ONLINE       0     0     0
>             da11p4.eli  ONLINE       0     0     0
>             da0p4.eli   ONLINE       0     0     0
>             da3p4.eli   ONLINE       0     0     0
>
> errors: No known data errors
>
> root at NewFS:/dev/gpt # gpart show -l da2
> =>        40  468862048  da2  GPT  (224G)
>           40       1024    1  (null)  (512K)
>         1064    1048576    2  (null)  (512M)
>      1049640   10485760    3  swap1   (5.0G)
>     11535400  457326688    4  ssd1    (218G)
>
> You'd think /dev/gpt/ssd1 (and the rest) would be there. Nope.
>
> root at NewFS:/dev/gpt # ls
> backup61 rust1.eli rust4 swap1.eli swap4
> backup61.eli rust2 rust4.eli swap2 swap5
> backup62-2 rust2.eli rust5 swap2.eli
> backup62-2.eli rust3 rust5.eli swap3
> rust1 rust3.eli swap1 swap3.eli
> root at NewFS:/dev/gpt #
>
> Note that the other two pools, plus all the swap partitions (three of
> which I am using with automatic encryption) *do* show up.
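>
> (For reference, the automatic swap encryption is just the stock fstab
> trick of appending .eli to the swap device, e.g.:
>
> /dev/gpt/swap1.eli  none  swap  sw  0 0
> )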
>
> I don't know if the system would in fact boot if I disabled all the
> other label options; the loader finds the pool members via their
> "native" (daX) names, however, and once it has attached them all under
> geli it boots from them -- and the labels do not show up under /dev/gpt.
>
> My label settings....
>
> root at NewFS:/dev/gpt # sysctl -a|grep kern.geom.label
> kern.geom.label.disk_ident.enable: 1
> kern.geom.label.gptid.enable: 0
> kern.geom.label.gpt.enable: 1
> kern.geom.label.ufs.enable: 1
> kern.geom.label.ufsid.enable: 1
> kern.geom.label.reiserfs.enable: 1
> kern.geom.label.ntfs.enable: 1
> kern.geom.label.msdosfs.enable: 1
> kern.geom.label.iso9660.enable: 1
> kern.geom.label.ext2fs.enable: 1
> kern.geom.label.debug: 0
>
> I don't know if the loader would properly find the pools if I were to turn
> off disk_ident.enable -- never mind that if I did, and then wanted
> to set up a *new* disk, how would I do it on the bare device if the disk
> identifier can't be accessed?
>
> --
> Karl Denninger
> karl at denninger.net
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/
Hei Karl,
I've never used encryption, but I use GPT labels for my ZFS pool providers as well.
The problem with the .eli suffix seems to be the same as with the .journal suffix
when using UFS journaling via GEOM:
the GPT label applies both to the underlying device, e.g. /dev/ada0p2, and to
/dev/ada0p2.eli or /dev/ada0p2.journal. Thus a GPT label is not suitable for GEOM
journaling, and obviously not for GELI either.
I didn't know that GELI actually hides the GPT labels, so thanks for the info.
I guess the GPT labels wouldn't be hidden if they could be used as the ZFS provider
names, but then we're back to the point where the .eli (and .journal) suffixes come
into play and prevent doing so.
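
You can see the withering directly; an untested sketch, assuming ada0p2
carries a GPT label and GELI metadata:

# ls /dev/gpt           <- label node is there
# geli attach ada0p2    <- opens the partition, creates ada0p2.eli
# ls /dev/gpt           <- the label node has withered away
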
Regards
Matthias