Suggestions for working with unstable nvme dev names in AWS
Karl Denninger
karl at denninger.net
Tue May 14 20:45:11 UTC 2019
On 5/14/2019 15:17, Matthias Oestreicher wrote:
> On Tuesday, 14.05.2019, at 12:24 -0700, George Hartzell wrote:
>> Polytropon writes:
>> > On Tue, 14 May 2019 08:59:01 -0700, George Hartzell wrote:
>> > > Matthew Seaman writes:
>> > > > [...] but if you
>> > > > are using ZFS, then shuffling the disks around should not make any
>> > > > difference.
>> > > > [...]
>> > > Yes, once I have them set up (ZFS or labeled), it doesn't matter what
>> > > device names they end up having. For now I just do the setup by hand,
>> > > poking around a bit. Same trick in the Linux world, you end up
>> > > referring to them by their UUID or ....
>> >
>> > In addition to what Matthew suggested, you could use UFS-IDs
>> > in case the disks are initialized with UFS. You can find more
>> > information here (at the bottom of the page):
>> > [...]
>>
>> Yes. As I mentioned in my response to Matthew, once I have some sort
>> of filesystem/zpool on the device, it's straightforward (TMTOWTDI).
>>
>> The problem is being able to provision the system automatically
>> without user intervention.
>>
>> In the Linux world, I can use e.g. Terraform to set up a pair of
>> additional volumes and tell it to call them `/dev/sdy` and `/dev/sdz`.
>> The Linux magic happens and I get a pair of symlinks that I can use in
>> my e.g. Ansible playbooks, that point to whatever the devices came up
>> as when it booted. I build filesystems on the devices, add them via
>> their UUIDs to `/etc/fstab` and I'm off and running.
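>>
>> (Roughly, on the Linux side -- illustrative only, the device name,
>> filesystem and mount point here are made up:
>>
>>   # mkfs.ext4 /dev/sdy                       # /dev/sdy is the symlink udev created
>>   # uuid=$(blkid -s UUID -o value /dev/sdy)  # look up the new filesystem's UUID
>>   # echo "UUID=${uuid} /data ext4 defaults 0 2" >> /etc/fstab
>>
>> after which the volume mounts by UUID no matter what name it gets at
>> boot.)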
>>
>> I can't [seem to] do this in the FreeBSD world; even if I name the
>> devices `/dev/nvme1` (the fast and big one) and `/dev/nvme2` (the slow
>> and small one), there's no guarantee that they'll have those names
>> when the machine boots.
>>
>> This is a weirdly AWS-specific issue and their peace offering is to stash the
>> requested device name in the device/controller/"hardware" and provide
>> a tool that digs it out.
>>
>> I'm trying to figure out what I can do about it from FreeBSD. Perhaps
>> there's already a solution. Perhaps the nvme driver needs to be
>> extended to provide access to the magic AWS info stash and then
>> something like Amazon Linux's `ebsnvme-id` can pry it out.
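>>
>> (FWIW, EBS does at least expose the volume ID as the NVMe serial
>> number, so -- assuming nvmecontrol(8)'s identify output includes a
>> "Serial Number" line, which I haven't double-checked -- something like
>>
>>   # nvmecontrol identify nvme1 | grep 'Serial Number'
>>
>> maps a controller back to its EBS volume. That still doesn't recover
>> the requested /dev/sdX name, though, which lives in the
>> vendor-specific identify data.)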
>>
>> g.
> Hi,
> I'm not familiar with Amazon's AWS, but if your only problem is shifting device
> names for UFS filesystems, then on modern systems GPT labels are the way to go.
> There has been a lot of confusion over the years about the many ways to apply
> different types of labels to devices on FreeBSD, but really GEOM labels, UUIDs,
> etc. are only useful on old systems where there's no support for GPT.
>
> GPT labels are only applied to partitions, not whole drives, but they are extremely
> flexible. They can be applied and changed at any time, even on mounted filesystems.
> In comparison to GEOM labels and all other ID types, they are never hidden while
> the device's original device name (like nvme0 or nvme1) is in use.
> 'gpart show -l' will show the GPT labels you applied at any time, and they can be
> used both for manual mounts and in /etc/fstab.
> I haven't used any other label type in years and have even disabled all the others in
>
> /boot/loader.conf
> kern.geom.label.disk_ident.enable=0
> kern.geom.label.gptid.enable=0
> kern.geom.label.ufsid.enable=0
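>
> You can check what's currently in effect with
>
> # sysctl kern.geom.label
>
> which prints all of these knobs at once.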
>
> You can apply a GPT label to partition N with
> # gpart modify -l mylabel -i N /dev/nvme1
>
> and then add something like the following to /etc/fstab
> /dev/gpt/mylabel / ufs rw 1 1
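>
> For a one-off mount, the label device works directly too:
>
> # mount /dev/gpt/mylabel /mnt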
>
> There is only a single limitation with GPT labels, and that is they don't work
> when you use UFS journaling via GEOM, as the GPT label will be the same for e.g.
> /dev/nvme0p1 and /dev/nvme0p1.journal.
>
> Another big plus is that they work with every partition type: freebsd-ufs,
> freebsd-boot, swap, EFI, freebsd-zfs...
> One label type for everything can avoid some headaches, imo.
>
> Hope that clears up some confusion.
> Matthias
>
Uh, one possible warning on that.
They *do* disappear if you boot from an encrypted partition.
For example:
root at NewFS:/dev/gpt # zpool status zsr
  pool: zsr
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:04:17 with 0 errors on Mon May 13 03:24:33 2019
config:

        NAME            STATE     READ WRITE CKSUM
        zsr             ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            da2p4.eli   ONLINE       0     0     0
            da1p4.eli   ONLINE       0     0     0
            da11p4.eli  ONLINE       0     0     0
            da0p4.eli   ONLINE       0     0     0
            da3p4.eli   ONLINE       0     0     0

errors: No known data errors
root at NewFS:/dev/gpt # gpart show -l da2
=>        40  468862048  da2  GPT     (224G)
          40       1024    1  (null)  (512K)
        1064    1048576    2  (null)  (512M)
     1049640   10485760    3  swap1   (5.0G)
    11535400  457326688    4  ssd1    (218G)
You'd think /dev/gpt/ssd1 (and the rest) would be there. Nope.
root at NewFS:/dev/gpt # ls
backup61 rust1.eli rust4 swap1.eli swap4
backup61.eli rust2 rust4.eli swap2 swap5
backup62-2 rust2.eli rust5 swap2.eli
backup62-2.eli rust3 rust5.eli swap3
rust1 rust3.eli swap1 swap3.eli
root at NewFS:/dev/gpt #
Note that the other two pools, plus all the swap partitions (three of
which I am using with automatic encryption), *do* show up.

I don't know if the system would in fact boot if I disabled all the
other label options; the loader finds the pool members via their
"native" (daX) names, however, and once it has attached them all via
geli it boots from them -- and the labels still do not show up under
/dev/gpt.
My label settings....
root at NewFS:/dev/gpt # sysctl -a|grep kern.geom.label
kern.geom.label.disk_ident.enable: 1
kern.geom.label.gptid.enable: 0
kern.geom.label.gpt.enable: 1
kern.geom.label.ufs.enable: 1
kern.geom.label.ufsid.enable: 1
kern.geom.label.reiserfs.enable: 1
kern.geom.label.ntfs.enable: 1
kern.geom.label.msdosfs.enable: 1
kern.geom.label.iso9660.enable: 1
kern.geom.label.ext2fs.enable: 1
kern.geom.label.debug: 0
I don't know if the loader will properly find the pools if I were to turn
off disk_ident.enable -- never mind that, if I were to do that and then
wanted to set up a *new* disk, how would I do it on the bare device if the
disk identifier can't be accessed?
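(Presumably gpart itself doesn't care -- a new disk could still be
partitioned and labeled via its native daX name, e.g.

# gpart create -s gpt da9
# gpart add -t freebsd-zfs -a 1m -l newdisk1 da9

with da9 purely hypothetical here -- but it's the existing members the
loader has to find at boot that worry me.)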
--
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/