AHCI driver and static device names

Thu Dec 15 05:24:20 UTC 2011

On 12/14/11 8:05 PM, CyberLeo Kitsana wrote:
>>>> The other option seems to be to use tunefs or a partitioning tool to
>>>> label each partition, which is even more ugly imo.
>>>
>>> Ugly how? Labels appear a lot more semantically elegant than the opaque
>>> 'ada4s1a' moniker.
>>
>> Ugly in that the driver has created a situation where we need
>> workarounds to perform the tasks we need.  *nix systems have always
>> relied upon static device nodes, and using dynamic names without
>> updating the relating tools/methods is ugly.  The workarounds also could
>> fail if someone forgets to perform them (specifically labels), since
>> it's not necessary on just about any other *nix system.  It's perfectly
>> within reason to assume people will forget to apply a label when
>> replacing a disk.
>
> Anything fails if you forget to do it. Administrative failure should not
> be confused with technical failure.

When you change a paradigm that administrators have known for decades,
it's unreasonable not to expect a decent degree of failure, especially
when the reason for the technical change isn't clear and the new method
isn't at all like the old (i.e. no disk is guaranteed to get the same
id).

> Static device nodes are appropriate when the topology is fixed and can
> be reasonably anticipated. With variable topologies, such as USB, iSCSI,
> multipath, and PCI hotswap, the disk controllers may not even exist at
> boot, or may be reordered based on probe order, or the order in which
> the remote units respond; and that's before the kernel even gets around
> to setting up the devices attached to those controllers. You cannot
> reasonably expect the system to statically allocate device nodes for
> every possible configuration that may exist for all technologies that
> might be added to a machine, so why offer the expectation when the
> system cannot possibly hope to fulfill it for even a fraction of the
> common cases?

I grant you variable topologies make things incredibly hairy, but 
there's no need to take that mess and inject it into how the fixed 
topology (the physical hw in the box) is handled.  Trying to handle all 
topology types in a single space can be messy.  This problem wouldn't 
exist if a fixed topology used the old naming (adXX) and the variable 
topologies used the new naming (adaXX).  Even this is less than ideal 
because your variable topologies provide no guarantee of anything being 
the same, thus your system could boot one day and fail the next because 
someone added a new piece of hardware to the network.  That's probably 
more the name of the game in variable topologies (adminA changes the 
configuration on $ImportantBootDevice and stuff breaks), but I certainly 
don't want that uncertainty with the hardware in a machine.

I stated that updating the device naming w/o updating the methodologies 
that rely upon that device naming is asking for trouble.  I can't say I 
know a solution nor that I'm an expert, but this seems like it will 
cause many more problems than it will solve.

>> Case in point.  I have a system with 15 drives in it.  I decided I
>> wanted to install on the 2nd device instead of the 1st, but I
>> partitioned all the other 14 drives.  I completed installation and went
>> to boot the system and it failed.  Stupid me, the GPT boot loader found
>> disk1 with a partitioning scheme but no fs.  So, I popped out disk 1 and
>> went to boot again.  Hey, now it starts to boot only to fail to find the
>> root fs because it's looking on ada1 and the fs is on ada0.  That is a
>> mess.
>
> Sounds like a bug in the BIOS or boot loader. The boot loader should be
> able to ask the BIOS for the device from which it read the boot code,
> and use that instead of just naively using the first available
> device in the system; the only instances where I've seen this fail have
> been on machines that should've been put down years ago. Which isn't to
> say it doesn't still happen.

No bug in the BIOS at all.  It's simply a case of device boot order:
since I installed on disk 2 but put a boot loader on disk 1 with no OS,
the result was expected.

>> This is not necessarily common, but also not uncommon.  More likely is
>> the case where you add a drive to the system and the above scenario
>> plays out because the device names get re-ordered.  I'm not sure what
>> problem the dynamic device nodes are intended to solve, but they've
>> certainly caused all sorts of pain and the need for the 2 (that I know
>> of) workarounds.
>
> How about when you add a PATA drive to a machine, but the cable is
> blocking the last available bay; so you have to move an existing drive
> to a different position on the cable to make room for the one you're
> installing? Static device numbering won't save you now.

This is not the same thing at all.  If I move a physical cable, or a 
drive on a cable, then yes I should expect things to change.  I have 
made a physical change to the disk's connections, and I should expect 
something to come out of it.

In my case, I have not moved the cabling of a disk at all and thus 
expect the device name to stay the same.  All I have done is add a new 
disk to the controller.  I have a reasonable expectation that that 
action should not re-order the device nodes and screw up god knows what 
(ALL mounts could break, and the system could even fail to boot).  This 
is how things used to work, and in fact still do work in Linux and other 
*nix.
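
To make the renumbering concrete, here is a rough Python sketch of the
two naming schemes.  It is a toy model, not the driver's actual logic:
it assumes unit numbers are handed out in probe order to whichever
disks happen to be present, and the slot layout and disk names
("scratch", "rootdisk", "data") are made up for illustration.

```python
# Toy model only: assumes unit numbers are handed out in probe order
# to whichever disks are present. Slot and disk names are hypothetical.

def dynamic_names(slots):
    """Number present disks consecutively, skipping empty slots."""
    present = [disk for disk in slots if disk is not None]
    return {disk: f"ada{unit}" for unit, disk in enumerate(present)}

def fixed_names(slots):
    """Name each disk after the slot it is plugged into."""
    return {disk: f"ad{slot}"
            for slot, disk in enumerate(slots) if disk is not None}

before = ["scratch", "rootdisk", "data"]
after = [None, "rootdisk", "data"]   # first disk pulled, nothing recabled

# Dynamic numbering: the root disk slides from ada1 to ada0.
print(dynamic_names(before)["rootdisk"], "->", dynamic_names(after)["rootdisk"])
# Slot-based numbering: the root disk keeps its name (ad1 both times).
print(fixed_names(before)["rootdisk"], "->", fixed_names(after)["rootdisk"])
```

Under those assumptions an fstab entry that points at ada1 stops
matching the root disk as soon as the first disk goes away, even
though the root disk itself was never touched.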

> Or how about those silly BIOSes that assume that you must really want to
> boot to the new disk you just attached to the machine, so helpfully
> rearrange your boot order for you so now you're booting to a strange
> disk with who knows what on it?
>
> Honestly, there's so much that can go wrong. That's what sysadmins are for.

None of those are related to my point.  If something breaks before I 
boot the system, that's a whole other issue.  I am talking about 
breaking filesystem mounting by changing an age-old methodology.

>> I dislike the idea of having to use labels to get static functionality
>> (increases the likelihood of something going wrong for a disk replace
>> operation if I forget to label), but I'll give gpt labels a try.
>
> I find that labels solve more problems than they introduce, when applied
> properly. The semantic meaning given to the devices often means I can
> discover what's on a particular disk in my pile'o'drives just by
> plugging it in and looking at the kernel log; no mounting necessary.
> Likewise, when juggling disks or controllers around, I don't have to
> worry about remembering to update the fstab, since the labels follow the
> data.

If you want to use labels then by all means use them.  I can see 
advantages to using them.  What I'm saying is that it is broken to have 
to use them in order to fix issues with the ahci driver using dynamic 
device names.  The fact that you have to use them to ensure your system 
doesn't break horribly when you do something simple like add a disk is a 
clear indication of a broken design in the ahci driver imho.
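
For anyone who does go the label route, the fstab side of the
workaround could look roughly like the Python sketch below.  It only
rewrites text: it assumes the partitions have already been given GPT
labels, which FreeBSD exposes under /dev/gpt/<label>.  The device
paths, the label names ("rootfs", "swap0") and the fstab contents are
all invented for the example.

```python
# Sketch only: the label names and fstab contents below are made up.
# /dev/gpt/<label> is where FreeBSD exposes GPT partition labels.

def relabel_fstab(fstab_text, labels):
    """Replace device paths in the first fstab column using `labels`."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0] in labels:
            # Swap only the device column; keep the rest of the line as-is.
            line = line.replace(fields[0], f"/dev/gpt/{labels[fields[0]]}", 1)
        out.append(line)
    return "\n".join(out)

fstab = """# Device      Mountpoint  FStype  Options  Dump  Pass#
/dev/ada1p2   /           ufs     rw       1     1
/dev/ada1p3   none        swap    sw       0     0"""

# Hypothetical mapping from today's device names to their labels.
labels = {"/dev/ada1p2": "rootfs", "/dev/ada1p3": "swap0"}

new_fstab = relabel_fstab(fstab, labels)
print(new_fstab)
```

Once the entries point at labels, the mounts no longer depend on
which adaN number a disk ends up with, but it only helps if nobody
forgets to label the replacement disk.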

Rob