Boot device question

Thu Oct 23 08:45:03 PDT 2008

On Thu, Oct 23, 2008 at 08:12:38AM -0700, Chris Pratt wrote:
> I have a server with 6 hot-swap SATA slots. It was delivered
> with the first slot empty and 5 drives set up as /dev/ad4 through
> /dev/ad12. I'd never paid attention to this until I wanted to add
> a 6th, now 4 years later. When I popped it in, I realized the
> empty bay was not 6 but rather bay 1, and of course it wouldn't
> boot. Presumably /dev/ad2 had now come alive for the first time.
> I popped out the disk, rebooted and after it was up, I plugged it
> back in (hot) and ran sysinstall. It didn't see the disk so I couldn't
> fdisk it. No device files existed for it.
>
> I was thinking a right approach would be to change fstab to
> reference ad2 for all the system disk file systems, shutdown,
> move that drive to the first bay and plug the new drive into the
> 2nd bay. This seemed like more of a permanent solution.

This is the solution I go with, because it's obvious and doesn't add
more complexity to the picture.

If the installation was originally done when the disk was considered
ad4, for example, you should still be able to boot that drive (no matter
what port it's on, assuming SATA), choose single-user at the
beastie/loader menu, then make changes to /etc/fstab.  Upon reboot (in
multi-user mode) things should "just work", except for any programs
whose configuration refers to disks by device name (e.g. smartd.conf).

You can avoid the single-user step if you enjoy living dangerously.
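
For what it's worth, the fstab edit itself is mechanical.  Below is a
minimal Python sketch of it, not a tested tool: the function name
rewrite_fstab and the sample entries are made up for illustration, and
it assumes entries of the form /dev/adN followed by a slice or
partition suffix.

```python
import re

# Made-up sample entries, for illustration only.
SAMPLE = """# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/ad4s1b     none        swap    sw       0     0
/dev/ad4s1a     /           ufs     rw       1     1
/dev/ad6s1d     /data       ufs     rw       2     2
"""


def rewrite_fstab(text, old, new):
    """Return fstab text with disk `old` (e.g. 'ad4') renamed to `new`."""
    # Only touch the device field at the start of a line, and require a
    # slice/partition letter or whitespace after the disk name so that
    # 'ad4' never matches 'ad40'.  Comment lines start with '#' and are
    # therefore left alone.
    pattern = re.compile(
        r"^([ \t]*/dev/)" + re.escape(old) + r"(?=[a-z \t]|$)",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + new, text)


if __name__ == "__main__":
    print(rewrite_fstab(SAMPLE, "ad4", "ad2"))
```

Run something like this against a copy of /etc/fstab and diff the
result by eye before installing it; a typo here is exactly the kind of
thing that leaves a remote box unbootable.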

> If those /dev/ad* files are created at boot dynamically,
> this should work. I've found docs that imply that they are
> dynamically discovered and created from FreeBSD 5 forward
> (auto-discovery?). Are they, or do I need to create them prior to
> start up?

They are: since FreeBSD 5, devfs creates the /dev entries as the kernel
finds the devices.  How the numbers get assigned is harder to explain.

Which device names you end up with depends entirely on the mode the
ATA controller is operating in, though.  For example, a SATA controller
operating in "Legacy/Compatible" mode might show two SATA disks as
ata0-master and ata0-slave (even though they're SATA); the same
controller in "Enhanced" mode might show the disks as ata4-master
and ata5-master; the same controller in AHCI mode might show the disks
as ata8-master and ata10-master.
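
The adN numbers follow from those channel numbers.  As a rough sketch,
and assuming a kernel built with "options ATA_STATIC_ID" (which, as far
as I know, GENERIC has), the disk unit is twice the channel number, plus
one for a slave.  The helper names below are invented for illustration.

```python
from typing import Tuple


def ad_name(channel: int, slave: bool = False) -> str:
    """Disk name for an ATA channel, assuming static device numbering."""
    # Assumption: unit = 2 * channel for the master, + 1 for the slave.
    return "ad%d" % (2 * channel + (1 if slave else 0))


def channel_of(ad: str) -> Tuple[int, str]:
    """Inverse of ad_name: 'ad4' -> (2, 'master')."""
    unit = int(ad[2:])
    return unit // 2, ("slave" if unit % 2 else "master")


if __name__ == "__main__":
    for name in ("ad2", "ad4", "ad12"):
        channel, role = channel_of(name)
        print("%s = ata%d-%s" % (name, channel, role))
```

By this rule the ad4 through ad12 in the original question would be the
masters of ata2 through ata6, and ad2 would be the master of ata1.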

I think some people deal with this problem using glabel(8), which lets
fstab refer to a label rather than a device number, but as I mentioned,
I prefer to do things the old-fashioned way.

> The thing is, there is no easy recovery from failure here since I
> have no console monitor to let me see what's going on or to fix
> fstab if it fails (counter-intuitively, the only place I can access
> the console is from remote locations ;-)), so I just want to know
> if I'm thinking straight?

See bottom of my mail.

> The plan is:
>
> 1. Change /etc/fstab entries for ad4 filesystems to ad2
> 2. Shutdown
> 3. Put the system disk in Bay 1
> 4. Power up
>
> Should it boot?

How certain are you that "bay 1" correlates with ad2?  That's the real
question here.

You obviously have *some* form of access to the machine physically --
or, your co-location provider is offering "remote hands" capability.
This would be the first time I'd *ever* heard of a co-lo offering that
feature without volunteering to put a VGA monitor + keyboard on the
machine so they can see what's going on for you.  (Most providers will
give you "remote hands" for free, as long as the duration of the
incident does not exceed 10-15 minutes.)

Since these bays are hot-swappable, why don't you have the remote hands
person insert a new disk into the spare/empty bay?

You should be able to run "atacontrol attach <channel>" (where channel
is the ATA channel which has no disk attached to it, see atacontrol
list), and then see what the newly-inserted disk's device name is.  Make
note of it, then do "atacontrol detach <channel>", then have the remote
hands person remove the disk they just installed.  After that, edit
/etc/fstab with the information you just obtained, shutdown -p now,
then have the remote hands person move the OS disk into the spare/empty
bay; that should be sufficient.
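
One way to spot the new disk's name is to compare the /dev entries from
before and after the attach.  The Python sketch below does only that
comparison; the function names are hypothetical, it assumes whole disks
show up as adN, and it does not run atacontrol itself.

```python
import re

# Whole-disk nodes only (ad4), not slices or partitions (ad4s1, ad4s1a).
WHOLE_DISK = re.compile(r"^ad\d+$")


def whole_disks(entries):
    """Pick the adN names out of a list of /dev entries."""
    return {name for name in entries if WHOLE_DISK.match(name)}


def new_disks(before, after):
    """Disks present in `after` but not in `before`, in numeric order."""
    added = whole_disks(after) - whole_disks(before)
    return sorted(added, key=lambda name: int(name[2:]))


# Intended use, on the server itself:
#   before = os.listdir("/dev")    # taken before "atacontrol attach"
#   after = os.listdir("/dev")     # taken after it
#   print(new_disks(before, after))
```

If the list comes back empty, the attach did not produce a disk device,
and it's worth checking dmesg before going any further.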

All that said:

I strongly urge you to take the time to consider the volatility of your
situation.  You have something that is obviously critical to you, in a
remote location, with no remote way to manage it other than SSH.  The
year is 2008: there are tons of ways to solve this problem.  Your
provider should really offer serial console hookups, KVM-over-IP, or at
bare minimum, their remote hands folks should be permitted to hook up
a keyboard and VGA monitor and have you step them through what to do
over the phone.  Our co-lo provider offers this for free, as long as
the duration of the incident does not take more than 10-15 minutes;
otherwise, it's expensive (hundreds of dollars).

If you're with a co-lo provider who doesn't offer this capability,
consider switching to one who does.  There is absolutely no reason
to accept a lack of remote management in this day and age.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |