Hot-changing a failed HDD with ahci.ko
Jeremy Chadwick
freebsd at jdc.parodius.com
Wed Dec 14 09:26:27 UTC 2011
On Wed, Dec 14, 2011 at 09:29:52AM +0100, Patrick M. Hausen wrote:
> Hi, all,
>
> while most cheap servers with SATA disks are not really hot-plug
> capable, changing a failed disk (either gmirror or zfs) was possible
> without a reboot by executing e.g. if ad4 failed:
>
> atacontrol detach ata2
> <change disks>
> atacontrol attach ata2
>
> What is the proper equivalent for ahci, ada0 and camcontrol?
None is needed: yank the disk, reinsert it, wait a few seconds, and you're done.
For validation, with full output, hardware details, etc., see:
http://koitsu.wordpress.com/2010/07/22/freebsd-and-zfs-hot-swapping-sata-disks-with-ahci/
I've also made videos demonstrating this, but I still need to edit and
upload them.
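If you want a check for the "wait a few seconds" step instead of guessing, it can be scripted. This is a minimal POSIX sh sketch, not an official procedure. wait_for_disk is a hypothetical helper, not part of FreeBSD. It also assumes the replacement disk comes back under the same device name, e.g. /dev/ada0, which may not hold on every system.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): after reinserting a disk on an
# ahci(4) controller, poll once per second until the device node reappears,
# or give up after a timeout.
# Assumption: the new disk attaches under the same name as the old one.
# Note: plain POSIX sh has no "local", so dev/timeout/elapsed are globals.
wait_for_disk() {
    dev="$1"
    timeout="${2:-30}"   # seconds; 30 is an arbitrary default
    elapsed=0
    while [ ! -e "$dev" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timeout: $dev did not appear after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$dev present"
    return 0
}
```

Usage, e.g.: wait_for_disk /dev/ada0 30 && camcontrol devlist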
> Stop unit commands seem not to work with SATA disks, so I
> tried:
>
> <forcefully unplug "broken" disk>
> -> system logs about lost device, so far so good
> <insert new disk>
> camcontrol reset 1
> camcontrol devlist
> -> disk still not there
> camcontrol rescan 1
> -> command hangs
> <login to a second session, system still responsive>
> shutdown -r now
> -> system panics, eventually reboots
Before you yanked the disk, were any non-ZFS filesystems mounted?
This sounds similar to what happens when a classic SATA disk is yanked
from a non-AHCI system, or under ata(4), without detaching it first, or,
on some systems, when SATA disks are yanked without a hot-swap
backplane.
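To answer that for a particular disk before pulling it, you could filter mount(8) output. This is a rough sketch only. mounts_on_disk is a hypothetical helper name. It assumes FreeBSD-style mount output ("device on mountpoint (fstype, options)"). It only catches filesystems mounted directly from /dev/adaN or one of its partitions (e.g. ada0p2, ada0s1a). GEOM labels, gmirror devices and ZFS datasets will not match.

```shell
#!/bin/sh
# Hypothetical helper: read mount(8) output on stdin and print only lines
# whose device is the given disk or one of its partitions/slices, e.g.
#   /dev/ada0p2 on / (ufs, local, soft-updates)
# ada1 must not match ada10, hence the anchored regex with an optional
# p<N>/s<N> suffix. ZFS datasets (pool names), labels and mirrors won't match.
mounts_on_disk() {
    disk="$1"
    awk -v d="$disk" '$1 ~ ("^/dev/" d "([ps][0-9].*)?$")'
}
```

Usage, e.g.: mount | mounts_on_disk ada0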
> I can provide details about the panic if someone is interested,
> but maybe there is a proper procedure already, which I simply missed.
>
> System is RELENG_8_2 amd64.
> ahci0: <Intel Cougar Point AHCI SATA controller> port 0xf090-0xf097,0xf080-0xf083,0xf070-0xf077,0xf060-0xf063,0xf020-0xf03f mem 0xfb921000-0xfb9217ff irq 19 at device 31.2 on pci0
> ada0 at ahcich0 bus 0 scbus1 target 0 lun 0
> ada0: <ST31000340NS SN05> ATA-8 SATA 1.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada1 at ahcich1 bus 0 scbus2 target 0 lun 0
> ada1: <ST31000340NS SN05> ATA-8 SATA 1.x device
> ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
You might try booting RELENG_9 (which uses ahci.ko by default, so there's
no need to mess about) from a LiveCD or equivalent and attempting the same
thing. I'm left wondering whether there are fixes in RELENG_8 (RELENG_8,
not a typo for the RELENG_9 above) that you do not have in RELENG_8_2.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
More information about the freebsd-stable mailing list