Unplugging disk under ZFS yields panic
Jeremy Chadwick
freebsd at jdc.parodius.com
Wed Jan 11 20:40:43 UTC 2012
On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote:
> Gergely CZUCZY <phoemix at harmless.hu> wrote:
>
> > I'd like to ask whether it is normal behaviour that, when we unplug a
> > disk from a ZFS system, a kernel panic happens on the first write.
>
> Sounds familiar. I currently have two PRs open for
> reproducible kernel panics after a vdev gets lost:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
>
> Note that the pool layouts are different, though.
Is this problem truly ZFS-specific? I'd been tracking this problem for
years, and was told it was fixed:
http://wiki.freebsd.org/BugBusting/Commonly_reported_issues

* Panic occurs when a mounted device (USB, SATA, local image file,
  etc.) is removed
  Workaround: Be sure to umount all filesystems before removing the
  physical device
  Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21
  There is ongoing work to fully fix this problem, ETA 2009/02

OP, please provide a kernel backtrace.
Otherwise, if needed, I can go yank one of the two mirrored disks out of
my FreeBSD box at home to try and reproduce the problem.

  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 1h17m with 0 errors on Thu Dec 29 12:05:05 2011
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
        cache
          ada4      ONLINE       0     0     0

ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ahci0: <Intel ICH9 AHCI SATA controller> port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdc000800-0xdc000fff irq 17 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich3: [ITHREAD]
> > The hardware is a Supermicro X8DTH-i/6/iF/6F board with 2x LSI 2008
> > Fusion-MPT SAS-2 controllers, using the mps(4) driver. The disks are
> > accessed via gmultipath, and the multipath'd devices are added to a
> > ZFS mirror:
> >         DB
> >           mirror-0
> >             multipath/DB01
> >             multipath/DB02
> >           mirror-1
> >             multipath/DB03
> >             multipath/DB04
> >         logs
> >           mirror/host1p5
> >         cache
> >           multipath/SSD03p1
> >         spares
> >           multipath/DB05
> >
> > System is 9.0-RELEASE
> >
> > I unplugged DB03, and on the first write we got a kernel panic.
> > Is this normal behaviour, or are we missing something here?
>
> Without a back trace or at least the panic reason one can only
> speculate what's going on, but I think it's rather unlikely
> that the panic is the intended behaviour and not a bug.
>
> Maybe you can gather some additional information and file a PR?
>
> > On device removal we expect it to fail over to the spare disk, or
> > to keep using the available redundant disks.
>
> I agree that this behaviour would be preferable to a panic.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
More information about the freebsd-fs mailing list