ZFS "zpool replace" problems

Gerrit Kühn gerrit at pmp.uni-hannover.de
Tue Jan 26 15:03:23 UTC 2010


On Tue, 26 Jan 2010 06:30:21 -0800 Jeremy Chadwick
<freebsd at jdc.parodius.com> wrote about Re: ZFS "zpool replace" problems:

JC> I'm removing the In-Reply-To mail headers for this thread, as you've
JC> now hijacked it for a different purpose.  Please don't do this; start
JC> a new thread altogether.  :-)

Thanks. You're perfectly right, I should have done that.

JC> I'm not sure how the above is supposed to work (I haven't personally
JC> tried it), but:
JC> 
JC> 1) Why didn't you offline the ad10 disk first?
JC>    zpool offline tank ad10

Well, probably because I thought that ZFS would simply handle the
situation. I just wanted to replace drive A with drive B, so this seemed
quite straightforward to me.
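
For reference, the offline-then-replace sequence suggested above could be
sketched roughly as follows. This was not run in this thread. The pool name
(tank) and the disks (ad10, ad18) are the ones mentioned here; the run helper
and the DRY_RUN variable are hypothetical additions that only print the
commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: take the old disk offline, then replace it with the new one.
# POOL/OLD/NEW default to the names used in this thread.
POOL=${POOL:-tank}
OLD=${OLD:-ad10}
NEW=${NEW:-ad18}

# Hypothetical helper: print the command by default, execute it
# only when DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

replace_disk() {
    run zpool offline "$POOL" "$OLD" &&
    run zpool replace "$POOL" "$OLD" "$NEW" &&
    run zpool status "$POOL"    # shows resilver progress
}

replace_disk
```

Running it as-is only prints the three zpool commands, so the device names
can be checked before anything touches the pool.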

JC> 2) How did you attach ad18?  Did you tell the system about it using
JC>    atacontrol?  If so, what commands did you use?

Yes. The drives did not appear automatically (verified with atacontrol
list). I first tried a reinit of ata9, but that did not work, so I did
a detach/attach for ata9. After that the drive was there (both in the
list output and as a device node).
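
The steps described above correspond roughly to the following sketch. The
channel name ata9 is the one from this mail; the run helper and the DRY_RUN
variable are hypothetical and only print the commands unless DRY_RUN=0 is
set. The exact order used on the machine may have differed.

```shell
#!/bin/sh
# Sketch only: make a hot-plugged disk visible on one ATA channel.
CHANNEL=${CHANNEL:-ata9}

# Hypothetical helper: print the command by default, execute it
# only when DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

rescan_channel() {
    run atacontrol list                # check what is visible now
    run atacontrol reinit "$CHANNEL"   # first attempt (did not help here)
    run atacontrol detach "$CHANNEL"   # then detach ...
    run atacontrol attach "$CHANNEL"   # ... and re-attach the channel
    run atacontrol list                # the new drive should show up
}

rescan_channel
```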

JC> 3) Can you please provide uname -a output, as well as relevant dmesg
JC>    output to show what kind of SATA controller you have, what's
JC>    attached to what, etc.?

Of course (the dmesg output is no longer available, so I am using
pciconf -vl and atacontrol list instead):

ATA channel 0:
    Master:      no device present
    Slave:  acd0 <Optiarc DVD RW AD-7540A/1.01> ATA/ATAPI revision 0
ATA channel 1:
    Master:      no device present
    Slave:       no device present
ATA channel 2:
    Master:  ad4 <ST380815AS/3.AAC> SATA revision 2.x
    Slave:       no device present
ATA channel 3:
    Master:  ad6 <ST380815AS/3.AAC> SATA revision 2.x
    Slave:       no device present
ATA channel 4:
    Master:  ad8 <WDC WD1000FYPS-01ZKB0/02.01B01> SATA revision 2.x
    Slave:       no device present
ATA channel 5:
    Master: ad10 <WDC WD1000FYPS-01ZKB0/02.01B01> SATA revision 2.x
    Slave:       no device present
ATA channel 6:
    Master: ad12 <WDC WD1000FYPS-01ZKB0/02.01B01> SATA revision 2.x
    Slave:       no device present
ATA channel 7:
    Master: ad14 <WDC WD1000FYPS-01ZKB0/02.01B01> SATA revision 2.x
    Slave:       no device present
ATA channel 8:
    Master:      no device present
    Slave:       no device present
ATA channel 9:
    Master:      no device present
    Slave:       no device present


FreeBSD mclane.rt.aei.uni-hannover.de 7.2-STABLE FreeBSD 7.2-STABLE #0: Mon Sep  7 11:01:56 CEST 2009 root at mclane.rt.aei.uni-hannover.de:/usr/obj/usr/src/sys/MCLANE.72  amd64

The first six drives (up to ad14) are connected onboard (Supermicro dual
Opteron board with MCP55):

atapci1@pci0:0:5:0:     class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID
atapci2@pci0:0:5:1:     class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID
atapci3@pci0:0:5:2:     class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID

The other two (ad16 and ad18; the chassis has 8 slots, and the last two
were only intended to be used in situations like the one I have now) are
connected to an extra PCI card:

atapci4@pci0:3:6:0:     class=0x010401 card=0x02409005 chip=0x02401095 rev=0x02 hdr=0x00
    vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device     = 'SATA/Raid controller(2XSATA150) (SIL3112)'
    class      = mass storage
    subclass   = RAID

Meanwhile I took out the ad18 drive again and tried a different drive,
but ZFS listed that one as "UNAVAIL" with corrupted data. Presumably ZFS
had already labeled the first disk for resilvering and is now looking for
exactly that one. I also put the disk that caused the problem above back
in. Resilvering started again, but very soon the drive got detached
again, resulting in the same situation I described above.

Any help is greatly appreciated.


cu
  Gerrit

