zpool scrub errors on 3ware 9550SXU
ian j hart
ianjhart at ntlworld.com
Mon Jun 15 07:59:14 UTC 2009
On Monday 15 June 2009 03:12:41 Freddie Cash wrote:
> On Sun, Jun 14, 2009 at 6:27 AM, ian j hart <ianjhart at ntlworld.com> wrote:
> > On Sunday 14 June 2009 09:27:22 Freddie Cash wrote:
> > > On Sat, Jun 13, 2009 at 3:11 PM, ian j hart <ianjhart at ntlworld.com> wrote:
> > > > [long post with long lines, sorry]
> > > >
> > > > I have the following old hardware which I'm trying to make into a
> > > > storage server (back story elided).
> > > >
> > > > Tyan Thunder K8WE with dual Opteron 270
> > > > 8GB REG ECC RAM
> > > > 3ware/AMCC 9550SXU-16 SATA controller
> > > > Adaptec 29160 SCSI card -> Quantum LTO3 tape
> > > > ChenBro case and backplanes.
> > > > 'don't remember' PSU. I do remember paying £98 3 years ago, so not cheap!
> > > > floppy
> > > >
> > > > Some Seagate Barracuda drives. Two old 500GB for the O/S and 14 new
> > > > 1.5TB for data (plus some spares).
> > > >
> > > > Astute readers will know that the 1.5TB units have a chequered
> > > > history.
> > > >
> > > > I went to considerable effort to avoid being stuck with a bricked
> > > > unit, so imagine my dismay when, just before I was about to post
> > > > this, I discovered there's a new issue with these drives where they
> > > > reallocate sectors, from new.
> > > >
> > > > I don't want to get sucked into a discussion about whether these
> > > > disks are faulty or not. I want to examine what seems to be a
> > > > regression between 7.2-RELEASE and 8-CURRENT. If you can't resist,
> > > > start a thread in chat and CC me.
> > > >
> > > > Anyway, here's the full story (from memory I'm afraid).
> > > >
> > > > All disks exported as single drives (no JBOD anymore).
> > > > Install current snapshot on da0 and gmirror with da1, both 500GB
> > > > disks. Create a pool with the 14 1.5TB disks. Raidz2.
> > >
> > > Are you using a single raidz2 vdev using all 14 drives? If so, that's
> > > probably (one of) the sources of the issues. You really shouldn't use
> > > more than 8 or 9 drives in a single raidz vdev. Bad things happen,
> > > especially during resilvers and scrubs. We learned this the hard way,
> > > trying to replace a drive in a 24-drive raidz2 vdev.
> > >
> > > If possible, try to rebuild the pool using multiple, smaller raidz (1
> > > or 2) vdevs.
> >
> > Did you post this issue to the list or open a PR?
>
> No, as it's a known issue with ZFS itself, and not just the FreeBSD port.
>
> > This is not listed in zfsknownproblems.
>
> It's listed in the OpenSolaris/Solaris documentation, best practices
> guides, blog posts, and wiki entries.
I have the Administration Guide (June 2009), page 64:

"...configuration with 14 disks is better split into a (sic) two 7-disk groupings...single-digit groupings of disks should perform better."

This implies it works.

Can you point me to the small print? My Google-fu is weak.
>
> > Does opensolaris have this issue?
>
> Yes.
Anyway, I broke up the pool into two groups as you suggested.
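For the record, the new layout was roughly the following. This is a sketch, not a verbatim transcript; the pool name (tank) and the da2..da15 device names are illustrative.

```shell
# Recreate the 14-disk pool as two 7-disk raidz2 vdevs instead of a
# single 14-disk vdev. Pool and device names are illustrative.
zpool destroy tank
zpool create tank \
    raidz2 da2  da3  da4  da5  da6  da7  da8 \
    raidz2 da9  da10 da11 da12 da13 da14 da15

# zpool status should now show two raidz2 vdevs of 7 disks each.
zpool status tank
```

Note the trade-off: two raidz2 vdevs spend four disks on parity instead of two, but a resilver or scrub only has to walk the drives of one vdev's stripe at a time.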
As usual, it scrubs cleanly on 7.2. Under 8 it started throwing errors within a few minutes, then panicked, possibly due to scrub -s.

It's sitting at the db> prompt if there's anything I can do. I'll need idiot's-guide-level instructions. I have a screen dump if someone wants to step up. Off list?
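For anyone following along, the scrub cycle leading up to the panic was roughly this (pool name illustrative; scrub, scrub -s, and status -v are standard zpool subcommands):

```shell
zpool scrub tank        # start a scrub of the pool
zpool status -v tank    # watch progress and per-device checksum errors
zpool scrub -s tank     # stop the scrub in flight -- the step that
                        # may have triggered the panic under 8
```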
Highlight seems to be...
Memory modified after free 0xffffff0004da0c00(248) val=3000000 @ 0xffffff0004dc00
Panic: most recently used by none
Cheers
--
ian j hart
More information about the freebsd-current mailing list