Problems Terminating zpool scrub...

Jeremy Chadwick freebsd at jdc.parodius.com
Tue Apr 26 13:49:07 UTC 2011


On Tue, Apr 26, 2011 at 02:25:00PM +0100, Conall O'Brien wrote:
> On 26 April 2011 13:15, ambrosehuang ambrose <ambrosehua at gmail.com> wrote:
> > Could you post your PR number? I was curious about the driver used
> > for Western Digital disks, because I use the WD10EARS.
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=156647
> 
> I chalked it up to the SATA controller, since only 2 of my 5 identical
> WD20EARS disks were reporting DMA issues.
> 
> >
> > 2011/4/25 Conall O'Brien <conall at conall.net>
> >>
> >> On 15 April 2011 15:59, Conall O'Brien <conall at conall.net> wrote:
> >> > Hello,
> >> >
> >> >
> >> > I've got a NAS box running 8-STABLE [1] which I'm running with 5x
> >> > Western Digital 2TB disks.
> >> >
> >> >
> >> > One of the disks was having DMA issues as reported in dmesg, so I
> >> > began the usual zfs workflow of "zpool offline pool dev", physically
> >> > removed it and tried to "zpool replace pool dev", but my attempts to
> >> > do so fail: the zpool command keeps ending up in uninterruptible
> >> > wait (the D state). Before resorting to replacing the disk, a zpool
> >> > scrub was in progress. Now I can't kill it using "zpool scrub -s
> >> > pool"; it too ends up in the D state.
> >> >
> >> >
> >> > Is there another way than "zpool scrub -s pool" to terminate a scrub
> >> > process, so I can proceed with the disk replacement? I care more about
> >> > resilvering my pool before getting around to scrubbing it.
> >> >
> >> >
> >> > Thanks!
> >> >
> >> >
> >> > [1] For completeness, uname -a reports FreeBSD galvatron.taku.ie
> >> > 8.2-STABLE FreeBSD 8.2-STABLE #1: Sat Mar 19 13:18:46 UTC 2011
> >> > root at galvatron.taku.ie:/usr/src/obj/usr/src/sys/GALVATRON amd64
> >>
> >> I worked out the problem. There's a regression in one of the drivers
> >> between the kernel I was running and my previous kernel:
> >>
> >> FreeBSD galvatron.taku.ie 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0:
> >> Wed Dec 29 04:00:27 UTC 2010
> >> root at galvatron.taku.ie:/usr/src/obj/usr/src/sys/GALVATRON amd64
> >>
> >>
> >> I'll file a PR to get it fixed.

The PR is extremely terse/sub-par quality.  There isn't actual evidence
of the problem being a driver regression.  What needs to be provided in
the PR:

- Relevant dmesg output (pertaining to ataX and adX devices and anything
  else seen around that time; stuff from /var/log/messages might be more
  useful since it contains timestamps)
- Full dmesg seen during a fresh reboot
- vmstat -i
- atacontrol cap ataX (for each ataX channel.  You can XXX out the
  serial number if desired)
- smartctl -a /dev/adX (for each disk, be sure to label which disk
  is associated with what data.  You can XXX out the serial number if
  desired)
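
To make the above less tedious, here is a rough sh sketch that prints
the commands in question, one per line, so they can be reviewed and
then piped to a shell.  pr_cmds is a hypothetical helper name, and the
channel/disk names you pass it are placeholders for whatever your own
dmesg shows.  /var/run/dmesg.boot is where a stock FreeBSD install
keeps the boot-time dmesg; I am assuming yours does too.

```sh
#!/bin/sh
# Sketch only: print the diagnostic commands for the PR, one per line.
# pr_cmds is a hypothetical helper; arguments are space-separated lists.
pr_cmds() {
    channels=$1   # e.g. "ata2 ata3" (placeholders)
    disks=$2      # e.g. "ad4 ad6"   (placeholders)
    echo "dmesg"
    echo "cat /var/run/dmesg.boot"   # boot-time dmesg (assumed location)
    echo "vmstat -i"
    for ch in $channels; do
        echo "atacontrol cap $ch"
    done
    for d in $disks; do
        echo "smartctl -a /dev/$d"
    done
}
```

Something like 'pr_cmds "ata2 ata3" "ad4 ad6" | sh -x > pr-data.txt 2>&1'
run as root should then give one file with each command echoed ahead of
its output.  XXX out serial numbers before attaching it if you care.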

What really needs to be shown are the actual errors themselves, and in
sequential order / with timestamps.  "DMA errors" is too vague; I want
to assume READ_DMA48 but I cannot assume that.

Next:

I'm not sure if your system supports it, but can you run the controller
in AHCI mode (BIOS setting) and load ahci.ko instead (ahci_load="yes" in
/boot/loader.conf, your disks will change to /dev/adaX)?  If so, this
would allow you to narrow down whether or not the issue is truly a
driver problem.  You should try this *before* attempting the below.
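
For the loader.conf part, a minimal sketch, assuming the BIOS has
already been switched to AHCI.  enable_ahci is a hypothetical helper
name; it appends the line only if no ahci_load entry exists yet, and
takes the file path as an argument so it can be tried on a copy first.

```sh
#!/bin/sh
# Sketch only: add ahci_load="yes" to a loader.conf-style file once.
# enable_ahci is a hypothetical helper; default path is /boot/loader.conf.
enable_ahci() {
    conf=${1:-/boot/loader.conf}
    # Skip the append if an ahci_load line is already present.
    if ! grep -q '^ahci_load=' "$conf" 2>/dev/null; then
        echo 'ahci_load="yes"' >> "$conf"
    fi
}
```

After a reboot, "kldstat" should list ahci.ko and "camcontrol devlist"
should show the disks as adaX if the controller really attached via
ahci(4).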

Next:

Try updating your source to something newer than March 19th.  There have
been ata(4) changes since then that might pertain to your issue.  If the
same issue happens on a present-day build of RELENG_8 then we can start
by trying to narrow it down to commits between, roughly, late December
2010 and mid-March 2011.  Since you follow RELENG_8, you will need to
follow commits.  src/sys/dev/ata is what's relevant here, as well as the
chipsets/ directory under that.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/
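
If you would rather list the candidate commits from the command line
than click through cvsweb, something along these lines may help.  It is
a sketch under assumptions: ata_log_cmd is a hypothetical helper, the
repository URL is my assumption for where RELENG_8 lives in Subversion
(stable/8), and the dates are simply your two kernel build dates.  It
only prints the svn command so you can check it before running it.

```sh
#!/bin/sh
# Sketch only: print an "svn log" command covering a date window for
# src/sys/dev/ata.  ata_log_cmd is a hypothetical helper name.
# ATA_REPO is an assumed URL; override it if your mirror differs.
ATA_REPO=${ATA_REPO:-svn://svn.freebsd.org/base/stable/8/sys/dev/ata}

ata_log_cmd() {
    start=$1   # e.g. 2010-12-29 (last known good kernel)
    end=$2     # e.g. 2011-03-19 (first known bad kernel)
    echo "svn log -r {$start}:{$end} $ATA_REPO"
}
```

For example, 'ata_log_cmd 2010-12-29 2011-03-19 | sh' would run the
query; the log covers the chipsets/ subdirectory as well since it sits
under the same path.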

Let's get this figured out before other users start correlating their
problems with whatever this is.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


