Errors on a file on a zpool: How to remove?

Sun Jan 24 00:40:19 UTC 2010

On Sat, 23 Jan 2010, Rich wrote:

> I have no files named 0x0.
>
> I have a number of files which, on attempting to do anything to them
> (stat, mv, rm), EIO occurs, the checksum error number on three of the
> disks in that pool ticks up, and /var/log/messages reports what I
> reported in my initial post. (I discovered this due to FreeBSD's daily
> check-for-setuid-bits-in-strange-places find command reporting EIO on
> some files.)
>
> My original post in this thread is about how to resolve this.

Do these bad files show up in "zpool status -v" after a scrub?
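
To be concrete, what I have in mind is running "zpool scrub rigatoni", waiting
for it to finish, and then looking at the list under the "errors:" line of
"zpool status -v rigatoni". If that list gets long, something like the rough
sketch below might help pull out just the affected entries. It assumes the
output layout matches what you posted; list_permanent_errors is a name I made
up for this example, not part of ZFS.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status -v` output on stdin and prints
# only the entries listed under "errors: Permanent errors ...".
# Assumes the output layout shown earlier in this thread.
list_permanent_errors() {
    awk '
        # A new "pool:" header ends any error list we were reading.
        /^[ \t]*pool:/ { in_list = 0 }
        # The header line itself is not an entry; start collecting after it.
        /^errors: Permanent errors/ { in_list = 1; next }
        # Print each non-blank entry with its leading indentation removed.
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }
    '
}

# Usage (after the scrub has completed):
#   zpool status -v rigatoni | list_permanent_errors
```

An entry like "rigatoni/mirrors:<0x0>" is a dataset plus an object number
rather than a path, which is the case I am asking about.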

This really sounds much more like corrupt metadata. ZFS keeps
multiple copies of filesystem metadata even on non-redundant pools (ditto
blocks). You said there was bad RAM in this machine at one point, which
may mean that *all* copies of the affected metadata were written corrupt.

In my encounter with a bad stick of RAM, the data was correct but the
stored checksums were wrong. I was able to "recover" the data by simply
changing zfs_read() to not report EIO when it encounters an ECKSUM error
from the ZFS layer -- essentially ignoring the checksum error. I have no
idea what this might do if the metadata itself is corrupt, so that could
be risky.

Another option is the zdb solution mentioned earlier.

>
> On Sat, Jan 23, 2010 at 6:34 PM, Wes Morgan <morganw at chemikals.org> wrote:
> > On Sat, 23 Jan 2010, Rich wrote:
> >
> >> On Sat, Jan 23, 2010 at 4:21 PM, Wes Morgan <morganw at chemikals.org> wrote:
> >> > On Sat, 23 Jan 2010, Rich wrote:
> >> >
> >> >> I already diagnosed the bad hardware - one of the two sticks of RAM
> >> >> had gone bad, and fails memtest in the other machine.
> >> >>
> >> >>   pool: rigatoni
> >> >>  state: ONLINE
> >> >> status: One or more devices has experienced an error resulting in data
> >> >>       corruption.  Applications may be affected.
> >> >> action: Restore the file in question if possible.  Otherwise restore the
> >> >>       entire pool from backup.
> >> >>    see: http://www.sun.com/msg/ZFS-8000-8A
> >> >>  scrub: scrub completed after 15h28m with 1 errors on Thu Jan 21 18:09:25 2010
> >> >> config:
> >> >>
> >> >>       NAME        STATE     READ WRITE CKSUM
> >> >>       rigatoni    ONLINE       0     0     1
> >> >>         da4       ONLINE       0     0     2
> >> >>         da5       ONLINE       0     0     2
> >> >>         da7       ONLINE       0     0     0
> >> >>         da6       ONLINE       0     0     0
> >> >>         da2       ONLINE       0     0     2
> >> >>
> >> >> errors: Permanent errors have been detected in the following files:
> >> >>
> >> >>         rigatoni/mirrors:<0x0>
> >> >
> >> > Can you post your entire pool filesystem structure? That message above
> >> > looks like an unreferenced block or corrupted metadata rather than an
> >> > actual file. Also, if it's part of a snapshot, you simply have to destroy
> >> > the snapshot.
> >> >
> >> > I had a pool become corrupted due to bad memory, and all of the files were
> >> > still able to be manipulated. The only time EIO popped up was on the
> >> > specific block that had a checksum error.
> >>
> >> # zfs list -r -t all rigatoni
> >> NAME                  USED  AVAIL  REFER  MOUNTPOINT
> >> rigatoni             5.73T   984G    19K  /rigatoni
> >> rigatoni/logs_bitch   269M   984G   269M  /rigatoni/logs_bitch
> >> rigatoni/mirrors     5.73T   984G  5.73T  /mirrors
> >>
> >> No snapshots here. :/
> >>
> >> EIO only pops up on the files I mentioned above - everything else in
> >> those directories, including renaming that directory, is fine.
> >
> > I must have missed it; what files is it showing besides the <0x0> address?
> > Or do you have a file named "<0x0>"?
>