misc/169398: Can't remove file with permanent error

Ron Dzierwa RonDzierwa at comcast.net
Mon Jun 25 14:00:21 UTC 2012


>Number:         169398
>Category:       misc
>Synopsis:       Can't remove file with permanent error
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 25 14:00:21 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Ron Dzierwa
>Release:        8.2-RELEASE-p6
>Organization:
Innovative Engineering, Inc.
>Environment:
FreeBSD phoenix.hsd1.md.comcast.net 8.2-RELEASE-p6 FreeBSD 8.2-RELEASE-p6 #0: Sat Mar 24 20:42:07 EDT 2012     root at phoenix.hsd1.md.comcast.net:/usr/src/sys/amd64/compile/PHOENIX  amd64

>Description:
I am running ZFS filesystem version 4 and storage pool version 15 on a FreeBSD 8.2-RELEASE amd64 kernel. I have a single 12TB pool on a 3ware 9650 controller with eight Seagate ST2000DL003 drives in a RAID-5 configuration managed by the controller.

I recently had a connector problem on one disk in the array while running a performance test that was writing a 1TB pattern file to it. When the RAID controller started reporting errors, I stopped the test and reseated the connector on the drive. After running a verify on the RAID, I tried to read the partial pattern file, and ZFS produced copious checksum error messages on the system console. So I removed the file with rm and got even more checksum errors, interspersed with several "I/O error 86" messages. Since the rm, ls no longer shows the file, but I ran a scrub to be sure the bogus file was gone and again got tons of checksum and I/O error 86 messages. At the end, zpool status shows:

phoenix# zpool status -v zfsPool
  pool: zfsPool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 3h40m with 6353 errors on Fri Jun 22 08:36:36 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zfsPool     ONLINE       0     0 6.20K
          da0       ONLINE       0     0 12.4K

errors: Permanent errors have been detected in the following files:

        zfsPool/raid:<0x9e241>

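The entry zfsPool/raid:<0x9e241> names a dataset and an object number rather than a path. As a minimal sketch, assuming that this `dataset:<0xHEX>` form is how ZFS lists an object it can no longer map to a file name (consistent with the file having been removed), the Python below pulls such entries out of the `errors:` section of `zpool status -v` output and converts the object number to decimal. The function name `unresolved_objects` and the parsing rules are illustrative only and are not part of any ZFS tool.

```python
import re

# Matches entries like "zfsPool/raid:<0x9e241>": a dataset name followed by
# a hexadecimal object number in angle brackets. Assumption: this form means
# ZFS could not resolve the object to a path (e.g. the file was unlinked).
OBJECT_ENTRY = re.compile(r"^\s*(?P<dataset>\S+):<0x(?P<obj>[0-9a-fA-F]+)>\s*$")


def unresolved_objects(status_text):
    """Return (dataset, object_number) pairs listed after the 'errors:' line."""
    results = []
    in_errors = False
    for line in status_text.splitlines():
        if line.startswith("errors:"):
            # Everything after this header is the error list.
            in_errors = True
            continue
        if not in_errors:
            continue
        match = OBJECT_ENTRY.match(line)
        if match:
            # Convert the hexadecimal object number to a plain integer.
            results.append((match.group("dataset"), int(match.group("obj"), 16)))
    return results


# Excerpt of the zpool status -v output from this report.
SAMPLE = """\
errors: Permanent errors have been detected in the following files:

        zfsPool/raid:<0x9e241>
"""

if __name__ == "__main__":
    for dataset, obj in unresolved_objects(SAMPLE):
        print(f"{dataset}: object {obj}")
```

Run against the output above, it reports object 647745 in zfsPool/raid, i.e. an object that still exists in the pool's error log but no longer has a directory entry.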

I have tried "zpool clear"/reboot/"zpool scrub" several times now and get a similar set of errors and results each time.

My question is: how do I get rid of this file? It is no longer linked to a directory entry, and nothing should have it open, since I have rebooted several times. Yet ZFS still reports a broken file and tells me to restore it. It is most likely the pattern test file that I deleted, so I don't need it and don't want to recover it. I would just like to get rid of it and get my filesystem clean again without resorting to starting over.


thanks,
ron.


>How-To-Repeat:
Not sure. It occurred because of an untimely combination of heavy usage and hardware failure.
>Fix:
It was suggested that I either back up or copy the array somewhere and then copy it back, but the machine is in production, and I don't have enough capacity elsewhere to copy the entire contents. In any case, a serious filesystem should make it possible to clean up this file, even if it has bad links and checksums, without starting over.

>Release-Note:
>Audit-Trail:
>Unformatted:

