(Yet Another) Damaged directory on ZFS?

Peter Maloney peter.maloney at brockmann-consult.de
Sun Dec 4 16:34:36 UTC 2011


On 25.11.2011 00:08, Andrew Reilly wrote:
> On Tue, Nov 01, 2011 at 03:41:18AM +0000, John wrote:
>> Hi Folks,
>>
>>    We have a zfs fileserver running 9.0-RC1 which appears to have locked
>> up in a manner similar to the "Damaged directory on ZFS" thread.
>>
>>    It started after a zfs snapshot which appears to have hung. At
>> that point, an "ls" command on a directory which is either empty
>> or contains only other directories works correctly. An "ls" on
>> a directory containing a file will hang.
> Just want to add a sort-of "me too", to this question, in the
> hope that I will learn something useful.
>
> For a couple of weeks I've been seeing messages in
> my daily security reports along the lines of:
> find: /usr/src/.zfs/snapshot: bad file descriptor
>
> I can get the same message now by just cd'ing into one of the
> two "broken" .zfs directories and comparing ls -F output with a
> version that doesn't check the inode data:
>
> ls: snapshot: Bad file descriptor
> shares/
>
> "zfs list -r -t snapshot" shows all of the snapshots created by
> my nightly backups to still exist, seemingly.
>
> A zpool scrub tank did not change or make them go away.
>
> No processes appear to be stuck, wedged or otherwise broken.
>
> FWIW I'm running:
> FreeBSD johnny.reilly.home 9.0-RC1 FreeBSD 9.0-RC1 #4: Sat Nov  5 14:52:15 EST 2011     root at johnny.reilly.home:/usr/obj/usr/src/sys/GENERIC  amd64
>
> Cheers,
>
I have a similar problem, but the lines in my log clearly come from NFS.

Here is one (it shows up when I send a HUP to nfsd):
Oct 31 14:55:39 bcnas1 mountd[47733]: can't delete exports for
/tank/fs1/.zfs/snapshot/daily-2011-10-06T09:27:52: Invalid argument

I don't have any logs old enough to have it, but I also vaguely remember
the words "bad file descriptor".

So I decided to set:
zfs set snapdir=hidden
and now the problem is gone. nfsd no longer knows there is a .zfs
directory, so it does not try to scan it and export it.
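For reference, a sketch of the commands involved. It assumes the affected
dataset is tank/fs1 (the one named in the mountd log line above);
substitute your own dataset name. The ZFS default for snapdir is
"hidden", so a value of "visible" means it had been set explicitly.

```shell
# Assumed dataset name (tank/fs1); replace with the affected dataset.
# Show the current value and where it comes from (local, inherited, default).
zfs get snapdir tank/fs1

# Hide .zfs from directory listings. It is still reachable by explicit
# path (e.g. "ls .zfs/snapshot"), which is why the hang can still be triggered.
zfs set snapdir=hidden tank/fs1

# Later, to drop the local setting and fall back to the inherited/default value:
zfs inherit snapdir tank/fs1
```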

However, if I manually run "ls .zfs/snapshot" on the NFS client, it
does indeed hang the dataset (though not the whole system), and on
"shutdown -r now" the system hangs because a process fails to stop.
After the reboot, no data is lost, and everything runs normally again
until another "ls .zfs/snapshot" is run.

Earlier, while I was still seeing the errors in the log, I could list
the .zfs directory without a hang, but the snapshots looked all wrong:
they would look like a different directory rather than the root of the
tree, or they would not be directories at all but some kind of binary
file. The hanging started after more snapshots were created.

Does this match your experience at all? Can you show us the full lines
from the log? (Note that my full log line names mountd as the source,
a hint that the "can't delete ..." message text alone would not give.)

