zcolli (zcollide) state, what does znode dying mean?

Attila Nagy bra at fsn.hu
Wed Sep 22 10:38:16 UTC 2010


  Hello,

I have a machine that is heavily hammered with file system operations, 
running a very recent 8-STABLE.
The symptom is that everything works fine for a few minutes, then a lot 
of processes get into the zcolli state (according to top). At that point 
there are two possible outcomes:
1. The disks calm down for a while (for long seconds there is no IO, or 
only a very small amount, verified with gstat), top shows nearly 100% 
system time, and a lot of processes are on the run queue (the load is 
sky-high, between 300 and 1000). All operations stop: top still 
refreshes, but I can't really execute new programs. Then suddenly the 
zcolli states clear, the IO resumes and the run queue shrinks.
2. The system remains in this state. After 5-10 minutes there is still 
no change and only a reset helps (the machine doesn't even react to 
CTRL-ALT-DEL; running programs like top still refresh, but no disk IO 
can be made).

The zcollide state appears only here:
http://fxr.watson.org/fxr/source/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#L915
which says this is due to a dying znode.
My question is: what does a dying znode mean? I don't think it's related 
to the on-disk structure, because the disks seem to be healthy and 
respond quickly (or at least evenly slowly, due to the load; I can't see 
any disk with read errors or slow responses).

Having slowdowns due to this is bad, but having lockups is a lot 
worse...

Thanks,


More information about the freebsd-fs mailing list