Trouble with snapshots

Cyrus Rahman crahman at gmail.com
Tue Apr 1 13:15:56 PDT 2008


I'm seeing serious problems with snapshot deadlocks on 7.0-RELEASE
right now.  I haven't been able to set up a test environment to
determine the precise details, but this much I know: filesystem I/O
eventually locks up, requiring a hard reset, after the snapshot mount
sleeps permanently on suspfs.  From there the problem cascades until
everything ends up waiting on suspfs.  Running 'sync' after the mount
hangs is a sure way to propagate the problem.  This happens very
often, with roughly a 15% chance per snapshot on the server running
7.0.  It's bad enough that it isn't realistic to use snapshots there.

Other strange things have been observed as well: an entire day's worth
of work vanished.  After the reset/reboot the filesystems were
consistent, but in the state they had been in many hours earlier, when
the snapshot hung.  The snapshot had been seen hanging, but everything
else seemed to work, so a decision was made to reboot at the end of
the day, with disastrous effect!  Apart from the hung snapshot, nothing
unusual was noticed during the day.  I'm guessing everything just got
cached (for hours!) and the cache was never flushed.

This is happening on a system set up with journaled UFS filesystems,
so that may be part of the problem.  The system is running amd64 on
an Intel Q6600.

The affected filesystem has a number of large files on it, about
500-700 MB each.  Filesystems with only small files do not seem to
have trouble, even though they are bigger and hold more files.  I
can't think of anything else unique about it.


More information about the freebsd-fs mailing list