FS hang when creating snapshots on a UFS SU+J setup

Yamagi Burmeister lists at yamagi.org
Wed Jan 11 09:30:55 UTC 2012

I've run some tests to verify that the problem occurs only with SU+J and
not with plain SU (soft updates without journaling). I ran the following
two loops in parallel on different TTYs:

while true; do
 cp -r /usr/src /root
 rm -Rf /root/src
done

while true; do
 mksnap_ffs / /.snap/snap
 rm -f /.snap/snap
done

With plain SU the system survives this for at least one hour. But as
soon as SU+J is used, it most likely deadlocks or even panics within the
first one or two minutes. What exactly happens seems to vary. In most
cases the system just deadlocks, sometimes in the way alain at bsdgate.org
described, and sometimes it becomes completely unresponsive to any input.
I've seen kernel messages like "fsync: giving up on dirty".

Several times the system panicked. In most cases it printed the generic
"panic: page fault while in kernel mode", and once it printed
"panic: snapacct_ufs2: bad block". I've never seen the same backtrace
twice. Once the system suddenly rebooted, as if a triple fault or
something similar had happened.

Since the problems described above are much more likely to arise when
the file system is under load (for example from the first loop) while
the snapshot is being taken, this looks like some kind of race
condition.

Some more information from an older debug session can be found at:

On Tue, 10 Jan 2012 10:30:13 -0800
Kirk McKusick <mckusick at mckusick.com> wrote:

> > Date: Mon, 9 Jan 2012 18:30:51 +0100
> > From: Yamagi Burmeister <lists at yamagi.org>
> > To: jeff at freebsd.org, mckusick at freebsd.org
> > Cc: freebsd-current at freebsd.org, bryce at bryce.net
> > Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> > 
> > Hello,
> > 
> > I'm sorry to bother you, but you may not be aware of this thread and
> > this problem. Several of us are experiencing deadlocks, kernel
> > panics and other problems when creating snapshots on file systems
> > with SU+J. It would be nice to get some feedback, e.g. on how we can
> > help debug and/or fix this problem.
> > 
> > Thank you,
> > Yamagi
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
> 	umount <filesystem>
> 	tunefs -j disable <filesystem>
> 	mount <filesystem>
> 	cd <filesystem>
> 	rm .sujournal
> You may want to run `fsck -f' on the filesystem while you have
> it unmounted just to be sure that it is clean. Then run your
> snapshot request to see if it still fails. If it works, then
> we have narrowed the problem down to something related to SU+J.
> If it fails then we have a broader issue to deal with.
> If you wish to go back to using SU+J after the test, you can
> reenable SU+J by running:
> 	umount <filesystem>
> 	tunefs -j enable <filesystem>
> 	mount <filesystem>
> When responding to me, it is best to use my <mckusick at mckusick.com>
> email as I tend to read it more regularly.
> 	Kirk McKusick
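
The steps Kirk lists above can be wrapped in a small shell function so the
switch between SU+J and plain SU is done the same way on every test run.
This is a hypothetical sketch: suj_toggle, run and DRY_RUN are names made
up here, not part of FreeBSD. It assumes a POSIX sh, root privileges, and
an /etc/fstab entry for the file system, so that umount, fsck and mount
accept the mount point alone. It only mirrors the quoted commands and has
not been tried on a real machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): toggle the soft updates
# journal on a UFS file system, following the steps quoted above.
# Assumes the file system is listed in /etc/fstab, so the mount point
# alone is enough for umount, fsck and mount.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: suj_toggle disable|enable <mountpoint>
suj_toggle() {
    action=$1
    fs=$2
    case $action in
    disable)
        run umount "$fs" || return 1
        run tunefs -j disable "$fs" || return 1
        # Optional, as suggested above: check the file system while unmounted.
        run fsck -f "$fs" || return 1
        run mount "$fs" || return 1
        # Per the steps above, remove the journal file after disabling SU+J.
        run rm "$fs/.sujournal"
        ;;
    enable)
        run umount "$fs" || return 1
        run tunefs -j enable "$fs" || return 1
        run mount "$fs"
        ;;
    *)
        echo "usage: suj_toggle disable|enable <mountpoint>" >&2
        return 2
        ;;
    esac
}
```

For example, `DRY_RUN=1 suj_toggle disable /mnt` only prints the five
commands it would run, which makes it easy to check the sequence before
pointing it at a real file system.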

Homepage:  www.yamagi.org
XMPP:      yamagi at yamagi.org

More information about the freebsd-current mailing list