System deadlock when using mksnap_ffs
Jeremy Chadwick
koitsu at FreeBSD.org
Wed Nov 12 22:05:24 PST 2008
On Wed, Nov 12, 2008 at 09:02:50PM -0800, David Wolfskill wrote:
> On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
> > ...
> > > > On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > > > > I've been playing around with snapshots lately but I've got a problem on
> > > > > one of my servers running 7-STABLE amd64:
> > > > >
> > > > > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 tdb at paladin:/usr/obj/usr/src/sys/PALADIN amd64
> > > > >
> > > > > I run the mksnap_ffs command to take the snapshot and some time later
> > > > > the system completely freezes up:
> > > > >
> > > > > paladin# cd /u2/.snap/
> > > > > paladin# mksnap_ffs /u2 test.1
> > > > >
> > > > > It only happens on this one filesystem, though, which might be to do
> > > > > with its size. It's not over the 2TB marker, but it's pretty close. It's
> > > > > also backed by a hardware RAID system, although a smaller filesystem on
> > > > > the same RAID has no issues.
> > ...
> > Then in my book, the patch didn't fix anything. :-) The system is
> > still "deadlocking"; snapshot generation **should not** wedge the system
> > hard like this.
> >
> > Also, during my own testing, I am always able to use Ctrl-T to get
> > SIGINFO from the running process (mksnap_ffs). That behaviour does not
> > change for me.
> >
> > The rest of the below information is good -- but I'm confused about
> > something: is there anyone out there who can use mksnap_ffs on a
> > filesystem (/usr is a good test source) and NOT experience this
> > deadlocking problem?
>
> I hadn't ever tried until I saw your message. Granted, I'm using a
> smaller file system (I doubt that I have a total of as much as 2 TB in
> all my machines combined), and I'm running i386, vs. amd64. But it ran
> just fine. I wasn't able to test SIGINFO; it finished before I had a
> chance. (I ran it under time(1); wall clock time was 0.91 sec.)
>
> > Literally *every* FreeBSD box I have root access
> > to suffers from this problem, so I'm a little baffled why we end-users
> > need to keep providing debugging output when it should be easy as pie
> > for a developer to do "dump -0 -L -a -f /path/fs.dump /usr" and watch
> > their system wedge.
>
> Well, I routinely use dump/restore pipelines to copy file systems
> around; never had a problem with it.
>
> > ...
>
> For reference:
>
> freebeast(7.1-P)[9] uname -a
> FreeBSD freebeast.catwhisker.org 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #127: Wed Nov 12 05:16:20 PST 2008 root at freebeast.catwhisker.org:/common/S3/obj/usr/src/sys/FREEBEAST i386
> freebeast(7.1-P)[10] ls -la
> total 4
> drwxrwxr-x 2 root operator 512 Nov 12 20:53 .
> drwxr-xr-x 14 root wheel 512 Jan 22 2008 ..
> freebeast(7.1-P)[11] /usr/bin/time -l mksnap_ffs /S2/usr test.1
> 0.91 real 0.00 user 0.05 sys
> 976 maximum resident set size
> 3 average shared memory size
> 627 average unshared data size
> 109 average unshared stack size
> 104 page reclaims
> 0 page faults
> 0 swaps
> 1 block input operations
> 230 block output operations
> 0 messages sent
> 0 messages received
> 0 signals received
> 101 voluntary context switches
> 34 involuntary context switches
> freebeast(7.1-P)[12] ls -la
> total 1460
> drwxrwxr-x 2 root operator 512 Nov 12 20:54 .
> drwxr-xr-x 14 root wheel 512 Jan 22 2008 ..
> -r--r----- 1 root operator 2410791056 Nov 12 20:54 test.1
> freebeast(7.1-P)[13]
David, thanks for chiming in. This is exactly what I was worried about.

It would be very helpful if we could figure out what triggers the
slowdown for many of us, since for others (proof above) mksnap_ffs
behaves as expected.

Since I'm able to reproduce this pretty much everywhere, here is the
information from one such box:
# df -ki /usr
Filesystem 1024-blocks Used Avail Capacity iused ifree %iused Mounted on
/dev/ad4s1f 163815904 3835274 146875358 3% 254864 20941934 1% /usr
# cd /usr/.snap
# /usr/bin/time -l mksnap_ffs /usr test.1
<after about 20 seconds, hitting Ctrl-T>
load: 1.90 cmd: mksnap_ffs 11719 [wdrain] 0.00u 0.07s 0% 1092k
23.25 real 0.00 user 0.00 sys
135.98 real 0.00 user 0.62 sys
1092 maximum resident set size
4 average shared memory size
1081 average unshared data size
135 average unshared stack size
101 page reclaims
0 page faults
0 swaps
895 block input operations
13444 block output operations
0 messages sent
0 messages received
0 signals received
6433 voluntary context switches
197 involuntary context switches
# ls -l test.1
-r--r----- 1 root operator 173203463240 Nov 12 21:42 test.1
Going by the snapshot file size and the df output above, David's
filesystem is roughly 2GB, while mine is roughly 160GB. His snap takes
under 1 second, yet mine takes over 2 minutes.

Perhaps the large difference is explained by the amount of space used on
the filesystem, or by the number of inodes in use?
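
To put the sizes quoted above on a common scale, here is a minimal sh
sketch that converts the df 1024-block counts and the ls byte counts to
GiB. The helper names kblocks_to_gib and bytes_to_gib are made up for
illustration and are not part of any FreeBSD tool; the input numbers are
copied from the output earlier in this message.

```shell
#!/bin/sh
# Hypothetical helpers (not part of FreeBSD): unit conversion only.

# Convert a count of 1024-byte blocks (as printed by "df -k") to GiB.
kblocks_to_gib() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n / (1024 * 1024) }'
}

# Convert a byte count (as printed by "ls -l") to GiB.
bytes_to_gib() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n / (1024 * 1024 * 1024) }'
}

# Figures taken from the df and ls output quoted in this thread.
echo "/usr total (df):       $(kblocks_to_gib 163815904) GiB"
echo "/usr used (df):        $(kblocks_to_gib 3835274) GiB"
echo "/usr snapshot file:    $(bytes_to_gib 173203463240) GiB"
echo "/S2/usr snapshot file: $(bytes_to_gib 2410791056) GiB"
```

This only does the arithmetic; it says nothing about why one snapshot
takes so much longer than the other. It does suggest that the two
filesystems differ in size by well over an order of magnitude, which is
worth keeping in mind when comparing the two timings.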
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |