Snapshot problems
Skylar Thompson
skylar at cs.earlham.edu
Tue Jun 28 16:47:11 GMT 2005
Xin LI wrote:
>On Sun, Jun 26, 2005 at 01:20:31PM -0500, Skylar Thompson wrote:
>
>
>>I've discovered a repeatable problem with FreeBSD's UFS2 snapshots. If I
>>create several snapshots, and then do heavy disk I/O on the original
>>filesystem (deletions, creations, simple touches, etc.) I can cause the I/O
>>system to crash. There is no kernel panic, and the machine still answers
>>pings, but no disk I/O occurs. I can replicate this on a dual-processor
>>beige-box system with a Mylex RAID controller and a RAID-5 set, and also on
>>a dual-processor Dell Poweredge 2650 with a PERC 3/i RAID controller and a
>>RAID-5 set and RAID-1 set. FreeBSD 5.4-RELEASE is installed on both
>>systems, and SMP is enabled as well, with HTT disabled on the Poweredge. I
>>have DDB compiled in, so I can get debug information but I don't know what
>>to look for.
>>
>>
>
>I think a script that can reliably trigger the "crash" would be helpful.
>
>
I was using this script to take the snapshots:
#!/bin/sh
# Take one snapshot per filesystem, named after the current hour.
LOCK=/var/run/hourly_snap
if [ -f "$LOCK" ]; then
    echo "Lock file exists. Exiting...."
    exit 1
else
    HOUR=`date "+%H"`
    touch "$LOCK"
    for f in / /usr /var /clients; do
        # Remove the snapshot taken at this hour on a previous day, if any.
        if [ -f "$f/snapshots/hourly_snap.$HOUR" ]; then
            rm -f "$f/snapshots/hourly_snap.$HOUR"
        fi
        mksnap_ffs "$f" "$f/snapshots/hourly_snap.$HOUR"
    done
    rm "$LOCK"
fi
I ran this once every other hour, so I had 12 snapshots per filesystem
in circulation at any given time. The number of snapshots seemed to
exacerbate the problem; with only one or two around, a crash was rare
but did still happen.
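For the I/O side, something along the following lines could serve as a
reproduction aid. It is only a minimal sketch and not the workload I
actually ran: the function name churn, its three parameters and the
scratch directory in the usage line are hypothetical. It simply does the
kinds of operations mentioned above (creations, touches, deletions) in a
loop.

```shell
#!/bin/sh
# Hypothetical load generator (illustrative, not the original workload):
# repeatedly creates, touches and deletes many small files so that a
# filesystem holding snapshots sees steady inode churn.
# usage: churn DIR FILES ROUNDS
churn() {
    dir=$1
    files=$2
    rounds=$3
    mkdir -p "$dir" || return 1
    r=0
    while [ "$r" -lt "$rounds" ]; do
        i=0
        while [ "$i" -lt "$files" ]; do
            echo "round $r" > "$dir/f.$i"   # creation
            i=$((i + 1))
        done
        i=0
        while [ "$i" -lt "$files" ]; do
            touch "$dir/f.$i"               # simple touch
            i=$((i + 1))
        done
        i=0
        while [ "$i" -lt "$files" ]; do
            rm -f "$dir/f.$i"               # deletion
            i=$((i + 1))
        done
        r=$((r + 1))
    done
}
```

It could be run as, for example, churn /usr/churn_test 10000 50 while
the snapshots exist; the path and counts are placeholders and would need
tuning to the filesystem being tested.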
>What do you mean by "IO system crash", BTW? I got confused since it does
>not cause a kernel panic or stop ping responses. Do you mean that the
>I/O system stalls or is suspended under heavy disk operations?
>
>
Yes. The kernel still responds and I can get into DDB just fine, but
there's no disk activity, at least on the affected filesystem. Usually
it's /usr, which has many used inodes on account of ports and src.
>My guess is that there is some underlying deadlock(s) present. Would you
>mind compiling WITNESS/WITNESS_SUPPORT into your kernel and giving it a try?
>This will reduce performance, but would also be helpful for finding locking
>bugs.
>
>
>
Sure. I've got the 2650 booted up with WITNESS support in addition to
DDB. Where should I go from here?
--
-- Skylar Thompson (skylar at cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/