Snapshot problems

Skylar Thompson skylar at cs.earlham.edu
Tue Jun 28 16:47:11 GMT 2005


Xin LI wrote:

>On Sun, Jun 26, 2005 at 01:20:31PM -0500, Skylar Thompson wrote:
>
>>I've discovered a repeatable problem with FreeBSD's UFS2 snapshots. If I
>>create several snapshots, and then do heavy disk I/O on the original
>>filesystem (deletions, creations, simple touches, etc.) I can cause the I/O
>>system to crash. There is no kernel panic, and the machine still answers
>>pings, but no disk I/O occurs. I can replicate this on a dual-processor
>>beige-box system with a Mylex RAID controller and a RAID-5 set, and also on
>>a dual-processor Dell Poweredge 2650 with a PERC 3/i RAID controller and a
>>RAID-5 set and RAID-1 set.  FreeBSD 5.4-RELEASE is installed on both
>>systems, and SMP is enabled as well, with HTT disabled on the Poweredge. I
>>have DDB compiled in, so I can get debug information but I don't know what
>>to look for.
>
>I think a script that can reliably trigger the "crash" would be helpful.

I was using this script to take the snapshots:

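#!/bin/sh
# Take a snapshot of each filesystem, named after the current hour, and
# replace any snapshot left over from the same hour of an earlier run.

# Simple lock file so that two runs do not overlap.
if [ -f /var/run/hourly_snap ]; then
        echo "Lock file exists. Exiting...."
        exit 1
else
        HOUR=$(date "+%H")

        touch /var/run/hourly_snap
        for f in / /usr /var /clients; do
                # Remove the old snapshot for this hour, if there is one.
                if [ -f "$f/snapshots/hourly_snap.$HOUR" ]; then
                        rm -f "$f/snapshots/hourly_snap.$HOUR"
                fi
                mksnap_ffs "$f" "$f/snapshots/hourly_snap.$HOUR"
        done
        rm /var/run/hourly_snap
fi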

I ran this once every other hour, so I had 12 snapshots in circulation 
at any given time. The number of snapshots seemed to exacerbate the 
problem; with only one or two around, a crash was rare, though it did 
still happen occasionally.
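
The script above only takes the snapshots; the load comes from ordinary 
file churn (creations, touches, deletions) on the original filesystem. A 
minimal sketch of that kind of churn follows. It is an illustration, not 
the exact workload used here: the function name churn, its arguments, and 
the example directory are all hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not from the original report): create, touch, and
# delete many small files under one directory to generate metadata-heavy I/O.
churn() {
        dir=$1          # directory on the filesystem that has snapshots
        count=$2        # number of files to create per pass

        mkdir -p "$dir" || return 1

        i=0
        while [ "$i" -lt "$count" ]; do
                echo "data $i" > "$dir/file.$i"  # creations
                i=$((i + 1))
        done

        # Keep count modest: these globs expand to one argument per file.
        touch "$dir"/file.*                      # simple touches
        rm -f "$dir"/file.*                      # deletions
}
```

Calling something like churn /usr/churn_test 1000 repeatedly while several 
snapshots exist would approximate the conditions described; whether it is 
enough to trigger the stall is untested.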

>What do you mean by "IO system crash", BTW?  I got confused since it does
>not cause a kernel panic or stop ping responses.  Do you mean that the
>I/O system was stalled/suspended when there were heavy disk operations?

Yes. The kernel still responds and I can get into DDB just fine, but 
there's no disk activity, at least on the affected filesystem. Usually 
it's /usr, which has many used inodes on account of ports and src.
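
One way to tell a stalled filesystem from a merely slow one is to attempt 
a small write in the background and see whether it finishes within a 
timeout. The following is a sketch under assumptions: the function name 
fs_probe and the 10-second default are hypothetical, and it relies on the 
shell and sleep still being runnable while the target filesystem is stuck.

```sh
#!/bin/sh
# Hypothetical probe (not from the original report): report whether a
# small write to a directory completes within a timeout.
fs_probe() {
        dir=$1
        timeout=${2:-10}        # seconds to wait; assumed default

        # Do the write in the background so a stall cannot block the caller.
        ( touch "$dir/.probe.$$" && rm -f "$dir/.probe.$$" ) &
        pid=$!

        waited=0
        while kill -0 "$pid" 2>/dev/null; do
                if [ "$waited" -ge "$timeout" ]; then
                        # A writer blocked in the kernel cannot be killed,
                        # so just report it and leave it behind.
                        echo "stalled: $dir"
                        return 1
                fi
                sleep 1
                waited=$((waited + 1))
        done

        # The writer has exited; pick up its status to catch plain errors.
        if wait "$pid"; then
                echo "ok: $dir"
        else
                echo "error: $dir"
                return 1
        fi
}
```

Running fs_probe /usr from a shell on an unaffected filesystem would then 
print either an "ok", "error", or "stalled" line for /usr.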

>My guess is that there are one or more underlying deadlocks present.  Would
>you mind compiling WITNESS/WITNESS_SUPPORT into your kernel and giving it a
>try?  This will reduce performance, but would also be helpful for finding
>locking bugs.

Sure. I've got the 2650 booted up with WITNESS support in addition to 
DDB. Where should I go from here?


-- 
-- Skylar Thompson (skylar at cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/


