dump -L on large filesystems + shutdown

Mon Sep 10 16:40:44 PDT 2007

This weekend I had a very interesting experience with gstripe(8) on
RELENG_6 on amd64.  Details of my setup: the machine has 4 disks
connected to a standard SATA300 controller (nForce 4 chipset):

ad4: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata2-master SATA300
ad6: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata3-master SATA300
ad8: 190782MB <WDC WD2000JD-00HBB0 08.02D08> at ata4-master SATA150
ad10: 476940MB <Seagate ST3500630AS 3.AAE> at ata5-master SATA300

/dev/ad8s1a           507630    66956    400064    14%    /
/dev/ad8s1d         16244334    87212  14857576     1%    /var
/dev/ad8s1e          4058062     1778   3731640     0%    /tmp
/dev/ad8s1f         32494668  2335866  27559230     8%    /usr
/dev/ad8s1g        127763620     6422 117536110     0%    /home
/dev/stripe/st0a   946030390 71642044 798705916     8%    /storage
/dev/ad10s1d       473009638 70446308 364722560    16%    /backups

 ad4 = drive #1 in gstripe set (makes /dev/stripe/st0)
 ad6 = drive #2 in gstripe set (makes /dev/stripe/st0)
 ad8 = boot/OS drive
ad10 = drive used for periodic backups (dump(8) dumps to this disk)

All filesystems, except /, have softupdates enabled.  I did not pick
custom block sizes when newfs'ing /storage and /backups.

I have a set of automated backups which run at 02:45 every day.  A full
level 0 backup runs on Sunday, and incremental levels 1-6 run Monday
through Saturday.

Backups are done using the following command set:

  /sbin/dump -{level} -a -h0 -u -C16 -L -f- /backups/foo.{level}.dump
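
With Sunday as level 0 and Monday through Saturday as levels 1-6, the
{level} value lines up with the weekday number printed by `date +%w`
(0 = Sunday).  A minimal sketch of that mapping follows; the function
name dump_level_for_weekday is hypothetical and not part of the actual
backup scripts.

```shell
#!/bin/sh
# Hypothetical helper: map a weekday number (0=Sunday .. 6=Saturday,
# as printed by `date +%w`) to the dump level used on that day.
# Sunday -> level 0 (full), Monday..Saturday -> levels 1..6.
dump_level_for_weekday() {
    case "$1" in
        [0-6]) echo "$1" ;;
        *)     echo "invalid weekday: $1" >&2; return 1 ;;
    esac
}
```

It could be called as level=$(dump_level_for_weekday "$(date +%w)")
before building the dump command line.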

The incident I'm about to describe happened on Sunday.  I was dealing
with an unrelated issue (some Ethernet problems), and I had to reboot
the FreeBSD box in the process.  I rebooted it using reboot(8).  This
was around 03:05 -- in the middle of the backups.

The first thing I noticed was that the ATA "flush-to-disk" stuff was
taking a long time to hit repetitions of zero (that is: 4 4 4 3 4 2 2 1
1 1 0 0 0).  After a few seconds, I saw "0 1 0 1 0 1" start flying by on
the screen over and over at a very fast rate, and after a few more
seconds, I saw the system say "Giving up..." or something like that.
Then it rebooted.

When the machine came back up, every filesystem on every disk was
marked dirty.

fsck(8) ran in the background, but took an *incredible* amount of time
to complete on /dev/stripe/st0a (the gstripe set).  "Incredible" means
at least an hour, maybe more.  I was running gstat during that time, and
the gstripe set was pretty much at 100% utilisation, split 50/50 between
ad4 and ad6; nothing odd there.

I'm mailing -stable about this because it seems there may be some sort
of "deadlock" condition which can happen when using dump -L on a system
and then shutting it down.  Maybe all of this points back to the ATA
subsystem and how long it'll wait for buffers to be flushed to disk
before actually shutting down.  In my case, it obviously did not wait
long enough.  There don't seem to be any tunables for how long to
continue trying/waiting either.
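
Until the cause is understood, one way to avoid the situation is to
refuse to reboot while a dump is still in progress.  The following is
only a sketch under assumptions: the function names are hypothetical,
and it assumes the running backup shows up in ps(1) with a command name
of exactly "dump".  It does nothing about the underlying shutdown
behaviour.

```shell
#!/bin/sh
# Reads one process name per line on stdin and succeeds (exit 0) if any
# line is exactly "dump".  Assumption: the backup process has that name.
dump_in_process_list() {
    grep -qx 'dump'
}

# Hypothetical wrapper around reboot(8): bail out if dump(8) is running.
guarded_reboot() {
    if ps -ax -o comm= | dump_in_process_list; then
        echo "dump(8) is still running; not rebooting" >&2
        return 1
    fi
    /sbin/reboot
}
```

The check is split out so it can be exercised against a canned list of
process names without touching the real process table.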

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |